Building a Data Centric Architecture – Introduction

Chris Evans – Cloud, Data Mobility, Data-Centric Architecture

Almost every infrastructure vendor talks about hybrid multi-cloud as if it were a product to download and install.  In reality, consuming cloud services from multiple sources as a joined-up, holistic solution is far more complex than it first appears.  In this long series of posts, we will investigate the validity of the hybrid and multi-cloud models, looking in particular at how data forms the centre of the future hybrid universe.

As we highlighted in this blog post, a hybrid is defined as something created from more than one species.  In motoring, hybrid vehicles use more than one powertrain, typically combining petrol or diesel with electric.  In computing terms, we’ve come to think of hybrid infrastructure as on-premises technology and the public cloud working together.

Hybrid vs Multi-cloud

Hybrid and multi-cloud are not the same concepts.  As highlighted in this recent blog post, hybrid is more aligned with using two interwoven solutions, whereas multi-cloud could simply mean exploiting multiple disparate services.

Many enterprises and smaller businesses have been working in a multi-cloud model for years, using tools like Salesforce, Office 365 and Google Workspace.  There are also many other solutions that focus on specific business processes, such as productivity, service management and internal and customer-facing workflows.

As anyone who has worked on SaaS application migration will tell you, moving (for example) from highly customised SharePoint to Google Sites is a major challenge.  The subtleties of migration, from maintaining a consistent look and feel to continued compliance (retaining audit trails and archives), are difficult to get right.  In many cases, a compromise is made either to discard some information or to run multiple systems in parallel.

Figure 1 – Computing Domains

Multiple Models

Looking at Figure 1, we can see the scale of the problem.  Modern applications fall into one of four categories:

  • IaaS – Infrastructure-as-a-Service – LEGO-like building blocks of storage, compute and networking, all of which can be combined to build more complex services.  The cloud providers do this themselves with PaaS and some SaaS offerings (such as managed databases). 
  • SaaS – a range of Software-as-a-Service solutions that provide standard business process functionality and some internal infrastructure services.  Note that there’s also a mix of offerings between IaaS and SaaS that could be classed as PaaS, but this term seems to have become less visible in recent years.
  • On-premises – self-managed infrastructure built in customer-owned and managed data centres. 
  • Edge – smaller units of computing outside of core data centres (either IaaS or on-premises) that offer local computing services, including data creation.  Edge solutions can vary significantly from networks of security cameras connected via 5G to small clusters of servers in branch stores.

The interesting aspect of the four types of computing we recognise today is that they have all existed in some form or other over the last 60 years.  In the early days of computing, most businesses couldn’t afford a dedicated mainframe, so they used bureau services from companies like EDS.  Edge computing was implemented with mini-computers like the IBM System/38 (which later became the AS/400 and iSeries).  Infrastructure-as-a-Service companies such as Sungard AS provided on-demand disaster recovery platforms that avoided the need to build and operate expensive secondary data centres. 

So, the four consumable service models have existed forever – they just may not have been that obvious to everyone. 

Service Characteristics

As our diagram shows, each of these four service models has operating restrictions and controls.  IaaS and SaaS solutions are wholly designed and built by the vendor.  Customers have little or no input into service design, data structures or security models (depending on the layer of implementation).  The vendor determines the speed of innovation, which can be a blessing or a curse.  Amazon Web Services, for example, implements hundreds of new features each year, which widens the platform’s appeal but can be difficult to track. 

On-premises and Edge offer the business full flexibility (within the range of products in the market) but place much more emphasis on the design, build and operate processes.  This is one reason for the rise of converged and hyper-converged solutions in the early 2010s, which looked to simplify the decision-making process. 

Choice

The IT industry offers businesses a huge amount of choice when deciding how to build IT services.  Start-ups don’t need to run on-premises infrastructure and can operate wholly in the public cloud using IaaS and SaaS.  Long-standing organisations tend to have a mix of everything, accumulated over the life of the business.  Each service type sits on a spectrum of characteristics that provide varying levels of flexibility, control, consumption and management. 

The use of each service type ebbs and flows over time.  On-premises vendors like to talk about repatriation stories where data and applications are brought back onsite.  In reality, the public cloud continues to grow substantially year-on-year, while traditional infrastructure vendors are looking to pivot to opex consumption models that mimic the cloud. 

The Data Challenge

Many of the solutions we’ve described so far tend to operate as small, isolated islands of computing and data.  This happens for several reasons:

  • Lack of data mobility.  Taking SaaS solutions as an example, the vendor’s internal data model makes it hard to lift and shift content to another platform unless some kind of ingest feature is offered.  Data has inertia, which is much greater for structured than for unstructured content.  IaaS offerings are architected to charge customers for moving data out of, rather than into, their platforms (see the cost sketch after this list). 
  • No opportunistic value.  Moving computing services from one provider to another needs to offer either significantly reduced cost/complexity or additional competitive advantage to the business.  The cost (and risk) of transition needs to be justified.  Today, moving from one service provider to another might be planned on an annual or five-year cycle; in the future, we can envisage changing solutions on a much shorter timescale.  Either way, the benefits must outweigh the disadvantages.
  • Technical debt and dependencies.  Systems get built around specific technologies and platforms that introduce either minor or major lock-in.  With any technology, there is always some inevitable degree of tie-in; this is a compromise that can’t be avoided entirely, but it can be mitigated. 
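
To make the inertia point concrete, here is a minimal back-of-envelope sketch of the kind of egress calculation a migration plan needs.  The per-gigabyte rate, link speed and utilisation are assumptions for illustration only, not quotes from any provider.

```python
# Back-of-envelope estimate of the cost and time to move a dataset out of
# an IaaS platform. All rates are hypothetical placeholders; check your
# provider's actual pricing and network characteristics.

EGRESS_RATE_PER_GB = 0.09  # assumed $/GB egress charge

def egress_cost(dataset_tb: float, rate_per_gb: float = EGRESS_RATE_PER_GB) -> float:
    """One-off dollar cost of moving dataset_tb terabytes out of a platform."""
    return dataset_tb * 1024 * rate_per_gb

def transfer_days(dataset_tb: float, link_gbps: float, utilisation: float = 0.7) -> float:
    """Elapsed days to move the data over a link at a given sustained utilisation."""
    bits = dataset_tb * 1024 ** 4 * 8               # dataset size in bits
    seconds = bits / (link_gbps * 1e9 * utilisation)
    return seconds / 86400

if __name__ == "__main__":
    for tb in (10, 100, 1000):
        print(f"{tb:>5} TB: ~${egress_cost(tb):>9,.0f} egress, "
              f"~{transfer_days(tb, link_gbps=1):6.1f} days over a 1 Gbps link")
```

Both the charge and the elapsed time scale linearly with the size of the dataset, which is exactly why data tends to accumulate where it first lands.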

One of the reasons long-lived businesses tend to have so many IT solutions is the difficulty of addressing the above issues where data is concerned.  The value of IT to a business is in delivering better service to customers, which is primarily driven by data.  Infrastructure changes over time.  In the past 30 years, we’ve seen applications deployed on mainframes, client/server architectures, departmental servers, virtual infrastructure, converged infrastructure, hyper-converged infrastructure, containers and now serverless platforms.  All of these technologies have at their heart the eternal concept of data processing. 

Infrastructure evolves and changes, but the core of IT is the data and the value it provides to the business. 

Data Centricity

Why have we continued to build solutions that put infrastructure at the centre of strategy and design?  We can highlight several decision points that have caused this behaviour:

  • Cost model – infrastructure has been sold as a capital expense that has to be depreciated over time.  Cloud changes that way of thinking, with hourly operational expenditure at a much more granular level.  This offers the opportunity to try new things, experiment and learn, without committing capital to a project (see the break-even sketch after this list). 
  • Complexity – IT has always been difficult to implement at scale.  With many moving parts, once an infrastructure solution is stable, there’s not much desire to change it.  Change introduces risk.  As we adopt solutions that further abstract the underlying hardware, infrastructure can become a service, not an integral part of the design. 
  • Focus on components – technology companies have focused on selling components (networking, storage, servers); only within the last decade or so have we seen the packaging of complete solutions, starting with converged infrastructure.  Before that, businesses relied on systems integrators to apply their knowledge of component integration. 
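
As a rough illustration of why that granularity matters, the sketch below compares an owned server, depreciated over three years, with renting equivalent capacity by the hour.  Every number is a hypothetical assumption chosen for illustration, not a real price.

```python
# Hypothetical numbers, purely illustrative: compare a capital purchase
# depreciated over three years with an on-demand hourly instance.

SERVER_CAPEX = 12_000        # assumed purchase price, $
DEPRECIATION_YEARS = 3
HOURLY_RATE = 0.50           # assumed on-demand rate for similar capacity, $/hr

capex_per_hour = SERVER_CAPEX / (DEPRECIATION_YEARS * 365 * 24)

# A two-week experiment on demand:
experiment_hours = 14 * 24
print(f"experiment cost on demand: ${experiment_hours * HOURLY_RATE:,.2f}")
print(f"owned server equivalent:   ${SERVER_CAPEX:,.2f} up front "
      f"(~${capex_per_hour:.2f}/hr if fully utilised for {DEPRECIATION_YEARS} years)")

# Break-even utilisation: the fraction of the time the owned server must be
# busy before ownership beats renting at the hourly rate.
break_even = capex_per_hour / HOURLY_RATE
print(f"break-even utilisation: {break_even:.0%}")
```

Under these assumed numbers, a two-week experiment costs a few hundred dollars on demand, while ownership only wins once the server is busy more than about 90% of the time.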

As the overhead of building and maintaining infrastructure becomes de-emphasised, we can focus on the value of data and applications.  Figure 2 shows a simplified model that inverts the traditional deployment approach.  Data sits at the heart of a virtual cloud that simply uses resources provided by any of the four service domains; the sketch below Figure 2 makes the inversion concrete. 

Figure 2 – Data Centric
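
As a thought experiment, we can express the inverted model in code: the dataset becomes the first-class object, and the service domains become interchangeable backends.  This is a minimal sketch with entirely hypothetical class and method names; it illustrates the shape of the abstraction, not any vendor’s API.

```python
from dataclasses import dataclass, field
from typing import Protocol

class ServiceDomain(Protocol):
    """Any of the four domains (IaaS, SaaS, on-premises, edge) that can host data."""
    name: str
    def store(self, dataset: "DataSet") -> None: ...
    def cost_per_gb_month(self) -> float: ...

@dataclass
class DataSet:
    """The data is the first-class object; infrastructure is a pluggable detail."""
    name: str
    size_gb: float
    placements: list[ServiceDomain] = field(default_factory=list)

    def place(self, domain: ServiceDomain) -> None:
        domain.store(self)
        self.placements.append(domain)

    def monthly_cost(self) -> float:
        return sum(d.cost_per_gb_month() * self.size_gb for d in self.placements)

@dataclass
class OnPremises:
    name: str = "on-premises"
    def store(self, dataset: DataSet) -> None:
        print(f"storing {dataset.name} in the data centre")
    def cost_per_gb_month(self) -> float:
        return 0.02  # assumed amortised cost, illustrative only

@dataclass
class PublicIaaS:
    name: str = "public IaaS"
    def store(self, dataset: DataSet) -> None:
        print(f"storing {dataset.name} in cloud object storage")
    def cost_per_gb_month(self) -> float:
        return 0.023  # assumed rate, illustrative only

if __name__ == "__main__":
    records = DataSet("customer-records", size_gb=500)
    records.place(OnPremises())
    records.place(PublicIaaS())   # second copy in another domain
    print(f"monthly placement cost: ${records.monthly_cost():.2f}")
```

The detail matters less than the inversion: placement, protection and cost become properties of the dataset, while the four domains compete to host it.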

Transformation

As businesses transform their IT for the next decade, data will be at the centre of the cloud model.  Cloud and infrastructure vendors will offer products and services to be consumed on an as-a-service basis.  IT organisations need to adapt their technology architectures to be more data-centric.  This design places the most valuable component at the centre of computing, allowing other services to be consumed on demand. 

The Architect’s View

Over the coming months, we will be digging into the challenges of becoming more data-centric.  This journey will be evidence-based, reviewing vendor products and solutions, and will include practical tips for measuring and evaluating key IT architectural pillars that put data first.  We will look at data protection, security, infrastructure management, cost management, data mobility, data management and more.  In the meantime, here are some older posts that discuss many of the issues raised in this blog. 



Copyright (c) 2007-2021 – Post #17cd – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.