Reference data architecture for data management and analytics

Reference data architecture on Snowflake (AWS) V1
Reference data architecture on Snowflake (AWS) V2

First and foremost, data architecture is abstracted from tools and technologies. This is what I call logical data architecture. It is purely conceptual and not tied to a particular tool or approach. You can translate the logical architecture into any combination of tools and technologies. A good architecture should be applicable both in the cloud and on-premise.

How and where you implement the architecture is important but a secondary concern. I call this secondary level of architecture the physical data architecture. It brings the logical architecture to life. When we map the logical architecture to tools and vendors we need to take the context and requirements of each organisation into account. The types of tools you select are highly dependent on your requirements, your organisation, the skills you have, preferences (e.g. build vs. buy), budget, existing vendor relations, software license model and much more. Some vendors and technologies will be a better fit for what you are trying to achieve than others. Often you will need multiple tools from multiple vendors to achieve your objectives.

Separating the conceptual architecture from the implementation details will prevent you from making silly statements such as “Hadoop will replace the data warehouse”. The data warehouse is a concept whereas Hadoop is a technology. You can implement a data warehouse on Hadoop, but it does not make sense to say that a technology will replace a concept. I see this mistake being made frequently: people hear data warehouse and, like Pavlov's dog, instinctively think relational database. There is no hard connection between the two. Having said that, some tools are unquestionably a better fit for certain use cases than others. For example, while Hadoop is a good fit for processing unstructured data in batch (it was built for this purpose), it is a poor fit for BI-style queries (even though it can be shoehorned to do so).

There are other aspects to implementing a successful data analytics solution: the operating model, the project management approach, governance, processes and standards, skills etc. While important, we do not cover these areas as part of this blog post. My focus is on enterprise data architecture. I should also say that the focus of this reference data architecture is on data analytics and data management use cases. It does not cover in detail various data integration scenarios and patterns such as Enterprise Application Integration.

First things first

At Sonra we have developed a reference enterprise data architecture over the years. It is bullet proof for the vast majority of data management use cases. If you are interested in our data architecture and data advisory services please contact us.

First I will walk you through our logical architecture. You won’t see any reference to tools or vendors. In a second step we then outline some of the Sonra core principles that we use when evaluating a tool or vendor. Finally we use the principles, apply them to the logical architecture and map it to the physical architecture where we talk about tools and vendors.

Let’s first have a look at the architecture diagram.

As you can see there are various vertical and horizontal layers.

The data sources are the operational systems of an organisation. They represent and store data that is generated through the business processes. 10-15 years ago most data sources for data analytics were relational databases and text files (usually exports from relational databases). This has changed significantly over the last few years and we need to cater for a variety of different data source types. We will look at these in more detail in a minute.

Let’s first zoom in on the Landing and Persisted Staging layers as these are closely related. You can refer to them as the Raw layer in data architecture. We store the raw and unmodified data from the data sources. Some people refer to this part of the data architecture as the data lake. The term data lake has a lot of negative connotations. It goes hand in hand with the term data swamp. I don’t think it is a good term to describe data architecture but it is now widely used (unfortunately).

The Landing area is a copy of the data structures in your source systems.
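To make the idea of the Landing area a little more concrete, here is a minimal sketch of what "a copy of the data structures in your source systems" holding "raw and unmodified data" could look like. It is illustrative only and not tied to the reference architecture or to any vendor: the function name `land_table`, the `orders` table and the use of in-memory SQLite databases for both the source and the landing store are assumptions made for this example.

```python
import sqlite3


def land_table(source: sqlite3.Connection, landing: sqlite3.Connection, table: str) -> int:
    """Copy one source table into the landing store: same structure, unmodified rows.

    Hypothetical helper for illustration; returns the number of rows copied.
    """
    # Reuse the source's own CREATE TABLE statement so the landing
    # structure mirrors the source structure exactly.
    ddl = source.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if ddl is None:
        raise ValueError(f"table {table!r} not found in source")

    # Full reload: drop any previous copy, then recreate it from the source DDL.
    landing.execute(f'DROP TABLE IF EXISTS "{table}"')
    landing.execute(ddl[0])

    # Copy the rows as they are: no cleansing, no type changes, no business rules.
    rows = source.execute(f'SELECT * FROM "{table}"').fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        landing.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    landing.commit()
    return len(rows)


if __name__ == "__main__":
    # Assumed operational source system with a single table.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "acme", 120.5), (2, "globex", 80.0)],
    )

    landing = sqlite3.connect(":memory:")
    print(land_table(source, landing, "orders"))
```

The sketch reloads the whole table on every run to keep it short. Incremental loads and keeping history are separate concerns that it does not attempt to cover.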