Based on the Agile Information Factory, we have developed a Reference Architecture that serves as a foundation to effectively and efficiently turn data into gold. It is a layered architecture geared towards flexibility, scalability, cost efficiency and performance. By giving each layer a clear purpose and by defining clear rules on which transformations happen where, we get a predictable architecture that is ready to take your organization to the next level of data maturity.
The reporting and analytics layer consists of the tools used for analyzing and reporting on information (think MS Power BI, SSRS). Typical usage includes strategic scorecards, operational dashboards, pixel-perfect reporting, ad-hoc queries, multidimensional analyses and the different flavors of analytics.
In the presentation layer, the data is (virtually) transformed into a data model that is based on the information needs of the organization and is geared towards reporting; these models are typically star schemas. The presentation layer can always be rebuilt from scratch from the data in the foundation layer.
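A star schema is simply a central fact table carrying measures, with foreign keys into surrounding dimension tables. The sketch below illustrates the idea with plain Python structures; the table and attribute names (sales, customer, product) are illustrative assumptions, not part of the reference architecture.

```python
from dataclasses import dataclass

# Hypothetical dimensional model: one fact table (sales)
# surrounded by dimension tables (customer, product).
@dataclass(frozen=True)
class DimCustomer:
    customer_key: int
    name: str
    segment: str

@dataclass(frozen=True)
class DimProduct:
    product_key: int
    name: str
    category: str

@dataclass(frozen=True)
class FactSale:
    customer_key: int   # foreign key into DimCustomer
    product_key: int    # foreign key into DimProduct
    quantity: int
    amount: float

customers = {1: DimCustomer(1, "Acme", "Enterprise")}
products = {10: DimProduct(10, "Widget", "Hardware")}
facts = [FactSale(1, 10, 3, 29.97)]

# A report "joins" each fact row to its dimensions.
report = [
    (customers[f.customer_key].name, products[f.product_key].category, f.amount)
    for f in facts
]
print(report)  # [('Acme', 'Hardware', 29.97)]
```

Because the facts and dimensions here are derived data, the whole structure can be dropped and regenerated from the foundation layer at any time.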
The foundation layer is where we integrate the different data sources. The foundation layer is modeled according to the Data Vault 2.0 methodology, and is also the layer that can be (mostly) generated automatically by the foundation accelerator. In the foundation layer we store the full history of the data at an atomic level.
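In Data Vault 2.0 terms, business keys land in insert-only hubs identified by hash keys, while descriptive attributes are appended to satellites so that history is never overwritten. A minimal sketch, assuming a single customer hub and satellite (the structures and loader function are hypothetical, shown only to make the insert-only/append-only pattern concrete):

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    """Data Vault style hash key computed over the business key(s)."""
    return hashlib.md5("|".join(business_keys).upper().encode()).hexdigest()

# Hypothetical minimal hub / satellite structures.
hub_customer = {}   # hash key -> business key (insert-only)
sat_customer = []   # (hash key, load timestamp, attributes) -> full history

def load_customer(customer_id: str, attributes: dict) -> None:
    hk = hash_key(customer_id)
    hub_customer.setdefault(hk, customer_id)  # hub row is inserted once, never updated
    sat_customer.append((hk, datetime.now(timezone.utc), attributes))  # history is appended

load_customer("C-001", {"name": "Acme", "city": "Ghent"})
load_customer("C-001", {"name": "Acme", "city": "Brussels"})  # change appended, not overwritten

print(len(hub_customer), len(sat_customer))  # 1 2
```

The append-only satellite is what gives the foundation layer its full atomic history, and the deterministic hash keys are what make the loading pattern repeatable enough to generate automatically.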
The operational layer contains detailed information on operational transactions and is typically loaded in a (near) real-time fashion. The goal is to support operational information needs. The data retention in the operational layer is typically limited in time. Historical changes are not maintained.
The data lake can contain structured, semi-structured and unstructured data. There is typically no schema on write, only a schema on read. The lake is often used in support of data science.
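Schema on read means raw records are stored exactly as they arrive and structure is only imposed at query time. A small sketch of the idea, with assumed sensor-event records (the field names and the filtering rule are illustrative, not prescribed by the architecture):

```python
import json

# Raw events land in the lake as-is; nothing validates them on write.
raw_lines = [
    '{"sensor": "t-1", "temp": 21.5}',
    '{"sensor": "t-2", "temp": "N/A", "note": "offline"}',  # messy record is still stored
]

def read_with_schema(lines):
    """Apply a schema at read time: keep only records that fit it."""
    for line in lines:
        record = json.loads(line)
        if isinstance(record.get("temp"), (int, float)):
            yield record["sensor"], float(record["temp"])

readings = list(read_with_schema(raw_lines))
print(readings)  # [('t-1', 21.5)]
```

Different consumers can apply different read-time schemas to the same raw data, which is what makes the lake useful for exploratory data science.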
The source layer consists of all the operational data sources in an organization. While this layer used to consist mainly of OLTP systems, there's quite some innovation in this area (cloud databases, social media, Internet of Things, service buses, …).
The data lab supports complex ad-hoc queries against a heterogeneous set of data sources. It functions as a sandbox to support data discovery, ad-hoc reporting and analytics. The data is only stored for a limited time. Once a specific analysis is required on a regular basis, it’s typically industrialized in the data warehouse.
Metadata mainly consists of data lineage and impact analysis. Data lineage traces where the data in reports and analytics comes from and which operations were performed on it. Impact analysis assesses the impact on the data warehouse environment when source systems change. Finally, a well-governed business glossary helps align the organization on business term definitions.
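Both lineage and impact analysis boil down to traversing a dependency graph between data objects: lineage walks it upstream, impact analysis walks it downstream. A minimal downstream traversal, assuming hypothetical object names for the source, foundation, presentation and reporting layers:

```python
from collections import deque

# Hypothetical lineage graph: each object maps to its downstream dependents.
lineage = {
    "src.orders": ["fnd.hub_order"],
    "fnd.hub_order": ["prs.fact_sales"],
    "prs.fact_sales": ["rpt.sales_dashboard"],
    "src.customers": ["fnd.hub_customer"],
}

def impacted(changed: str) -> set:
    """Impact analysis: collect everything downstream of a changed object."""
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in lineage.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(impacted("src.orders")))
# ['fnd.hub_order', 'prs.fact_sales', 'rpt.sales_dashboard']
```

Running the same traversal over the reversed graph answers the lineage question: for a given dashboard, which source objects feed it.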
The integration layer consists of the different components that embed the data architecture within the existing IT landscape. This typically includes a portal (such as (Azure) SharePoint) to share information and reports within the organization, an authentication and authorization provider (such as (Azure) Active Directory), or a Geographical Information System (GIS) that visualizes information on a geographical map.