Agile Information Factory
The Agile Information Factory is Sparkle’s framework for positioning the latest information management concepts. It guides you in defining how new evolutions in information management can help your organization drive innovation, add value, reduce cost and increase agility. Not every concept will be required in each and every organization, but it’s important to at least consider whether or not it would add value.
Data Warehouse Automation
Sparkle focuses on automation to maximize value creation and realize efficiency gains. With automation, customers can achieve efficiency gains of up to 400% and focus on value creation rather than data plumbing.
Data Vault 2.0
Data Vault solves many of the issues in classic Data Warehouse projects. One of the basics of the Data Vault modeling methodology is splitting business keys, descriptive information and relationships into specific artefacts (hubs, satellites and links). This makes the Data Vault model very predictable and repeatable, and lends itself to automation. It’s extremely flexible (integrating a new data source has zero impact on the existing data warehouse model), helps organizations comply with legal regulations (full traceability) and is very scalable (with the introduction of hash keys in Data Vault 2.0, each table can be loaded in parallel).
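As a rough illustration of the splitting described above — with hypothetical table and column names — a single source record can be broken into hub, satellite and link structures, each keyed by a hash of its business key(s). A minimal Python sketch:

```python
import hashlib

def hash_key(*business_keys):
    """Data Vault 2.0 style hash key: MD5 over the normalized,
    concatenated business key(s). Deterministic, so every table
    can compute it independently and be loaded in parallel."""
    normalized = "|".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()

# An illustrative source record:
record = {"customer_nr": "C-1001", "name": "Acme", "country": "BE",
          "order_nr": "O-7"}

# Hub: the business key only.
hub_customer = {"hk_customer": hash_key(record["customer_nr"]),
                "customer_nr": record["customer_nr"]}

# Satellite: the descriptive attributes, keyed by the hub's hash key.
sat_customer = {"hk_customer": hub_customer["hk_customer"],
                "name": record["name"], "country": record["country"]}

# Link: the relationship between two business keys.
link_customer_order = {
    "hk_customer_order": hash_key(record["customer_nr"], record["order_nr"]),
    "hk_customer": hash_key(record["customer_nr"]),
    "hk_order": hash_key(record["order_nr"]),
}
```

Because each hash key is derived from the business keys alone, the hub, satellite and link loads need no lookups against each other — which is exactly what enables parallel loading.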
Thanks to the ever-increasing performance of hardware, persisting every layer of a data warehouse architecture is no longer required. By virtualizing certain parts of the architecture, the overall complexity and cost of the solution are decreased. This also increases the agility of the solution, since less development is required. The Presentation Layer and the Business Data Vault are typically good candidates for virtualization. Furthermore, PolyBase allows relational data to be combined with non-relational data.
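To illustrate the idea of virtualization — with SQLite standing in for the actual platform, and all table names invented — the Presentation Layer below is just a view over the persisted Data Vault tables, so it needs no load process or storage of its own:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Persisted Raw Data Vault tables (simplified):
    CREATE TABLE hub_customer (hk_customer TEXT PRIMARY KEY, customer_nr TEXT);
    CREATE TABLE sat_customer (hk_customer TEXT, name TEXT, country TEXT);

    INSERT INTO hub_customer VALUES ('HK1', 'C-1001');
    INSERT INTO sat_customer VALUES ('HK1', 'Acme', 'BE');

    -- Virtualized Presentation Layer: a view instead of a persisted
    -- dimension table, so there is nothing extra to load or maintain.
    CREATE VIEW dim_customer AS
    SELECT h.customer_nr, s.name, s.country
    FROM hub_customer h
    JOIN sat_customer s ON s.hk_customer = h.hk_customer;
""")

print(con.execute("SELECT * FROM dim_customer").fetchall())
# → [('C-1001', 'Acme', 'BE')]
```

If the virtualized layer later becomes a performance bottleneck, the view can be materialized without changing its consumers — which is what keeps this approach agile.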
The goal of a sandbox is to reduce shadow IT (business users who build error-prone but valuable MS Excel/MS Access databases) and to support one-off data discovery/data mining exercises. Data can be combined from the data warehouse, a data lake and operational systems. Once a specific analysis is required on a regular basis, it’s typically industrialized in the data warehouse.
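A sandbox exercise often boils down to a quick, throwaway combination of those three sources. A toy Python sketch (all data and field names invented):

```python
# One-off sandbox analysis combining three sources:
warehouse = {"C-1001": {"segment": "Enterprise"}}      # curated DWH data
lake = [{"customer": "C-1001", "clicks": 42}]          # raw data lake extract
operational = {"C-1001": {"open_orders": 3}}           # operational system

analysis = []
for event in lake:
    cust = event["customer"]
    analysis.append({
        "customer": cust,
        "segment": warehouse.get(cust, {}).get("segment"),
        "clicks": event["clicks"],
        "open_orders": operational.get(cust, {}).get("open_orders"),
    })
```

The point is the workflow, not the code: the analyst can join across sources freely, and only once the analysis proves recurring is it rebuilt properly in the data warehouse.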
The cloud offers many advantages, and Microsoft currently has the most complete and integrated data platform offering in the cloud (Cortana Intelligence Suite). The cloud enables cost reduction (easier maintenance, elastic pricing) and offers flexibility that isn’t possible on-premises (for example, a temporary scale-up to execute high-volume performance tests). Finally, it introduces scalability in both directions (easy to scale up or down based on varying demands, triggered for example by commercial success or peak loads).
Big Data technologies can complement a traditional data warehouse architecture and offer, for example, cost benefits for storing data for which it’s not yet known whether it will ever be useful for later analysis (such as sensor data). Special care needs to be taken when using Big Data technologies, and it’s important to use them for what they’re good at: mass ingestion. Updates and joins, on the other hand, need to be avoided (hash keys are already part of the solution here).
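An insert-only ingestion pattern sketches this idea: records are appended with a precomputed hash key and load timestamp, so downstream layers can link them without updates or join-based key lookups. A minimal Python sketch with hypothetical field names:

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone

def hash_key(business_key: str) -> str:
    # Deterministic hash key, computed once at ingestion time.
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()

def ingest(path: str, records: list) -> None:
    """Mass ingestion, append-only: no updates, no joins.
    Each record gets its hash key and load timestamp on the way in."""
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            rec["hk_sensor"] = hash_key(rec["sensor_id"])
            rec["load_ts"] = datetime.now(timezone.utc).isoformat()
            f.write(json.dumps(rec) + "\n")

# Append two sensor readings; a correction is simply a newer row,
# never an update of an existing one.
path = os.path.join(tempfile.mkdtemp(), "sensor_readings.jsonl")
ingest(path, [{"sensor_id": "S-01", "value": 21.5},
              {"sensor_id": "S-01", "value": 21.7}])

with open(path, encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
```

Because both rows carry the same precomputed hash key, linking them to the rest of the model later requires no lookup — the expensive operations (updates, joins) stay out of the ingestion path.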