EDW Reporting: Vastly Increased Demand While Resources Remain Constrained
White paper required with:
Problem statement, Solution and Benefits
– How implementing Persistent Staging in the EDW addresses the problem statement (see title above)
The decision that “All Reporting” would go through the EDW vastly increased demand, while resources remain constrained.
· Data Discovery, required to transform data from the roughly 3rd Normal Form of the source transactional systems, through Staging with Business Key Augmentation and the Data Vault, to the multi-dimensional model of a Data Mart, can be very time consuming
· Hand modeling of Stage and Data Vault can be time consuming. Using mappings to create entities can improve the process, but data type conversion and constraint removal are manual and prone to errors of omission.
· ETL development for the augmented Stage is lengthy, and the Data Vault involves a large number of tables
· In our experience, only 30% of the source tables participate in the Data Marts, meaning much of the above effort never reaches the user community
· Targeted development of only the necessary tables has been resisted because it would result in a “ragged edge” of different levels of history across tables, even within a single source
· Historical Mart requirements have been heavily oriented toward operational-report-specific needs
· Dimensional Modeling requires knowledge of how the users view and interact with the data, information that has been very difficult to come by
Solution Concept – Persistent Stage with Virtual Raw Data Mart
· Persistent Stage involves stacking daily Net Change deltas until a need, including integration with other systems, justifies bringing the data through the Data Vault to a Data Mart
o Persistent Stage mirrors source format
· Does not require augmentation
· Does not require deep-dive data discovery beyond classifying each table as in or out of scope as business data
· Simplified design lends itself to automation of design (data model and mapping) and development
o Staging all tables means no “ragged edge” in history collection
o Because all changes are stacked, a view of the data at any given point in time is available, including any business key augmentation determined by later data discovery
o Augmentation in place at a future time, including the impact of secondary table deltas, is possible.
o Does not eliminate the work necessary to bring data through to the Data Marts; it postpones that work until needs are identified and limits it to the data necessary to meet them.
· Raw Data Marts involve making the data available to select user communities in a format similar to legacy reporting ODS environments
o No additional discovery required
o Structures mirror source and stage
o No issues on business rules to apply
o Data is not integrated with other sources
o Any logic needed to turn data into information has to be duplicated with every access.
· Virtual Raw Data Marts
o Presents a Current Record source view of the stacked deltas
o Eliminates ETL between stage and Raw Data Mart
o Minimizes Raw Data Mart testing
o Minimizes Raw Data Mart risk – views can always be recreated without loss of data
o View performance has been tested at data volumes up to several times that of the largest tables in the targeted system (Dairy Margin Coverage), with no noticeable difference compared to persisted tables
o If specific user access patterns demand better performance, it can be addressed through an escalation of effort: indexes, then materialized views, then persisted tables with either full reload or ETL processes.
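For illustration only, the stacked-delta and virtual-view ideas above can be sketched with Python's built-in sqlite3 module. The table stg_customer, the view rdm_customer, the load_date and change_type columns, and the sample rows are all hypothetical; the actual stage layout and database platform are not specified in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical persistent stage table: source columns plus load metadata.
CREATE TABLE stg_customer (
    customer_id INTEGER NOT NULL,
    name        TEXT,
    status      TEXT,
    load_date   TEXT NOT NULL,  -- date the daily delta was stacked
    change_type TEXT NOT NULL   -- 'I' insert, 'U' update, 'D' delete
);

-- Virtual Raw Data Mart: latest row per source key, deletes hidden.
CREATE VIEW rdm_customer AS
SELECT s.customer_id, s.name, s.status
FROM stg_customer AS s
WHERE s.load_date = (SELECT MAX(d.load_date)
                     FROM stg_customer AS d
                     WHERE d.customer_id = s.customer_id)
  AND s.change_type <> 'D';
""")

# Three daily deltas stacked on top of each other (assumed sample data).
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Acme",  "active",    "2024-01-01", "I"),
        (2, "Birch", "active",    "2024-01-01", "I"),
        (1, "Acme",  "suspended", "2024-01-02", "U"),
        (2, "Birch", "active",    "2024-01-03", "D"),
        (3, "Cedar", "active",    "2024-01-03", "I"),
    ],
)


def current_records(conn):
    """Read the current-record view; no ETL sits between stage and view."""
    return conn.execute(
        "SELECT * FROM rdm_customer ORDER BY customer_id"
    ).fetchall()


def as_of(conn, date):
    """Rebuild the source view as it stood at the end of the given day."""
    return conn.execute(
        """
        SELECT s.customer_id, s.name, s.status
        FROM stg_customer AS s
        WHERE s.load_date = (SELECT MAX(d.load_date)
                             FROM stg_customer AS d
                             WHERE d.customer_id = s.customer_id
                               AND d.load_date <= ?)
          AND s.change_type <> 'D'
        ORDER BY s.customer_id
        """,
        (date,),
    ).fetchall()


print(current_records(conn))      # [(1, 'Acme', 'suspended'), (3, 'Cedar', 'active')]
print(as_of(conn, "2024-01-01"))  # [(1, 'Acme', 'active'), (2, 'Birch', 'active')]
```

Because the view holds no data of its own, dropping and recreating it leaves the stacked deltas untouched, which is the basis of the low-risk claim above. The sketch assumes one delta per key per day; a real stage would need a finer-grained ordering column if a key can change more than once per load.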
· Integrating Data Discovery
· Automating Persistent Stage
o PowerDesigner Meta Model
o Data Type Conversion
o Name Conversions
o Meta Data
o Logical References
o ETL SQL Script Generation
· Virtual Raw Data Mart
o Meta Data
o Logical References
· DDL Generation
· Mapping Exports for ETL
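As a rough sketch of what the automation items above could look like, the following Python generates stage DDL from a list of source columns, applying a data type conversion, a name conversion, and constraint removal. It does not use PowerDesigner; TYPE_MAP, the STG_ prefix, the Column class, and the LOAD_DATE and CHANGE_TYPE metadata columns are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical source-to-stage type map; a real map depends on the platforms.
TYPE_MAP = {"NUMBER": "NUMERIC", "VARCHAR2": "VARCHAR", "DATE": "TIMESTAMP"}


@dataclass
class Column:
    name: str
    data_type: str    # base source type, e.g. "VARCHAR2"
    length: str = ""  # optional length or precision, e.g. "(50)"


def stage_name(source_name, prefix="STG_"):
    """Name conversion: upper-case, underscores for spaces, stage prefix."""
    return prefix + source_name.strip().upper().replace(" ", "_")


def stage_ddl(table, columns):
    """Build CREATE TABLE text for a persistent stage table.

    Source constraints (keys, NOT NULL) are deliberately not carried over,
    and two assumed load-metadata columns are appended.
    """
    lines = []
    for col in columns:
        source_type = col.data_type.upper()
        target_type = TYPE_MAP.get(source_type, source_type)
        lines.append(f"    {col.name.upper()} {target_type}{col.length}")
    lines.append("    LOAD_DATE TIMESTAMP")
    lines.append("    CHANGE_TYPE CHAR(1)")
    body = ",\n".join(lines)
    return f"CREATE TABLE {stage_name(table)} (\n{body}\n);"


print(stage_ddl("customer", [
    Column("customer_id", "NUMBER", "(10)"),
    Column("name", "VARCHAR2", "(50)"),
]))
```

Driving the conversion from a single map is one way to avoid the manual, error-prone type and constraint edits described in the problem statement; the same column metadata could feed mapping exports and ETL script generation.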
Please explain the problem statement, challenges, solutions and benefits in white paper format.
Note: Please do not repeat the points I brainstormed; they are for your understanding only.