Generational Changes to Enterprise Data Warehouse Architecture

I had been on the job only three weeks before I began work on my secret plans.  It started out as just a simple bus matrix but very quickly became a drill-down bus matrix of epic proportions.  I knew from the outset that if my schemas were ever to hit a database I would need to be clever, patient, and maybe even a little cunning.  I would have to keep my cards close to my chest, revealing my intentions only in bits and easily digestible bytes, sometimes as if they were not even of my own creation.

Prior to this gig I had been working as a hired gun, traveling the nation helping those in need of business intelligence consulting.  My particular specialty was the model.  It was always about the model in my views.  (Actually, I prefer a physical model where the views merely reflect those structures, but I digress.)  I don’t know how many times I have seen a solution rely on massive contortions of MDX to accomplish what would have been more scalable and elegant had someone simply cooked up another fact table to support that specific type of analysis.  Working as a consultant was a lot of fun and extremely challenging, and I am grateful for the opportunity I had to really make a difference on so many teams and projects.  However, after some time, the length of the journeys, the hardships of the trail, and my longing to be with my family led me to become an FTE at a billion-dollar conglomerate of almost twenty individual companies.

Being a full-time employee was a completely different role in my case.  No longer was I being asked to evaluate existing architectures and recommend best-practice approaches as my primary function.  Initially, my priorities were focused on immediate business needs and technical challenges, and in the first few months there were several challenging tasks outside of my comfort zone.  I couldn’t even estimate the distance over which I managed to herd an entire instance’s worth of databases from one piece of hardware to another.  I built the structures for a three-tier SharePoint farm (including PowerPivot) with my own bare hands, eight fingers, and two thumbs.  I implemented the processes and structures required to track prices, and at least five ways to measure cost historically, only after an elusive hunt for the mysterious “actual actual cost.”

Through it all I quietly kept working on my master plan.  I planted the seeds for its growth by mentioning what I refer to as “Kimballisms” (most folks refer to them as “Design Tips”) every chance I got.  I recited them like a preacher’s boy recites passages from the Bible, especially the dimensional modeler’s mantra: “Ease of use, query performance!”  At the water cooler, I’d say things like, “If only we could drill across via some sort of conformed dimension.”  During the U.S. presidential election I expressed my hope that Ralph Kimball would run.  I pondered aloud how the world would be different if Kimball had designed all of Apple’s products.  “Load data at its most atomic level!” I’d mutter to myself as I wandered through the hallways.  It even got to the point that I’d attribute almost any words of wisdom to the man: “Kimball says not to go in the water for at least 30 minutes after eating.”

One day, an opening appeared.  My manager, whom I’ll refer to as Mr. Scratch, asked me to saddle up and produce a prototype PerformancePoint dashboard that would combine information about sales and collections.  I rassled up a prototype lickety-split, but there was a big problem.  The billing information was sitting pretty on a SQL Server Analysis Services database, but the sales roster was in a completely different database.  Furthermore, variations on similar dimensions complicated the matter such that the prototype had to have two different drop-downs for basically the same information.  In one, the clerk number was in front of the storekeeper’s name, and in the other it was after the name!  This was the perfect opportunity to talk about conformed dimensions with Mr. Scratch!  Much to my surprise, he took to the idea like a cat to fish, so much so that I think he thought it was his own.  I don’t want to downplay the significance of this event.  Many existing reports and user-generated solutions would become obsolete and have to be rewritten.  This was no trivial endeavor to suggest.  My plan was taking shape!  The seeds had sprouted and the roots were taking hold.
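
To make the fix concrete, here is roughly the pattern a conformed storekeeper dimension follows.  This is a minimal T-SQL sketch, not our production code; the table, column, and staging names (and the chosen name format) are all my illustration:

```sql
-- One conformed storekeeper dimension with a single agreed-upon name
-- format, fed from both source systems.  All names are hypothetical.
CREATE TABLE COMMON.DimStorekeeper
(
    StorekeeperKey  INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    ClerkNumber     VARCHAR(10)  NOT NULL,
    StorekeeperName VARCHAR(100) NOT NULL            -- always "Name (ClerkNumber)"
);

-- Both feeds conform to the same display format before loading, so
-- every dashboard drop-down shows identical members.
INSERT INTO COMMON.DimStorekeeper (ClerkNumber, StorekeeperName)
SELECT ClerkNumber,
       StorekeeperName + ' (' + ClerkNumber + ')'
FROM (
    SELECT ClerkNumber, StorekeeperName FROM Staging.BillingClerk
    UNION   -- UNION (not UNION ALL) removes the overlap between feeds
    SELECT ClerkNumber, StorekeeperName FROM Staging.SalesRoster
) AS src;
```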

I couldn’t reveal my entire strategy all at once.  I had to bide my time and proceed carefully.  The existing data warehouse was created through the massive efforts of an extremely effective, albeit small, team.  They had gone through the trials and tribulations of understanding disparate legacy source systems, reverse engineering their business logic through painful iterations involving buckets of blood, sweat, and tears.  They had engaged the business and delivered analytic capabilities that had been unimaginable prior to the data warehouse.  The corporate buy-in was tremendous, and the team had become the champions of the organization long before I arrived.  They had bravely fought the good fight and won the hearts and minds of the entire corporation.  As can be expected, any suggestion about improvements had to be made with cautious candor.

It’s not hard to be tall when you get to stand on the shoulders of giants.  Their existing solution was better than at least three quarters of all the solutions I had ever seen.  They had avoided many of the common pitfalls and gotchas of data warehouse and OLAP cube development.  They were properly managing a type two slowly changing dimension.  Their robust solution was working, reliable, trusted, and respected by the organization.  Important decisions were made every day based on the data provided by the data warehouse.
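
For anyone who hasn’t wrangled one, a type two slowly changing dimension preserves history by expiring the current row and inserting a new one whenever a tracked attribute changes, so facts join to the version that was in effect at the time.  Here is a minimal sketch of the pattern with invented names and values, not their actual dimension:

```sql
-- Type 2 SCD: history is kept as row versions.  Names are illustrative.
CREATE TABLE COMMON.DimCustomer
(
    CustomerKey    INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    CustomerId     VARCHAR(20)  NOT NULL,          -- natural key
    CustomerName   VARCHAR(100) NOT NULL,
    SalesTerritory VARCHAR(50)  NOT NULL,          -- tracked attribute
    RowStartDate   DATE NOT NULL,
    RowEndDate     DATE NOT NULL DEFAULT '9999-12-31',
    IsCurrent      BIT  NOT NULL DEFAULT 1
);

DECLARE @CustomerId VARCHAR(20) = 'C-1001',
        @ChangeDate DATE        = '2013-04-01';

-- When a customer's territory changes, expire the current row...
UPDATE COMMON.DimCustomer
SET    RowEndDate = @ChangeDate, IsCurrent = 0
WHERE  CustomerId = @CustomerId AND IsCurrent = 1;

-- ...and open a new version carrying the changed attribute.
INSERT INTO COMMON.DimCustomer
       (CustomerId, CustomerName, SalesTerritory, RowStartDate)
VALUES (@CustomerId, 'Acme Mercantile', 'West', @ChangeDate);
```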

Eventually, it came time to share my proposed bus matrix and drill-down bus matrices with the data warehouse manager.  Mr. Scratch played the devil’s advocate like a well-worn fiddle.  We sat down to let him throw rocks at my model and had a rather heated debate on the subject of snowflake vs. star schema designs as he played his fiddle.  My first attempt was to cite Kimball Design Tip #105, “Snowflakes, Outriggers, and Bridges,” in which modelers are encouraged “to handle many to one hierarchical relationships in a single dimension rather than snowflaking.”  Next, I referenced Chris Adamson’s Star Schema Central blog article “Star vs. Snowflake.”  Mr. Scratch was well versed in the debate strategies for refuting expert testimony and would have none of it.  That man sure can play a fiddle.  By the end of the discussion, I was convinced I would have to work something up that would measurably prove once and for all that the star schema performed better than a snowflake.  I tried, but let’s just say that the results were inconclusive.  I personally still stand by Kimball and Adamson, and I will accept their expert testimony.  For more on this subject, please listen to or read the transcript of the recent SQL Down Under podcast with guest Erin Welker, “Dimensional Modeling.”  I don’t want to beat a dead horse here, so on with the story.
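
If you want to try the experiment yourself, the shape of the test is simple enough.  Below is the kind of query pair I timed, with made-up table names; the snowflake version walks a normalized product hierarchy while the star version reads a single flattened dimension.  As I said, my own timings were inconclusive:

```sql
-- Snowflake: the product hierarchy is normalized into three tables,
-- so a rollup by category costs two extra joins.
SELECT c.CategoryName, SUM(f.SalesAmount) AS Sales
FROM   SALES.FactSales      AS f
JOIN   SALES.DimProduct     AS p ON p.ProductKey     = f.ProductKey
JOIN   SALES.DimSubcategory AS s ON s.SubcategoryKey = p.SubcategoryKey
JOIN   SALES.DimCategory    AS c ON c.CategoryKey    = s.CategoryKey
GROUP BY c.CategoryName;

-- Star: the hierarchy is flattened into DimProduct as attributes,
-- so the same rollup is a single join.
SELECT p.CategoryName, SUM(f.SalesAmount) AS Sales
FROM   SALES.FactSales  AS f
JOIN   SALES.DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY p.CategoryName;
```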

The next day, my manager went down to Georgia, and the discussion was eventually forgotten.  He never held me to a snowflake design pattern, and we ended up with a strict star schema.  So here we are today, months after that initial design session and one week after an intense closed beta period.  Tomorrow marks the beginning of the open beta, and in three to four weeks the new data warehouse and cube will replace the old.

Looking over the summary of changes as I create a PowerPivot presentation for my manager, I can reflect on all of the important design decisions that were made over the past weeks.  Most in business intelligence are familiar with Kimball’s concept of iterative changes and a dimensional lifecycle, where incremental improvements are implemented in the data warehouse on a periodic, repeating pattern.  These can include additional fact tables in existing data marts, changes to dimensions, and additional calculated measures.  Although I can’t remember exactly where, I do remember reading that a data warehouse isn’t considered mature until it has undergone its third generation.  Generational change occurs when the solution is effectively torn down and rebuilt from scratch.  This process uncovers unknown bugs and issues that may have gone undetected for years.  It forces one to rethink everything that has gone before, at least everything from the ODS to the data warehouse in this situation.

On the relational data warehouse side we went from 54 tables to 30.  Tables are differentiated by schemas that represent the business processes they model, and conformed dimensions are placed in a COMMON schema.  The reduction was achieved primarily by consolidating related dimensions into a main dimension as attributes.
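
The layout is nothing more exotic than SQL Server schemas used as namespaces.  A minimal sketch of the idea; the COMMON schema name is real, but every other name here is my illustration:

```sql
-- One schema per business process, plus COMMON for conformed dimensions.
-- (CREATE SCHEMA must be the only statement in its batch, hence the GOs.)
CREATE SCHEMA COMMON;
GO
CREATE SCHEMA SALES;
GO

-- A conformed date dimension lives exactly once, in COMMON...
CREATE TABLE COMMON.DimDate
(
    DateKey      INT      NOT NULL PRIMARY KEY,  -- e.g. 20130401
    CalendarDate DATE     NOT NULL,
    FiscalYear   SMALLINT NOT NULL
);

-- ...and fact tables in every business-process schema reference it,
-- which is what makes drilling across possible later in the cube.
CREATE TABLE SALES.FactSales
(
    DateKey        INT   NOT NULL REFERENCES COMMON.DimDate (DateKey),
    StorekeeperKey INT   NOT NULL,
    SalesAmount    MONEY NOT NULL
);
```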

On the Analysis Services side we started with four databases containing one or two cubes each, derived from one fact table per cube.  Now we will have one Analysis Services database, one cube, and six measure groups (three of which provide brand new analytic capabilities).  We reduced the number of dimensions by about half, primarily through consolidation.  We added several new dimensions and attributes, and we have added time intelligence capabilities.

[Image: Old version – Snowflake Schema Design]

[Image: New version – Star Schema Design]

List of Significant Architectural Design Changes

  • Consolidated header/detail fact tables into one by allocating header values to the lower granularity (see the sketch after this list)
  • Conformed common dimensions
  • Consolidated dimensions
  • Added a new type of fact table with data that was not available before: a periodic snapshot to track inventory levels
  • Added a new fact table to track prices and costs of products historically (and all of the components that make up those calculations, I might add)
  • Added a new fact table to track goals against actuals
  • Added attributes to represent discretization buckets
  • Proper-cased the data elements themselves (everything was in all caps from the legacy systems, and it’s much more purdy now)
  • Added several new behavioral attributes to existing dimensions
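
On that first item: consolidating a header/detail pair means spreading each header-level amount (freight, order discounts, and the like) across the detail lines in proportion to some weight, so that everything lands at the line-item grain.  A hedged sketch with invented staging names, using extended amount as the allocation weight:

```sql
-- Allocate the header-level freight charge to each detail line in
-- proportion to that line's share of the order's extended amount.
SELECT d.OrderId,
       d.LineNumber,
       d.ProductKey,
       d.ExtendedAmount,
       h.FreightAmount * d.ExtendedAmount
           / SUM(d.ExtendedAmount) OVER (PARTITION BY d.OrderId)
           AS AllocatedFreight
FROM   Staging.OrderDetail AS d
JOIN   Staging.OrderHeader AS h ON h.OrderId = d.OrderId;
```

A production load would also guard against orders whose lines sum to zero (falling back to an even split), but the proportional weight is the whole trick.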

Next on the Agenda

  • Introduce page level database compression
  • Introduce relational partitioning
  • Implement automatic analysis services partitioning mirroring the relational partitioning
  • Installation of an n-tier SharePoint 2013 farm
  • Creation of PerformancePoint 2013 dashboards
  • Schedulable forecasting using the DMX data mining query language and SSIS
  • Migration to SQL Server 2012 Reporting Services running in SharePoint integrated mode
  • Take advantage of new capabilities in Power View and PowerPivot

I hope you’ve enjoyed this campfire tale.  I thought it might be good to share.  I’m tired of typing.  I’ve got blisters on my fingers!  My digits are weary, and I can no longer see the forest for the trees.  Imagine me as I ride my old mustang off into the sunset playing a harmonica.

I’ll leave you with a mangled version of a poem inspired by Robert Frost.

Stopping by Woods on a Snowflaky Schema

Whose woods these are I think I know

His house is in the village though.

He will not see me stopping the server

to watch his woods fill up with snowflake schemas.

My little horse must think it BI

to stop without a SharePoint farm house near

Between the woods and frozen lake

the darkest evening of the fiscal year.

He gives his default wav a beep

To ask if I meant that last delete.

The only other sound’s the sweep

of easy wind and downy flake.

The woods are lovely, dark and deep.

But I have promises to keep.

And miles to go before I sleep.

Miles to go before I sleep.
