If I had a nickel for every time I’ve heard from a business user that the OLAP solution must support current real time data from the source system, I’d have at least half a dollar. The fact is that in reality there is usually some acceptable degree of latency. Even though Microsoft SQL Server Analysis Services has a ROLAP storage mode option, I’ve typically tried to steer people towards the MOLAP option due to it’s query performance superiority.
This past week I’ve had the opportunity to explore some of the proactive caching options available in SSAS. I thought this would be a great topic for a blog post. I ran into a couple of stumbling blocks and perhaps someone else can benefit from my experience.
Proactive Caching is a feature in SSAS that allows you to specify when to process a measure group partition or dimension as the data in the relational data source changes much like a cache. A proactive caching partition or dimension is commonly referred to as a cache, even though it is still considered to be a partition or dimension. When Proactive Caching is implemented, SSAS will handle keeping the cube up to date on its own, per the parameters you specify. The alternative to Proactive Caching is to develop a SSIS package that processes the dimensions and measure group partitions; you would execute the SSIS package periodically.
Proactive Caching goes hand in hand with partitioning. So I started my research by revisiting that topic. I discovered an excellent blog post that included a very easy to follow tutorial on setting up dynamic partitioning using SSIS.
Dynamic Cube Partitioning in SSAS 2008
If you’d like to follow along, first go through that tutorial. One important note, in step 10 there is a missing step. You need to open the second script task in the editor, right click on the References folder in the project explorer, and select Analysis Management Objects. Save it and then click Build from the menu to compile the script.
In case that site goes dark or you just want to have a look at my completed solution I’ve uploaded a zip file with everything you need. DynamicCubePartitioningInSSAS2008.zip
Still with me? Great. Our first task is to test out SQL Server Notification.
SQL Server notification can only be used if the relational data source is a Microsoft SQL Server 2000 or later database. SQL Server raises trace events when data changes. In order to catch these trace events, SSAS must connect to the SQL Server with administrator rights. With SQL Server notifications, measure group partitions are always processed using Full Process which discards the contents of the partition and rebuilds it; dimensions are processed using Process Update which picks up inserts, updates, and deletes in the relational data source.
Open up SSMS and connect to your analysis services database. Click the plus signs until you get to the partitions created during that tutorial. Select the last one, right click and click Properties.
Select the Proactive Caching page and click the Options button. The General settings tab should look like the screenshot below.
Click the notifications tab, select the SQL Server notifications radio button, check the box next to “Specify tracking tables” and pick FactSales by clicking the ellipses and placing a check in the box next to that table. Your screen should look like the screenshot below.
Click two OK buttons to complete the operation.
Now, let’s get a baseline. Create a simple slice using your cube browser and note the Fact Sales Count for the month of March.
Now, execute the following SQL statement to insert some more data into the table.
|INSERT INTO FactSales SELECT D.DateKey,P.ProductKey,ISNULL(C.CustomerKey+2,P.ProductKey),P.Cost FROM DimProduct P LEFT JOIN DimCustomer C ON P.ProductKey-2 = C.CustomerKey+2 CROSS JOIN DimDate D WHERE CalendarMonth = 3; GO|
Go back to your cube browser and click the refresh icon. You should see that the count of records for March has doubled.
Let’s try the next notification option.
The Client Initiated notification is performed by sending a NotifyTableChange XMLA command to the SSAS server. For example an SSIS package that updates the data warehouse could use the Analysis Services Execute DDL Task to send the NotifyTableChange XMLA command to the SSAS server every time the data warehouse update processing is completed.
This is the option that gave me the most grief during my testing. Perhaps some of the XMLA code that I see proliferated around the web works for SSAS 2005; but, not SSAS 2008. I could not get the XMLA code referenced in blogs like this one http://www.mssqltips.com/sqlservertip/1563/how-to-implement-proactive-caching-in-sql-server-analysis-services-ssas/ to work.
I came across another example which ended up working; but, at first I was confused by the data source element and mistakenly thought it meant the one in my SSIS package when in fact it needs the data source name from your cube. I know, silly mistake. I’m blaming it on the cold medicine. Great for colds; bad for BI.
|<NotifyTableChange xmlns=”http://schemas.microsoft.com/analysisservices/2003/engine”> <TableNotifications> <TableNotification> <DbSchemaName>dbo</DbSchemaName> <DbTableName>FactSales</DbTableName> </TableNotification> </TableNotifications> </NotifyTableChange>|
To test this, right click on the partition again and get back to the notifications tab of the storage options dialog. Select the ‘Client initiated’ radio button, check the box next to ‘Specify tracking tables’ and select the FactSales table as before.
Create an SSIS package with a connection manager for your SSAS database and an Analysis Services Execute DDL task. Set the connection in the task to your SSAS connection manager and paste the XMLA code from above into the SourceDirect property. Run the SQL commands from above to insert more data into the fact table. Execute the package (by firing squad preferably.)
Once all is green, jump back to your cube browser and refresh. If all went as expected you should see the count for month three increase by a third.
Finally, let’s try my personal favorite and the last notification option.
The Scheduled polling mechanism provides the most flexibility by allowing you to either rebuild or incrementally update the cache. Scheduled polling simply queries the relational data source periodically to determine if the data has changed. Incremental updates improve proactive caching performance by reducing the amount of data that needs to be processed. For proactive caching partitions, incremental updates use a ProcessAdd to append new data to the cache. For proactive caching dimensions, a ProcessUpdate is performed. If you use Scheduled polling without incremental updates, the cache is always completely rebuilt.
The first thing we need to do is alter our FactSales table by adding a column to store the date the row was appended to the table.
|ALTER TABLE dbo.FactSalesADD LoadedDate DATETIME NOT NULL DEFAULT GETDATE();|
Get back to the Notifications tab in the storage options dialog and select the ‘Scheduled polling’ radio button. Check the box next to ‘Enable incremental updates’ and click the ellipses in the textbox under the ‘Polling Query’ column in the mini-table. Paste the following SQL statement into the ‘Create Polling Query’ dialog:
|SELECT MAX(LoadedDate) LoadedDate FROM dbo.FactSales;|
We could stop there; but, if we did SSAS would always do a full process of the partition whenever it detected a change. We can do better. If we enter a processing query, SSAS will perform a ProcessIncremental on the partition thereby reducing the amount of data that needs to be processed each time a change is detected. Click the ellipses in the text box under the ‘Processing Query’ column in the mini-table. Paste the following SQL statement into the ‘Create Processing Query’ dialog:
|SELECT DateKey, ProductKey, CustomerKey, SalesAmountFROM dbo.FactSales
WHERE LoadedDate > ? AND LoadedDate <= ?
Click OK twice to commit the changes, process your partition manually, and then run the SQL statement below to insert some more data into our table. (Note the additional GetDate() column I’ve added to the end of this)
|INSERT INTO FactSales SELECT D.DateKey,P.ProductKey,ISNULL(C.CustomerKey+2,P.ProductKey),P.Cost,GETDATE() FROM DimProduct P LEFT JOIN DimCustomer C ON P.ProductKey-2 = C.CustomerKey+2 CROSS JOIN DimDate D WHERE CalendarMonth = 3 GO|
Finally, jump back to your cube browser to refresh your view. Your count should increase for the third month.
Now I’d like to revisit the General tab of the storage options dialog and discuss what each of those options mean.
The Silence interval is the amount of time to wait after an update notification is received before processing the partition. This allows for other updates to occur before starting to process the partition.
The Silence override interval is the maximum time to wait after an update notification is received before processing the partition. Without the silence override interval you could conceivably have changes that keep occurring and the partition would never get processed.
Latency is the maximum amount of time that you want to use the MOLAP cache after changes have occurred in the relational data source and before/while the partition is processed. In other words you may continue to use the existing MOLAP cache for a period of time while it is being rebuilt. Once the latency is reached the MOLAP cache is dropped and queries can be satisfied by going back to the relational data source.
Bring online immediately (when checked) will satisfy queries from the relational data source while the MOLAP cache is being rebuilt.
Enable ROLAP aggregations (when checked) will create materialized views for aggregations in the relational data source.
The following diagram that I copied from the Analysis Services 2005 Performance Guide does a great job of explaining these settings with an example.
Still with me? Congratulations! You deserve a joke.
Q: Why does the OLAP developer hang his wallet above the fireplace?
A: He is warming the cache.
LOL. He could have just used SSIS.