# Analysis Services and Excel Tips and Tricks

I was speaking with a Business Analyst the other day and I couldn’t remember exactly how to do a cool trick I’d learned years ago working with Excel and Analysis Services.  It got me thinking about some other cool techniques that I suspect many do not know about.  If you’ve got a cool trick to share, please let us know in the comments.

# Trick #1 – Use OLAP PivotTable Extensions to Add Calculated Measures

I want to demonstrate how to add a new calculated measure to a pivot table using MDX.

## The Setup

2. Open Excel and connect to the AdventureWorks SSAS database.  (Data, From Other Sources, Analysis Services, etc.)
3. In the PivotTable Field list use the “Show fields related to:” drop down at the top select “Internet Sales.”
4. Add the Internet Sales Amount to the values
5. Add the Product Category user hierarchy to the rows and expand.
6. Add the Date.Fiscal Weeks User hierarchy to the Report Filter and select Week 4 FY 2008.
7. Right click a cell within the pivot table and select OLAP PivotTable Extensions.
8. Enter the following in the Calculation Name drop-down field:  [SPPY Internet Sales Amt]  (SPPY = Same Period Prior Year)
9. Enter the following in the formula text box: (ParallelPeriod([Date].[Fiscal].[Fiscal Year],1,[Date].[Fiscal].currentmember),[Measures].[Internet Sales Amount])
10. Click the “Add to PivotTable” button.
11. Place a check mark in the PivotTable Field list under the newly created “Values” measure group next to SPPY Internet Sales Amt.
12. Verify that those numbers match Week 4 FY 2007.
13. Remove the Date.Fiscal Weeks user hierarchy from the report filter.
14. Add the Date.Fiscal user hierarchy to the Report Filter and select July 2007.
15. Verify that those numbers match July 2006.

## Hungry for more?

There are lots of examples of useful MDX you can use in your spreadsheets!  A great graphical tool to help you build (and learn) complex MDX queries is Cube Player.

# Trick #2 – Filtering, the Easy Way!

Have you ever needed to analyze a particular set of products or customers based on a long list of customer names or USNs?  Going through the filter drop-down and individually clicking each one can be a tedious, time-consuming, and frustrating process.  If you’ve ever had to do this, I am about to become your best friend.

## The Setup

1. Open Excel and connect to the AdventureWorks SSAS database.  (Data, From Other Sources, Analysis Services, etc.)
2. In the PivotTable Field list use the “Show fields related to:” drop down at the top select “Internet Sales.”
3. Add the Internet Sales Amount to the values
4. Add Customer to the row labels
5. Highlight all the customer names that start with the first name “Michael” and copy them to your clipboard.
6. Right click a cell in the PivotTable and select OLAP PivotTable Extensions.
7. Click the Filter list tab and select [Customer].[Customer] from the filter hierarchy drop-down.
9. Click Filter PivotTable.
10. Add a comment to this blog post about how much you appreciate the time I spent writing this blog post.

## Trick #3 – Using SSAS Time Intelligence in Excel

For this example, I added time intelligence to the AdventureWorks cube by following the wizard.  If you don’t know if the cube at your company has time intelligence, look in the date dimension for an attribute named something like “Fiscal Date Calculations” or just ask somebody.  If your cube doesn’t have time intelligence, maybe you can convince them to add it.

## The Setup

1. Open Excel and connect to the AdventureWorks SSAS database.  (Data, From Other Sources, Analysis Services, etc.)
2. In the PivotTable Field list use the “Show fields related to:” drop down at the top select “Internet Sales.”
3. Add the Internet Order Count measure from the Internet Orders measure group to the values.
4. Add the Date.Fiscal user hierarchy to the report filter.
5. Add Fiscal Date Calculations to the rows.
6. Select November 2003 in  the Date.Fiscal report filter in cell B1.

Wow!

# Trick #4 – Formatting, As You Like It!

For this example, let’s create a nicely formatted re-usable Balance Sheet for Adventure Works.

## The Setup

1. Open Excel and connect to the AdventureWorks SSAS database.  (Data, From Other Sources, Analysis Services, etc.)
2. In the PivotTable Field list use the “Show fields related to:” drop down at the top select “Financial Reporting.”
3. Check the box next to the Amount metric in the Financial Reporting measure group.
4. Check the box next to Accounts in the Account dimension.
5. Click the plus sign next to the Fiscal display folder in the Date dimension and check the box next to the “Date.Fiscal” user hierarchy.
6. Click the drop down arrow in cell A2 and uncheck everything but Balance Sheet.
7. Click the drop down arrow in cell B1 and uncheck everything except FY 2006, FY 2007, and FY 2008.
8. Click the plus sign next to “Balance Sheet” in cell A3.  Click the plus sign on every plus you see until there are no longer any plus signs on the sheet.
9. (Optional sub-tip)  The CFO wants to see the GL account next to the lowest leaf level rollups.  Right click “Cash” and select “Show Properties in Report” and choose “Account Number.”

Your screen should look like this.

## What’s Wrong With This Picture?

This looks okay; but, it could certainly be better.  If you look for balance sheet examples on the internet you might notice that the totals are normally put underneath the category.  They also typically wouldn’t have a total column for the combined years aggregation.

First let’s do something about those totals.

1. Click on a cell in the pivot table.  This action adds the PivotTable Tools section on your Excel ribbon.
2. Click on the Design tab and click the Report Layout button third from the left on the ribbon.  Select “Show in Tabular Form.”
3. Click the Grand Totals button and select “Off for Rows and Columns.”

It is starting to look better; but, now if I want to insert a row underneath the receivables total I get a very big error message!

This is the trick.

1. Click the Options tab of the PivotTable Tools ribbon and click the OLAP Tools button fourth from the right.  Select “Convert to Formulas.”
2. You’ll get an error.

To resolve the error:

1. Click the filter icon in cell A2 and select all.
2. Click the filter icon in cell G1 and select all
3. Save a copy of your Balance Sheet with a file name such as “Dynamic Balance Sheet.XLSX” (I’ll explain why shortly.)
4. Click the OLAP Tools button and select “Convert to OLAP formulas” again.

Continuing our beautification…

1. Insert a row underneath “Receivables Total.”
2. Insert rows underneath every total.
3. Insert a column between FY 2007 and FY 2008.
4. Label the column “YOY Var %” in cell I2
5. Add the formula: =(H6-G6)/G6 and copy it down the balance sheet
6. Format the column I as a percent
7. Hide the rows 49 – 51
8. Make the totals bold
9. Hide column A
10. Hide Row 1, Bold and Center Row 2.
11. Highlight cells B2-F2 and press delete
12. Highlight columns B-F and select Format,  Auto Fit Column Width and make bold.
13. Copy column I to (the new) Column M
14. Modify the formula in M3 so that it reads: =(J3-H3)/H3
15. Hide columns I and M.  I just wanted to show you how easy it was to add a formula.
16. Add top and bottom borders for each of the sub category total amounts, top and thick bottom borders to Current Assets, Property, Plant, Equipment, Liabilities, and Owners equity totals, and finally top and double bottom borders for Assets and Liabilities and Owners Equity Totals.
17. Select columns G-J, right click a cell, select Format cells and decrease the precision so that there are no cents displayed.
18. From the view tab uncheck the Gridlines checkbox.
19. Print.
20. Get some coffee on your way to the printer.

You’re spreadsheet should look like this:

There is one caveat you need to know.  If accounting adds a new GL rollup it will not automatically appear on the balance sheet.  You can manually add it if you’re careful or open the “Balance Sheet Dynamic” version and re-apply the formatting.

——————————————————————————-

That’s all for today, folks!  Please add to the comments if you would like to share a great Analysis Services in Excel tip or trick.

# Dangers in Dashboard Design

Dangerous Dashboard Design

I recently had the great pleasure of working on a dashboard project using PerformancePoint in SharePoint 2013 with a great team of individuals.  We had to overcome several obstacles during the process and I thought it would be a good idea to document how we accomplished some of our goals with the hope that I can spare someone else (or my future self) from having to go through the same pain.  Our dashboard is currently in closed beta and we will be releasing it to the wild in the next couple of weeks.

Dynamic Active Directory Security Without Kerberos

The biggest challenge we faced was how to implement dynamic active directory security without using Kerberos.  I had applied dynamic security to several Analysis Services databases in the past; but, had never followed that implementation through to PerformancePoint dashboards.  This challenge is more difficult in that respect because within a dashboard you will typically have more sources of data than just an analysis services database.  We had SSRS reports that ran against our ODS and our relational data warehouse.  You can read more about how we achieved this in my recent blog post, PerformancePoint 2013 Dashboards: Applying Dynamic Security using CustomData and EffectiveUserName.

Our solution required three roles.  Two roles for the managers who would be restricted to seeing only their employees.  One role for the PerformancePoint objects which uses the CustomData function in the security and another for the SSRS MDX based reports and Excel Services based reports that uses the UserName function.   Finally we needed a separate unrestricted role for the managers and district leaders that could see all of the data.  This allowed us to keep our bridge table relatively small.  Otherwise every person that needed to see every person would require about 2,500 rows in the bridge table.

During develoment, I mistakenly thought we could create the entire dashboard for the restricted users, save a copy and rename it to unrestricted, change the data source to use the unrestricted role and be on our way.  I was woefully mistaken.  Apparently PerformancePoint uses some sort of invisible GUID or something for the data source name.  Worse yet, you can’t change the data source for existing Analytic charts, grids, and scorecards.  So we couldn’t make a copy of the objects, rename them, and point them to a different data source.  We actually had to use two different site collections for the restricted and the unrestricted dashboards!

I Know Who You Are and What You Did Last Quarter

Once we had the dynamic security in place we wanted the dashboard to automatically know who you are.  We accomplished this by adding an attribute to our person dimension that would hold the active directory account name.  I use the word person in it’s most generic sense.  A person can be a vendor, customer, employee, manager, or sales person.  I created a MDX filter in PerformancePoint and used the following MDX code to achieve this:

STRTOSET(“[People].[Active Directory Login].&[” + CUSTOMDATA() + “]”)

I then added this attribute to any analytic charts, grids, or scorecards in the background quadrant of the design canvas so that the report could be filtered by it.  I connected the filter to the report as one typically would and later hid the filter using the edit web part interface within the SharePoint 2013 site.

Save the Trees

My good buddy Michael Simon used some fancy MDX to set the trailing six quarters for an analytic bar chart only to discover that we lost the ability to right click and get the decomposition tree and all the other neat PerformancePoint features.  He discovered a clever work around using dynamic named sets.  We had our right clicks back.

Yesterday… Not Such an Easy Game to Play

Another challenge we faced is a very common one.  How to get the dashboard to automatically present data as of yesterday.  Typically this wouldn’t be a big deal; but, in our case we use a reporting calendar that doesn’t include Saturdays, Sundays or Holidays.  Any sales that occur on those days roll forward to the next business day.  If you don’t have such a dilemma you can use either of these two snippets to accomplish this feat:

1. STRTOSET(“[Date].[Calendar].[Date].&[“+ VBAMDX.Format(VBAMDX.Now()-1, “yyyyMMdd”) +”]”)

2. STRTOSET(“[Date].[Calendar].[Date].&[“+ VBAMDX.Format(VBAMDX.Now(), “yyyyMMdd”) +”].Lag(1)”)

In our case, this wouldn’t work.  If the user wanted to look at the dashboard on a weekend or a holiday they would get an error.

I initially tried to set the default member of the dimension to yesterday using MDX.  I ran into a problem in that once I set the default member for the date dimension in the cube I could no longer put the date dimension in the report filter area of an Excel Pivot table.  It would no longer work and would just show yesterday’s data.  See this link for more information about that issue:  Default Member Trap (be careful of what you filter)

My next strategy was to add an attribute to our Gregorian date dimension called Last Business Date and update the corresponding relational table with this information.  I then attempted to parse together the information to get the correct reporting date.  We have two date dimensions in our cube.  One is a regular calendar date dimension, the other one is a fiscal reporting date dimension that doesn’t include weekends or holidays.  Our fact table has keys for both.  You can see my post asking for help from the BIDN community here:

MDX Help Please – The + function expects a tuple set expression for the 1 argument. A string or numeric expression was used

Before I received the kind reply from mreedsqlbi, I found another way to accomplish what I needed.  This post helped me find another solution:  Member Does Not Exist in MDX Calculations

I created four dynamic named sets in the cube called, OneDayAgo, TwoDaysAgo, ThreeDaysAgo, and FourDaysAgo.

Here is the code for OneDayAgo:

iif(iserror(STRTOMEMBER(“[Date – Reporting].[Date].&[” + VBAMDX.Format(VBAMDX.Now()-1, “yyyyMMdd”) +”]”)),{},STRTOMEMBER(“[Date – Reporting].[Date].&[” + VBAMDX.Format(VBAMDX.Now()-1, “yyyyMMdd”) +”]”))

For each of the other named sets, I just changed the –1 to a –2, –3, and –4 respectively.

The code for my MDX filter used the dynamic named sets to figure out if the member exists:

``` IIF([OneDayAgo].Count > 0             ,STRTOMEMBER("[Date - Reporting].[Date].&[" + VBAMDX.Format(VBAMDX.Now()-1, "yyyyMMdd") +"]")             ,IIF([TwoDaysAgo].Count > 0                   ,STRTOMEMBER("[Date - Reporting].[Date].&[" + VBAMDX.Format(VBAMDX.Now()-2, "yyyyMMdd") +"]")                   ,IIF([ThreeDaysAgo].Count > 0                         ,STRTOMEMBER("[Date - Reporting].[Date].&[" + VBAMDX.Format(VBAMDX.Now()-3, "yyyyMMdd") +"]")                         ,IIF([FourDaysAgo].Count > 0                             ,STRTOMEMBER("[Date - Reporting].[Date].&[" + VBAMDX.Format(VBAMDX.Now()-4, "yyyyMMdd") +"]")                             ,STRTOMEMBER("[Date - Reporting].[Date].&[" + VBAMDX.Format(VBAMDX.Now()-5, "yyyyMMdd") +"]")                         )                     )                 )             ) ```

UPDATE:

This works better because it works in Dev which doesn’t always have data for yesterday.

TopCount(ORDER( NONEMPTY( {[Date – Reporting].[Date].Members

– [Date – Reporting].[Date].[All]}

, [Measures].[Sales Amt] )

, [Date – Reporting].[Date].CurrentMember.MEMBER_KEY

, DESC )

,1)

UPDATE:

At one point during Beta testing, I was asked to present the user with a date dropdown that defaults to yesterday.  This was later scrapped because it required a server wide change within the Central Administration Application Settings for PerformancePoint in which we had to set the user selected filters to expire every day.

For anyone who is interested, I got that to work by removing the TOPCOUNT from the MDX formula filter in Dashboard Designer:

ORDER( NONEMPTY( {[Date – Reporting].[Date].Members – [Date – Reporting].[Date].[All]},
[Measures].[Sales Amt] ), [Date – Reporting].[Date].CurrentMember.MEMBER_KEY, DESC )

This only shows dates that have data, excludes the all member, and sorts it in descending order.  The result is that it defaults to yesterday but allows you to change it.

UPDATE

I think I’m crazy; but, I could have sworn the dashboard was showing Friday’s data on Monday.  Looking at the code, I don’t see how; but, I am amazed no one has pointed it out.

I had to modify the code to eliminate Today’s date from the data set.

TopCount
(
Order
(
NonEmpty
(
{
[Date – Reporting].[Date].MEMBERS
– [Date – Reporting].[Date].[All]
– STRTOMEMBER(“[Date – Reporting].[Date].&[” + VBAMDX.Format(VBAMDX.Now(), “yyyyMMdd”) +”]”)
}
,[Measures].[Sales Amt]
)
,[Date – Reporting].[Date].CurrentMember.Member_Key
,DESC
)
,1
)

Filters

Another challenge we faced was how to have MDX filters control relational reports.  This was pretty easy.  Just use the DisplayName attribute of the filter instead of MemberUnique and let the report figure out how to use that information.  We were unable to use multi-select MDX member selection filters.  When we passed these as DisplayValue to our relational SSRS reports it would only pass the first selection.  We were unable to overcome this limitation the way we would have liked.  We ended up not using multi-select filters and instead had filters that only contained the parent level of the hierarchy.  In other words, our filter contained managers and the report would display the managers employees.  We lost the ability to have the user only select some employees of that manager; but, not others.  To my knowledge, there is not a way to pass MDX multi-value filters using the DisplayValue property to relational SSRS reports.

Staying On Top of Reporting Services

One aggravation we experienced was that the column headers on the SSRS reports would not stay at the top of the web part when the user scrolled through the report part.  There are many blog posts out there on how to achieve this; but, within SharePoint 2013 it would work in Firefox and Chrome; but, not in Internet Exploder (9 or 10.)  I had to workaround this issue by sizing the report so it wouldn’t scroll, and instead display the toolbar and have the user page through the data instead of scroll.  Which in my opinion, looked nicer anyway.

I’ve got to mention one gotcha we experienced during our dashboard design endeavor that I had experienced many years ago.  I thought I had blogged about it in my What’s New BI-wise section; but, I couldn’t find it so I must have dreamed I had blogged about it.  Any way here are some words of wisdom:

When you are creating an SSRS MDX based report and need to make a customization to the underlying MDX that is created using the easy to use drag and drop interface, make a backup or better yet, check it into source control and label it ‘Pre-MDX customizations’ or something very clear like that.

Once you have made those customizations to the MDX (to allow that member property to be displayed for example) you can’t go back!  No longer can you simply delete all your parameters and add different ones easily.  Everything has to be done through hand coded MDX from that point forward.  It is far easier to design your report as much as possible, make a backup or labeled source control check-in, and then make your MDX customizations.  If you need to add/remove fields or parameters in the future it is easier to go back to that pre-MDX customized version and do it using the interface and then re-do the MDX customization than to work with it outside the interface.

Scorecards Don’t Always Score

Scorecards are really cool gadgets to use in PerformancePoint dashboards because you can use them to control other objects.  Filters and scorecards are the only way to control objects in the PerformancePoint dashboard.  When you see a scorecard functioning well inside a PerformancePoint dashboard it looks really cool.  As you click on the various members within the scorecard you can have charts and data change according to the context.

We could not use scorecards the way we wanted because we simply had too many members in our dimension for them to render quickly enough.  They don’t work as you would expect.  Doing the same type of thing in an analytic grid or Excel services report is a piece of cake and renders very fast.  For whatever reason, regardless of the fact that I filtered the scorecard on the manager, it would insist on rendering every employee in the company and then hiding the ones that weren’t children of the selected manager.  In our case that was a little less than 5,000 members.  I reduced this set to about 2,800 by deleting employees that weren’t used as keys in any of our fact tables; but, it was too late.  The powers that be had already decided that scorecards stink due to not only the rendering speed; but, the hoops we had to jump through to develop one using that many members.  Even in the developer’s IDE, Dashboard Designer, it would pull back every single representative regardless of being filtered on a single parent member.  It would display those in a red font indicating that they would be hidden; but, the time it took the designer to render made you think it had crashed.  We actually had to build a cube containing a very small subset of the data in order to design the scorecard the way we wanted.

From a pure developer’s standpoint it seems that the scorecard design process in PerformancePoint needs some maturing.  Some configurations that are set within the wizard for creating an Analysis Services based scorecard are simply not editable outside of the wizard.  One would need to take screenshots of how the scorecard was developed to document the process in the event that a change was ever needed to one of those initial configurations.

While I’m on the subject of scorecards, another gripe we had about the way scorecards work is that any time you have more than one set of KPI actuals and targets within a scorecard it displays them with all of the actuals on the left and trends or targets on the right.  In other words, if I have a sales actual and goal KPI on the same scorecard as a gross profit actual and goal, I can only show them as actual sales, actual gross proft, target sales, target gross profit.  I would like some control to show the targets next to their corresponding actuals.

Easy Migrations

One new feature of SharePoint 2013 that I really like is the new way it handles exporting and importing a dashboard project.  In previous versions of SharePoint, I had to create long lists of customizations that had to be done to the web parts after Dashboard Designer had deployed the dashboard.  Things such as hiding filters, resizing objects, and changing the chrome settings had to be done over and over again every time we made a change to the dashboard that required a re-deployment.  Moving the project from the development environment to the QA environment and finally to the production environment was a manual tedious process.  Furthermore, these customizations were not stored in source control.  I was pleased as punch to discover the export/import feature of SharePoint 2013 solves both problems.  I can extract the resulting .CMP file using a cab extractor and put the entire project into source control and I don’t have to repeat those customizations over and over again.

Accepting the Unacceptable

I’ve been creating dashboards since PerformancePoint 2007.  Make no mistake, I do not consider myself an expert at dashboard design.  I would however consider myself pretty advanced in using the Microsoft BI stack to produce great dashboards quickly.  I really like using PerformancePoint.  If you are willing to accept some of the quirks w/ PerformancePoint, it is a great tool to allow a developer to very quickly produce meaningful dashboards with a lot of functionality right out of the box.  When you start getting very particular about the way things look it gets more complicated.  For example, I can only show the legend at the top or to the right of my analytic charts.  I can’t show them on the left and I can’t show them on the bottom.  I can’t control the color of the bars in the bar charts.  I sometimes joke that designing PerformancePoint dashboards is a great opportunity for someone who likes to say no.  Sure, we can give you a bar chart w/ the colors you want or with the legend on the left; but, I have to do that in Reporting Services and we lose all of the cool right click functionality that is included in the analytic chart such as the decomposition tree.

Fifty Shades Of Gray Is Not Enough

I’ll never forget one project I worked on as a consultant with a very lovely lady who had designed her dashboard on paper before she had any idea of what PerformancePoint could and could not do easily.  A simple thing like showing the total dollar amount at the top of the bar chart involved creating a separate reporting services web part that only showed that total and placing it above the chart. The end result was a really good looking dashboard that looked remarkably similar to what had been envisioned.  It just took a lot of extra time to make it look exactly how it was imagined.

One other piece of advice I can give is to apply a lot of rigor in your testing.  Make sure each page of the dashboard uses consistent nomenclature.  Make sure that the numbers on all pages agree with each other.  If one page says that gross profit is this and another says gross profit was that, you’re going to get a lot of questions from your user community.  Be consistent in your color and number formatting as well.  Are you going to display zeros or leave the cells blank?  I personally prefer to leave the cells blank; but, it is more important that whatever you decide you stay consistent.  Decide beforehand what screen resolution you are designing to.  If everyone at your company has widescreen 17 inch monitors than it might be best to design your dashboard to look best at 1920×1080.  If you need to show your dashboard on an iPad or a Surface than you better have one lying around.  Good luck with that by the way.  Consider which browser is your company’s standard.  HTML5 or not, things look different in Chrome, Firefox, Safari and Internet Exploder versions 8, 9, 10, or 11.

Awwww…

As I wrap this up, I just want to give a big warm thank you to the other members of my team, Sherri McDonald and Michael Simon.  A lot of hard work and long hours went into this project and it feels great to have professionals like them to ease the burden.  It’s a much better experience than working solo.  Three heads are much better than one and it feels great to have others throw rocks at your stuff and see where it breaks before the real critics get their hands on it.

Do you have a good dashboard design story?  Did I leave a danger out?  Please let us know in the comments.

——————————————————————————-

Other Dashboarding Resources

One person that I would consider an expert on dashboard design is Stephen Few.  I recently read on his blog that a second edition of his book, Information Dashboard Design: The Effective Visual Communication of Data, is being released shortly.  He mentioned in that post that one of the criticisms he received on the first edition that is addressed in the second edition is the lack of real examples of effective dashboards by real products.  His explanation is that at that time there simply weren’t any products released that were capable of producing what he considered really good dashboard designs.  Be sure to read his article Common Pitfalls in Dashboard Design for a great list of dashboard design gotchas.

PerformancePoint 2013 Dashboards: Applying Dynamic Security using CustomData and EffectiveUserName

What are Web Parts?

PerformancePoint – Effectively Disable Analytic Grid Right Click Select Measures Functionality

Time Intelligence Filters in PerformancePoint 2010

Cascading Filters in Performance Point Services dashboard using SharePoint 2013

Performance Point Relative Date Time Intelligence with current date time

How to make SSRS reports in a PerformancePoint dashboard ‘pop out’ or open in a new window.

Clutter, data overload put dashboard designs on path to failure

Data Visualization and Dashboard Design

Using SSAS MDX Calculation Color Expressions

Time Intelligence Post Formula Filters

Set Default Value of PerformancePoint Filter – Note:  Caveat to this tip- Once a user changes the default value SharePoint will remember the changed value and not reset the default value.

Add conditional formatting to your PerformancePoint Services Analytic Grid, by defining a custom Scope calculation from within Analysis Services (SSAS)

Filter Connection Quick Reference

——————————————————————————-

All my PerformancePoint links – Updated as I update them!

——————————————————————————-

PerformancePoint Gotcha!  – “PerformancePoint Services could not connect to the specified data source.”  When creating a new dashboard, you MUST do a save all right after you create your data sources otherwise you won’t be able to create anything that uses that data source.

——————————————————————————-

Gotcha! PerformancePoint Time Intelligence and SSRS

I was trying to use a PerformancePoint time intelligence filter to pass values to a SSRS report. I discovered that the MonthToDate, YearToDate, QuarterToDate syntax does not work with SSRS. Instead use Month.FirstDay:Day, Year:FirstDay:Day, Quarter.FirstDay:Day

——————————————————————————-

Zero to Dashboard- Intro to PerformancePoint
Come check out the PerformancePoint hotness! Mike will demonstrate the functionality in PerformancePoint services in SharePoint 2010 and show you how to quickly build some dynamic dashboards and reporting functionality your end users will crave.

http://pragmaticworks.com/Resources/webinars/WebinarSummary.aspx?ResourceID=327

——————————————————————————-

PerformancePoint Tip – Right clicking an analytic grid in the leftmost columns allows the user to select measures.  Some people may not want that, so how do you disable that ability?  PerformancePoint Services Application Settings – Select Measures Control – set maximum value to 0.  (default is 1000).

http://technet.microsoft.com/en-us/library/ee620542.aspx

——————————————————————————-

# SSAS MSMDSRV.INI Settings for Multi-User Environments

Here are excerpts from the SSAS Operations Guide for SQL Server 2008 R2 regarding the recommended settings for a multi-user environment.  If you want more information you must read the whitepaper.

——————————————————————————-

2.3.2.4    Heap Settings, DataStorePageSize, and DataStoreHashPageSize
During query execution, Analysis Services generally touches a lot of data and performs many memory allocations. Analysis Services has historically relied on its own heap implementation to achieve the best performance. However, since Windows Server 2003, advances in the Windows Server operating system mean that memory can now be managed more efficiently by the operating system. This turns out to be especially true for multi-user scenarios. Because of this, servers that have many users should generally apply the following changes to the msmdsrv.ini file.

 Setting Default Multi-user, faster heap 1 2 1 0

It is also possible to increase the page size that is used for managing the hash tables Analysis Services uses to look up values. Especially on modern hardware, we have seen the following change yield a significant increase in throughput during query execution.

 Setting Default Bigger pages 8192 65536 8192 65536

——————————————————————————-

2.3.2.5    Clean Up Idle Sessions
Client tools may not always clean up sessions opened on the server. The memory consumed by active sessions is non-shrinkable, and hence is not returned the Analysis Services or operating system process for other purposes. After a session has been idle for some time, Analysis Services considers the session expired and move the memory used to the shrinkable memory space. But the session still consumes memory until it is cleaned up.
Because idle sessions consume valuable memory, you may want to configure the server to be more aggressive about cleaning up idle sessions. There are some msmdsrv.ini settings that control how Analysis Services behaves with respect to idle sessions. Note that a value of zero for any of the following settings means that the sessions or connection is kept alive indefinitely.

 Setting Description This is the minimum amount of time a session is allowed to be idle before Analysis Services considers it ready to destroy. Sessions are destroyed only if there is memory pressure. This is the time after which the server forcibly destroys the session, regardless of memory pressure. This is the timeout that controls sessions that no longer have a connection associated with them. Examples of these are users running a query and then disconnecting from the server. This timeout controls how long a connection can be kept open and idle until Analysis Services destroys it. Note that if the connection has no active sessions, MaxIdleSessionTimeout eventually marks the session for cleaning and the connection is cleaned with it.

If your server is under memory pressure and has many users connected, but few executing queries, you should consider lowering MinIdleSessionTimeOut and IdleOrphanSessionTimeout to clean up idle sessions faster.

——————————————————————————-

2.4.3    Multi-User Process Pool Settings
In multi-user scenarios, long-running queries can starve other queries; specifically they can consume all available threads, blocking execution of other queries until the longer-running queries complete.
You can reduce the aggressiveness of how each coordinator job allocates threads per segment by modifying CoordinatorQueryBalancingFactor and CoordinatorQueryBoostPriorityLevel as follows.

 Setting Default Multi-user nonblocking settings CoordinatorQueryBalancingFactor -1 1 CoordinatorQueryBoostPriorityLeve 3 0

——————————————————————————-

If your SSAS instance shares a server with anything else you may also want to read my good buddy Jorge’s blog post: Setting Memory Limits

——————————————————————————-

Did I miss any?  Do you have any additional tips to share?  If so, please sound off in the comments.

# What are Web Parts?

The term “web parts” can mean different things to different people.  A simple web search for “Microsoft web parts” returns some interesting results.  My introduction to web parts began with SQL Server 2008 R2 Reporting Services and the introduction of the report viewer web part (not to be confused w/ the report part.)  This allowed you to embed reporting services .RDL files within a PerformancePoint 2007 dashboard.  I believe you could also embed a report in any SharePoint page using the add web part feature.

A Developer’s Introduction to Web Parts (pertaining to SharePoint 2003) states: “Developers can build Web Parts as ASP.NET custom controls. Administrators can install Web Parts on any site based on Windows SharePoint Services. Users can add Web Parts to pages by dragging and dropping in a browser, and they can personalize them by setting properties. Web Parts can connect to other Web Parts using standard interfaces.”

The Introduction to Customizing Pages by Using Web Parts (applies to SharePoint Server 2007) provides an excellent overview of Web Parts and Web Part Pages:

——————————————————————————-

A Web Part is a modular unit of information that forms the basic building block of a Web Part Page. You can add Web Parts to Web Part zones in a Web Part Page and then customize the individual Web Parts to create a unique page for your site users.

The following example uses the Image Web Part to describe the basic features of a Web Part.

The Web Part title bar contains the heading for the Web Part.

The Web Part menu contains functions that enable you to minimize or close the Web Part, edit the Web Part, or get Help for a specific Web Part. When the page is in edit mode, you can also use this menu to delete the Web Part or connect it to other Web Parts, depending on the type of Web Part that you are using.

The body of the Web Part contains the content that you specified for the type of Web Part that is being used. In this example, the Web Part is an Image Web Part, which displays an image.

A Web Part Page is a special type of Web page in which you can use Web Parts to consolidate data, such as lists and charts, and Web content, such as text and images, into a dynamic information portal that is built around a common task or special interest.

——————————————————————————-

Finally, WikePedia defines a web part this way:

——————————————————————————-

A Web Part, also called a Web Widget, is an ASP.NET server control which is added to a Web Part Zone on Web Part Pages by users at run time. The controls enable end users to modify the content, appearance, and behavior of Web pages directly from a browser. It can be put into certain places in a web page by end users, after development by a programmer.

Web Parts can be used as an add-on ASP.NET technology to Windows SharePoint Services.

——————————————————————————-

Web Parts can be custom created for your needs or purchased through third party vendors.

Here are some Free Microsoft SharePoint Web Parts.

As a PerformancePoint dashboard designer, the web part customization is done after the dashboard has been deployed.  The frustration for me as a developer is that the customization tasks must be repeated every time the dashboard is re-deployed.  I’ve had to create long text cheat sheets of the various web part height, width, and chrome settings for a dashboard.  If anything needs changed to the dashboard using Dashboard Designer that causes you to have to re-deploy, these needs to be done individually by hand.  As far as I know, there is still no way to automate this task.  This always baffled me since these configurations can’t really be stored in source control and can’t be automated as the solution is deployed from dev to QA to production.

UPDATE!  SharePoint 2013 allows for easy migrations!

One new feature of SharePoint 2013 that I really like is the new way it handles exporting and importing a dashboard project.  In previous versions of SharePoint, I had to create long lists of customizations that had to be done to the web parts after Dashboard Designer had deployed the dashboard.  Things such as hiding filters, resizing objects, and changing the chrome settings had to be done over and over again every time we made a change to the dashboard that required a re-deployment.  Moving the project from the development environment to the QA environment and finally to the production environment was a manual tedious process.  Furthermore, these customizations were not stored in source control.  I was pleased as punch to discover the export/import feature of SharePoint 2013 solves both problems.  I can extract the resulting .CMP file using a cab extractor and put the entire project into source control and I don’t have to repeat those customizations over and over again.

# PerformancePoint 2013 Dashboards: Applying Dynamic Security using CustomData and EffectiveUserName

I recently had the great pleasure of solving a PerformancePoint 2013 puzzle and I thought I’d share how I did it to help others and as a reference for my fellow colleagues.  Our team was trying to implement dynamic active directory security for PerformancePoint 2013 dashboards.  A variety of PerformancePoint web parts were tested as part of this endeavor including Analysis Services scorecards, analytic charts and grids, SSRS reports (native mode), and Excel Services.

Devin Knight does an excellent job of explaining how to get this to work in his blogpost:  Analysis Services Dynamic Security.

For this method to work with PerformancePoint and Excel Services we will need two Roles:  ReadUserName and ReadCustomData.  The first one, ReadUserName is set up exactly how Devin describes.  This will be the role that Excel and SSRS will use.  The second one, ReadCustomData is the same except you need to replace the function UserName() with CustomData().  This ReadCustomData role will be the role that is referenced in the data source editor of PerformancePoint dashboard designer.

To test this I had to make sure that I wasn’t in any other AD groups or roles.  I only had to add myself to the ReadUserName role.  I did not have to be in the ReadCustomData role, only the service account that runs the SharePoint 2013 PerformancePoint content (a.k.a. Proxy.)  We also have a ReadUnrestricted Role.  I had to make sure I wasn’t in that role.

Here is another link, that describes how to get this working with PerformancePoint 2010:  How to configure dynamic security in Analysis Services and Performance Point 2010 without Kerberos.

Here is a link that describes various methods for testing if the security is working correctly:

#### Unit Testing Role Security in Analysis Services.

The series of posts below are wonderful!  I won’t repeat what is said there; but, I will say I was relieved to see that this person also came to the conclusion that two roles would be needed.

http://dbaexperience.com/2013/01/24/sharepoint-2010-sql-server-2012-bi-dynamic-security-part-1/

http://dbaexperience.com/2013/01/31/sharepoint-2010-sql-server-2012-bi-dynamic-security-part-2/

http://dbaexperience.com/2013/03/12/sharepoint-2010-sql-server-2012-bi-dynamic-security-part-3/

Here is where things got interesting… No, really.

This post:  Using EffectiveUserName To Impersonate in SSAS mentions that you have to set up Kerberos on your SSAS server to get this to work.  I can tell you we got it to work without that.  I did set up the data source in SSRS as this person describes.  Credentials stored securely in the report server, use as windows credentials, Impersonate the authenticated user after a connection has been made to the data source.

I got a little lucky (ok, maybe a lot lucky) in that my account had certain permissions that no one else in my active directory group had.  If it weren’t for that, I don’t know if I would have ever gotten any of this working.

Our initial goal was to set these permissions on either the SharePoint proxy account or the domain service account that process the cube.  That worked for the SSRS MDX reports; but, the Excel Services dashboard pages quit working.  Currently we are faced with the possibility that these permissions will need to be set for all the dashboard users and we are investigating what that actually means.  (http://windowsitpro.com/active-directory/hiding-data-active-directory-part-3-enabling-list-object-mode-forest)

We think the List contents is the one that made the difference.  My Active Directory sent me these links that he felt describe what is going on pretty well.  Here is what he said:

‘The link below explains the end result as the Account authenticating for the SSRS reports needs Read access to the TokenGroupGlobalAndUniversal attribute. This is done by adding the required account to the Windows Authorization Access  group in Active Directory.

Originally it was thought the SharePointProxy account was the account being used. I did add it to the group last Saturday and it did not resolve the issue. But today it was identified that the service account that connects to the data warehouse for cube processing was the account needing to verify the users group memberships to present the reports based on this information to the clients.

I also confirmed this to be the attribute by allowing Read access to all the User Object attributes and removing them until this single attribute was remaining. Removing this last attribute denied the reports with the Unknown username or password error.’

http://sqlinsight.net/blog/ssrs/ssrs-subscriptions-not-working/

http://blogs.msdn.com/b/richin/archive/2008/08/27/ssrs-subscription-tggau-rights-authzinitializecontextfromsid-exceptions.aspx

The other piece that seemed to make a difference was that the SSAS service is running under the NT AUTHORITY\NetworkService account.

Post in the comments if you have questions.

# SSAS Gotcha: The Operation Has Been Cancelled Due To Memory Pressure

My team and I recently had an issue with our SSAS instance in which users were receiving the following message from Excel:

Data Could not be retrieved from the external data source.  Error message returned by the external data source:

Server: The operation has been cancelled due to memory pressure.

Ok

Another user reporting a different error from Excel when trying to create a new connection to the Analysis Services database:

An error has occurred in the transport layer.

Errors in the OLE DB provider.  An error occurred while loading the connection dialog box components for prompting.

The following blog posts helped us identify and kill the MDX query and the user was executed:

Finding and killing SSAS Sessions

Kill a Session, SPID or Connection to Analysis Service Cube

The problem seemed to be caused by poorly constructed Excel pivot tables.  Users can do some strange things.  Things like dragging product numbers to the columns and product descriptions to the rows.  Regardless, a single user shouldn’t be able to bring down an entire server making it useless to anyone else in the company, right?

For the short term we had to find a way to automatically kill queries like these automatically.  Otherwise, our team could never go to lunch together.  Someone would always have to stay behind to log the locked query and restart the SSAS service if the server spun out of control.

Luckily, we were able to reproduce the condition by simply executing one of the queries we had captured on our QA server.  We tried a SSIS package that Chris Webb wrote to automatically kill locked sessions.  Unfortunately, when the issue occurred, the SSIS package couldn’t get a connection to check for locks let alone kill them.  I also played with the Server Timeout advanced property of the SSAS instance; but, that would only work reliably if we set it to a very small time period, such as 30 seconds.  Otherwise, it would blow right past the time out and never get cancelled.  I also got the PowerShell version of Chris Webb’s method to work; but, again it would only work reliably for short time periods.

*Make sure the user executing the Agent job is a SERVER administrator to get the PowerShell solution to work.  Only a WINDOWS SERVER admin can issue cancel XMLA requests.

Killing sessions automatically with SSIS

Dealing with Long Running SSAS Queries using Powershell

After more research and testing, I found this snippet:

Check the CancelCoordinatorCount property, you can decrease a bit from its default value, which will force the engine to check the IsCancelled property more often. As a result, the queries will be stopped faster. We had big problems with that, but after some post SP2 updated + with using that CancelCoordinatorCount thighs got much better.

I changed the CancelCoordinatorCount property from 1000 to 100 and the PowerShell solution would work with a slightly larger threshold.  Coupled w/ a larger failsafe threshold on the Server Timeout property (which also started working) I was getting closer to what I wanted.

Something about that post triggered a memory and a promise to myself I had made years ago.  The part about applying a service pack.  Many years ago, I worked as a BI Developer for a company that specialized in software for the Casino industry.  From time to time, I would be asked to help troubleshoot a problem with a client’s BI solution.  The first thing I would always do would be to reboot the server.  I can’t tell you how many times this would fix the immediate problem.  Second, I would make sure all of their windows updates and service packs were applied.  When they weren’t up to date, many times this fixed the problem.  As a consultant for Pragmatic Works, one of the first things I would always do when starting a new BI project or troubleshooting an issue would be to make sure all updates and service packs were applied.

To make a long story short… applying SQL Server 2008 R2 Service Pack 1 completely fixed our problem.  Two queries that would run for almost two hours on our QA server before finally bombing with a memory pressure error ran in less than five minutes and produced results.

Old lessons learned again.

# Data Mining with DMX

To my knowledge, Microsoft has three methods of performing data mining within their BI stack.  The Excel Add-In, the GUI and wizards within BIDS, and the DMX query language.  Obviously the key draw back to the Excel Add-in is that you can’t schedule it to run automatically without human intervention.  However, the Excel Add-in provides an excellent method to form your hypothesis and test it.  The GUI and the wizards are nice and you can create repeatable processes using those tools alone.  The command line interface that DMX provides has it’s own appeal to old folks like me who can remember when computer magazines included program listings.

There are basically four steps to data mining with DMX:

1. Creation of Structures
2. Creation of Mining Models
3. Training
4. Prediction

Think of structures as you would a framework for a house.  Now imagine there are those half mannequins scattered throughout the house structure.  The ones without the arms, legs, and head.  Just the torso in a skirt.  These mannequins are clothed with cheap garments.  The physical shape of the models is determined by the various data mining algorithms.  Some are fit, some are chunky, some are pencil thin.  Next imagine tons of paint being dumped on those garments by the bucket.  One color for each bucket.  At the end of the day, the garments are going to contain patterns of colors streaking down them derived from the buckets of different colored paints.  Those are your models.  Now, imagine a sophisticated computer is attached to those models.  Based on the colors of paint that were dumped on the garments during their training, we can predict the streaks of colors that will occur if an entirely new batch of paint is dumped upon those models.  That is prediction.  This is DMX.  DMX provides the language that defines the structures, models, and querying language.  Now imagine there is a purple cat in the corner of the house drinking lemonade.

The best thing about DMX is that it so closely resembles SQL.  They tried to do that with MDX; but, conceptually it just doesn’t fit quite right.  Knowing SQL can be a hindrance to knowing MDX.  Not so with DMX.

The most commonly used data types in DMX closely resemble those that we encounter with SQL:  Long, double, text, date, Boolean, and table.  Types in SQL that are not in DMX, such as integer, money, float, etc. are implicitly converted.  We are on common ground here.

Content types are a new concept for the relational developer.  The content types that have the word ‘key’ in them have parallels to our familiar primary / foreign key concepts (KEY, KEY TIME, KEY SEQUENCE).  Let’s ignore them for a moment and concentrate on what remains:  DISCRETE, CONTINUOUS, and DISCRETIZED.

DISCRETE is a categorical value such as Male or Female, Married or Single, Between 32 and 37 years old, between 38 and 42 years old.  In the context of a content type it means the data is already presented that way.  The DISCRETIZED keyword just tells the engine that we want it to do the categorization.  The default is a 5 buckets approach.  If there is not enough data to support that, the engine will automagically try fewer.If that fails it takes a clustering approach.

CONTINUOUS values are familiar to most of us.  It means a numerical value.  Age columns that include values such as: 32, 37, 87, 17.  Monetary columns including amounts such as \$1.42, \$200.76, \$432.45, etc.

MINING STRUCTURES

Before we can create any mining structures we have to create a new DMX query window.  This is accomplished by connecting to any Analysis Services database using SQL Server Management Studio, right clicking on SSAS instance and selecting: New Query, DMX.

All of the examples in this post will be based on two mining structures.  The first, simply called Customers, is our case level structure.  Don’t worry about what that means for now, let’s focus on the syntax.

```CREATE MINING STRUCTURE [Customers] (
CustomerKey	LONG	KEY
,CustomerName	TEXT	DISCRETE
,Gender		TEXT	DISCRETE
,Age		LONG	CONTINUOUS
,AgeDisc	LONG	DISCRETIZED(EQUAL_AREAS,4)
,MaritalStatus	TEXT	DISCRETE
,Education	TEXT	DISCRETE
);```

We have a column name, a data type, and a content type very much like the SQL syntax for creating a table.

The possible values for the various columns are: gender – Male or Female, Age – 17, 32,64,42, etc., AgeDisc – 17-22, 23-28, 29-40, etc., MaritalStatus – Married or Single, Education: Bachelors, High School, Partial College, Graduates, etc.

```CREATE MINING STRUCTURE [CustomersNested] (
CustomerKey        LONG    KEY
,CustomerName        TEXT    DISCRETE
,Gender                TEXT    DISCRETE
,Age                LONG    CONTINUOUS
,AgeDisc            LONG    DISCRETIZED(EQUAL_AREAS,4)
,MaritalStatus        TEXT    DISCRETE
,Education            TEXT    DISCRETE
,Purchases            TABLE (
Product            TEXT    KEY
,Quantity            LONG    CONTINUOUS
,Discounted            BOOLEAN    DISCRETE
)
)
WITH HOLDOUT (30 PERCENT OR 10000 CASES) REPEATABLE (42);```

This second mining structure is our nested table structure.  The nested table is a like a table that sits in a column within another table.  In this example, the column PURCHASES has a data type of TABLE.  It has a key column, the product name, a quantity column, and a discounted field.  Nested tables are like mining structure itself as they have a name and column list.

The HOLDOUT keyword will randomly select a certain percentage or number of records to be set aside for testing the model.

The REPEATABLE keyword will cause the same records to be set aside for testing the model each time it is populated if it used with a non 0 integer.  This is useful for testing the behavioral consistency of your scenarios.

Simple Simon was a Singleton

Here is an example of a record that is either being used to train a mining model within our nested table example, or as a singleton query to perform a prediction upon.  Notice the nested table is sort of a table in and of itself embedded in a column.  A singleton query is when you send one row to a model to make a prediction.  They are often used in real-time prediction engines.

MINING MODELS

```ALTER MINING STRUCTURE [Customers]
USING Microsoft_Clustering;

DROP MINING MODEL [ClusteredCustomers];```

Here we have simplest mining model money can buy.  This particular model would only be useful for clustering algorithms.  Notice there aren’t any column names in the mining model definition. If no columns are specified, all columns are considered inputs.

The USING keyword designates the algorithm your model will use. Understand that all algorithms use the same structures and model declarations and the same query language.

One thing I found out the hard way is that the more mining models your structure has, the longer it takes to populate your structure. This makes sense if you think about it, because it has to train so many models. If you go through the examples on your own, I’d recommend that you issue the drop statements or your structures will take forever to populate and some examples simply won’t work.

```ALTER MINING STRUCTURE [Customers]
CustomerKey
,Gender
,Age //NOTICE WE ARE USING THE CONTINUOUS VERSION OF AGE
,Education
,MaritalStatus PREDICT
)
USING Microsoft_Decision_Trees;

DROP MINING MODEL [PredictMaritalStatus];```

This model is against our case level table structure, we are using the continuous version of the age column, and the model is using a decision tree algorithm.  The best way I can describe a case level structure is any structure without nested tables.

The word PREDICT in the model above is called a usage flag.  A usage flag has three states: PREDICT, PREDICT_ONLY, and NULL (the absence of a usage flag.)

1. PREDICT means the columns is both an input and an output
2. PREDICT_ONLY means the column is an output only.
3. A column that is missing a usage flag is considered an input only.

We can only perform predictions against columns that are an output.  Input columns are fed into the model and used to make predictions on the output columns.

```ALTER MINING STRUCTURE Customers
CustomerKey
,CustomerName
,Gender
,AgeDisc //NOTICE WE ARE USING THE DISCRETE VERSION OF AGE
,Education
,MaritalStatus PREDICT
)
USING Microsoft_Naive_Bayes;

DROP MINING MODEL PredictMaritalStatusBayes;```

In this example notice we are using the discrete version of the age column and the model is using the Naïve Bayes algorithm.  The algorithms Microsoft Naive Bayes and Microsoft Association Rules support categorical analysis and do not support continuous types.  The Microsoft Linear Regression algorithm accepts only continuous data.

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender         //INPUT ONLY B/C USAGE FLAG IS ABSENT
,AgeDisc AS Age //INPUT ONLY B/C USAGE FLAG IS ABSENT
,Education      //INPUT ONLY B/C USAGE FLAG IS ABSENT
,MaritalStatus PREDICT //INPUT AND OUTPUT
,Purchases (
Product
,Quantity   //VALUE ATTRIBUTE
,Discounted //VALUE ATTRIBUTE
)
)
USING Microsoft_Decision_Trees(COMPLEXITY_PENALTY = .5);```

This model can be described as a model used to predict a person’s marital status based on that person’s gender, age, education, marital status, and the quantities of their product purchases, and whether or not they were purchased on sale.

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender
,AgeDisc AS Age
,Education
,MaritalStatus
,Purchases PREDICT (    //INPUT AND OUTPUT
Product
)
)
USING Microsoft_Decision_Trees;```

This model predicts product purchases based on a person’s gender, age, education, marital status, and other products purchased.  Since the table column Purchases is marked PREDICT instead of PREDICT ONLY, each attribute in the nested table is both input and output. If it were marked PREDICT ONLY it would not take into account the other products purchased.  The table is marked with the usage flag not the key. The model predicts the value of an attribute.  The model predicts the rows that make up the nested table. This is why it is the table that accepts the usage flag in this example.

For a nested table it is acceptable and common for it to have only a single column that is the key. Nested tables without supporting columns are the most common and are often used for market basket analysis.

The existence of a nested table row can be inferred by the existence of a value in any non-key nested column. This model creates what are called valueless attributes.

Valueless attributes are those attributes that do not have a separate table column representing their value. They are simply EXISTING or MISSING. The value of the attributes for the single case of Simple Simon would simply be:

1. Purchases: Tires, Existing
2. Purchases: Fenders, Existing
3. Purchases: Playing Cards, Existing
4. Purchases: Clothes Pins, Existing

Theoretically, the contents of a case also contain all of the “Missing” attributes too. So for every product you didn’t purchase that would also be a value attribute.

Microsoft’s algorithms have been written to specifically handle variable length cases. This is a radical departure from other data mining packages that assume each case is identically large.

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender
,AgeDisc AS Age
,Education
,MaritalStatus
,Purchases (            //INPUT ONLY NESTED TABLE
Product
,Quantity PREDICT    //INPUT AND OUTPUT
)
)
USING Microsoft_Decision_Trees;

DROP MINING MODEL PredictQuantity;```

This is an example of a Nested Table without a Usage Flag.  This model predicts the quantity of products purchased based on gender, age, education, marital status, and the quantity of other purchased products. The nested table contains a predictable column that is an input and output as it is marked with the PREDICT usage flag.

A valueless attribute (existing or missing) is not necessary and is not created because the table is an input and includes a value column that is also an input.

So valueless attributes were created in the previous example because all the nested table had was product which was either existing or missing. In this case they are not created since the quantity field is there they are not needed.

Question: What can we do with Quantity that we can’t do w/ any other column?

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender
,AgeDisc AS Age
,Education
,MaritalStatus
,Purchases PREDICT_ONLY (    //OUTPUT ONLY
Product
,Quantity                //INPUT ONLY
)
)
USING Microsoft_Decision_Trees;

DROP MINING MODEL PredictOnlyTable;```

This model predicts what products are likely to be purchased based on gender, age, education, marital status, and the quantity of other items purchased. A valueless attribute is created for the outputs since in this case you can’t predict quantity.

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender
,AgeDisc AS Age
,Education
,MaritalStatus
,Purchases PREDICT_ONLY (        //OUTPUT ONLY
Product
,Quantity PREDICT_ONLY        //OUTPUT ONLY
)
)
USING Microsoft_Decision_Trees;```

This model predicts the quantity of products purchased based only on gender, age, education, and marital status. This model matches the table value column’s usage flag with the table usage flag. Since there isn’t a discrepancy between the usage flags, valueless attributes are not created.

Another way to think of this… Valueless attributes are created whenever the usage flags of the table column cannot be match with the usage flags of any of the nested table’s value columns.

```ALTER MINING STRUCTURE Customers
CustomerKey
,Gender
,Age
,Education PREDICT
,MaritalStatus
)
USING Microsoft_Decision_Trees
WITH FILTER(AGE > 30);

DROP MINING MODEL FilterByAge;```

Filters can include case-level and nested-level columns.

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender
,Age
,Education PREDICT
,MaritalStatus
)
USING Microsoft_Decision_Trees
WITH FILTER(EXISTS(SELECT *
FROM Purchases
WHERE Product = 'Bearing Ball'
AND Discounted));

DROP MINING MODEL FilterByBalls;```

Contents of nested tables can be filtered.

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender
,Age
,Education
,MaritalStatus
,Purchases PREDICT (
Product
) WITH FILTER(NOT Discounted)
)
USING Microsoft_Decision_Trees;

DROP MINING MODEL FilterByNested;```

Also worth noting is that Columns referenced in the filter are structure columns and need not be part of the model definition.

TRAINING

Once a mining model has been trained or processed, it contains algorithmic patterns derived from the data. Models are used against new data to perform predictions on any output columns defined during their creation. Patterns are referred to as the model content.

TRAINING METHODS

• OPENQUERY
• SHAPE
• Other DMX Queries
• MDX
• Stored Procedures
• Row-set parameters as the source data query for INSERT INTO

This blog post will cover the first two, OPENQUERY and SHAPE.

What do you get when you combine DMX w/ MDX in your company’s data mining project? Job security.

Data Sources

• You can use the data source in the adventure works SSAS sample.
• You can create one using BIDS as part of an Analysis Service project and deploy
• You can use the AssProcs assembly.

AssProcs Detailed Instructions:
http://marktab.net/datamining/2010/07/10/microsoft-decision-trees-algorithm/

```CALL ASSprocs.CreateDataSource(
,'Provider=SQLNCLI10.1;Data Source=localhost
;Integrated Security=SSPI
,'ImpersonateCurrentUser','','');```

TRAINING OUR case level structure using OPENQUERY

```INSERT INTO MINING STRUCTURE Customers (
CustomerKey
,CustomerName
,Gender
,Age
,MaritalStatus
,Education
)
OPENQUERY(
,'Select
CustomerKey
,LastName
,Gender
,DATEDIFF(YEAR,BirthDate,GETDATE())
,MaritalStatus
,EnglishEducation
FROM dbo.DimCustomer'
);```

Random number Example

```CREATE MINING STRUCTURE [CustomersRandom] (
CustomerKey        LONG    KEY
,CustomerName        TEXT    DISCRETE
,Gender                TEXT    DISCRETE
,Age                LONG    CONTINUOUS
,MaritalStatus        TEXT    DISCRETE
,Education            TEXT    DISCRETE
,Random                DOUBLE    CONTINUOUS
);

INSERT INTO MINING STRUCTURE CustomersRandom (
CustomerKey
,CustomerName
,Gender
,Age
,MaritalStatus
,Education
,Random
)
OPENQUERY(
,'Select
CustomerKey
,LastName
,Gender
,DATEDIFF(YEAR,BirthDate,GETDATE())
,MaritalStatus
,EnglishEducation
,CAST((ABS(CHECKSUM(
NEWID())) % 1000) AS FLOAT)
/1000 AS Random
FROM dbo.DimCustomer'
);```

TRAINING OUR NESTED TABLE STRUCTURE

```INSERT INTO MINING STRUCTURE CustomersNested (
CustomerKey
,CustomerName
,Gender
,Age
,AgeDisc
,MaritalStatus
,Education
,Purchases(
SKIP
,Product
,Quantity
,Discounted
)
)
SHAPE {
OPENQUERY(
,'Select
CustomerKey1 = CustomerKey
,LastName
,Gender
,DATEDIFF(YEAR,BirthDate,GETDATE())
,DATEDIFF(YEAR,BirthDate,GETDATE())
,MaritalStatus
,EnglishEducation
FROM dbo.DimCustomer
ORDER BY CustomerKey'
)
}
APPEND (
{
OPENQUERY(
,'Select
CustomerKey2 = f.CustomerKey
,p.EnglishProductName
,f.OrderQuantity
,CASE
WHEN f.DiscountAmount > 0
THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT)
END
FROM dbo.FactInternetSales f
JOIN dbo.DimProduct p
ON p.ProductKey = f.ProductKey
ORDER BY f.CustomerKey'
)
} RELATE CustomerKey1 TO CustomerKey2
) AS Purchases;```

The SHAPE syntax is used to form the flat table source data to the hierarchical nested representation required by the mining structure. The relationship between rows and record set is accomplished using the RELATE keyword which maps a column in the outer record set to a foreign key column in the nested record set. You can add as many record sets as desired simply by providing additional definitions. The record set must be ordered by the columns used to relate them. This is a key requirement (pun intended.)

SKIP indicates columns that exist in the source data that will not be used to fill the structure. It is primarily used in cases where you don’t have control over the columns returned.

PREDICTION

Prediction: Applying the patterns that were found in the data to estimate unknown information. The final goal in any data mining scenario, it provides the ultimate benefits of data collection and machine learning which can dramatically influence how business is conducted.

DMX simplifies these possibilities by using a consistent syntax for prediction across all of the various algorithms. It allows predictions to be scheduled and results stored in various formats including relational databases. Some are performed in real-time.

Examples Include:

• Future values of a time series (How much will we earn next month?)
• What other products a customer might be interested in purchasing
• Likelihood of a customer switching to a competitor
• Will the borrower repay the loan?
• Does anything seem abnormal here?
• What is the most effective advertisement to display to this customer?
• How can I classify customers, products, or events?

QUERYING STRUCTURED DATA

Three Rules

1. You can only select cases from models that support DRILLTHROUGH
2. You can only see data the model can see (unfiltered)
3. By default, you will only see columns used with the model*

*There is a way to get the additional columns

Three Can Dos

1. You can query column contents to see discrete values or continuous range values used in those columns.
2. You can query the model content to explore data patterns discovered by the algorithms.
3. WITH DRILLTHROUGH you can see how training data cases reinforce the validity of the patterns that were found.

PREDICTION JOIN

Consider a mining model to be a table containing all possible combinations of input and output variables. Now imagine you are doing a traditional SQL join to a theoretical table that will determine a prediction result. Such a theoretical table would be not be pragmatic considering the number of possible combinations within nested tables and impossible once you consider continuous columns. Models contain patterns learned from data in a compressed format allowing for efficient execution of predictions.

PREDICTION JOIN SYNTAX

Three Ways to Query Structured Data

1. Select all cases
2. Select cases as a flat record set
3. Select only test cases
```SELECT *
FROM MINING STRUCTURE CustomersNested.CASES;
//SELECT ALL CASES

SELECT FLATTENED *
FROM MINING STRUCTURE CustomersNested.CASES;
//SELECT CASES AS A FLAT RECORD SET

SELECT *
FROM MINING STRUCTURE CustomersNested.CASES
WHERE IsTestCase();
//SELECT ONLY TEST CASES```

Select All Cases or All Test Cases Results

Select Cases as a Flat Record-set Results

WITH DRILLTHROUGH Clustering

```ALTER MINING STRUCTURE CustomersNested
CustomerKey
,Gender
,Age
) USING Microsoft_Clustering
WITH DRILLTHROUGH;

INSERT INTO ClusterDrill;

SELECT DISTINCT Gender
FROM ClusterDrill;

SELECT DISTINCT RangeMin(Age),Age,RangeMax(Age)
FROM ClusterDrill;

SELECT *
FROM ClusterDrill.CONTENT;

SELECT *
FROM ClusterDrill.CASES
WHERE IsInNode('001');

DROP MINING MODEL ClusterDrill;```

QUERY Predict Marital Status Using Naïve Bayes

```DELETE FROM MINING STRUCTURE Customers;

ALTER MINING STRUCTURE Customers
CustomerKey
,CustomerName
,Gender
,AgeDisc
,Education
,MaritalStatus PREDICT
)
USING Microsoft_Naive_Bayes;

INSERT INTO MINING STRUCTURE Customers (
CustomerKey
,CustomerName
,Gender
,AgeDisc
,MaritalStatus
,Education
)
OPENQUERY(
,'Select
CustomerKey
,LastName
,Gender
,DATEDIFF(YEAR,BirthDate
,GETDATE())
,MaritalStatus
,EnglishEducation
FROM dbo.DimCustomer'
);

SELECT
t.CustomerName
,Predict(MaritalStatus)
AS PredictedMaritalStatus
FROM PredictMaritalStatusBayes
PREDICTION JOIN
OPENQUERY(
,'SELECT
CustomerKey
,CustomerName = LastName
,Gender
,AgeDisc = DATEDIFF(YEAR,BirthDate
,GETDATE())
,MaritalStatus
,Education = EnglishEducation
FROM dbo.DimCustomer'
) AS t
ON     PredictMaritalStatusBayes.AgeDisc = t.AgeDisc
AND PredictMaritalStatusBayes.Education = t.Education
AND PredictMaritalStatusBayes.Gender = t.Gender;```

DMX requires a fully qualified descriptor for all the column names in the mapping because joins are bound by column name rather than column order.

Predict Marital Status Nested Trees

```SELECT
t.CustomerName
,Predict(MaritalStatus) AS PredictedMaritalStatus
FROM PredictMaritalStatusNestedTrees
PREDICTION JOIN
,'SELECT
CustomerKey1 = CustomerKey,CustomerName = LastName
,Gender,Age = DATEDIFF(YEAR,BirthDate,GETDATE())
,Education = EnglishEducation,MaritalStatus
FROM dbo.DimCustomer ORDER BY CustomerKey')
} APPEND ( {    OPENQUERY([Adventure Works DW]
,'SELECT
CustomerKey2 = f.CustomerKey
,Product = p.EnglishProductName
,Quantity = f.OrderQuantity,Discounted = CASE
WHEN f.DiscountAmount > 0 THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT)
END
FROM dbo.FactInternetSales f
JOIN dbo.DimProduct p ON p.ProductKey = f.ProductKey
ORDER BY f.CustomerKey')
} RELATE CustomerKey1 TO CustomerKey2
) AS Purchases AS t
ON PredictMaritalStatusNestedTrees.Age = t.Age
AND PredictMaritalStatusNestedTrees.Gender = t.Gender
and PredictMaritalStatusNestedTrees.Purchases.Product
= t.Purchases.Product
AND PredictMaritalStatusNestedTrees.Purchases.Quantity
= t.Purchases.Quantity
AND PredictMaritalStatusNestedTrees.Purchases.Discounted
= t.Purchases.Discounted
and PredictMaritalStatusNestedTrees.Education
= t.Education;```

Singleton Queries

```SELECT
Predict(MaritalStatus) AS PredictedMaritalStatus
FROM PredictMaritalStatusBayes
NATURAL PREDICTION JOIN
(SELECT 'M' AS Gender
,35 AS Age
,'Graduate Degree' AS Education) AS t;

SELECT
Predict(MaritalStatus) AS PredictedMaritalStatus
FROM PredictMaritalStatusBayes
NATURAL PREDICTION JOIN
(SELECT 'F' AS Gender
,22 AS Age
,'High School' AS Education) AS t;```

NATURAL PREDICTION JOIN – matches columns from the source with the same names as input columns of the model.

```SELECT
Predict(MaritalStatus) AS PredictedMaritalStatus
FROM PredictMaritalStatusNestedTrees
NATURAL PREDICTION JOIN
(
SELECT 'M' AS Gender
,35 AS Age
,
(
SELECT 'Touring Tire' AS Product
,2 AS QUANTITY
UNION
SELECT 'Mountain-200 Silver
,38' AS Product
,1 AS QUANTITY
UNION
SELECT 'Fender Set - Mountain' AS Product
,1 AS QUANTITY
) AS Purchases
) AS t;```

Degenerate Prediction- a prediction without source data and therefore has no prediction join clause. If 42% of my customers are married, the degenerate prediction for marital status would be single.

Predict Function Histogram

DELETE FROM MINING STRUCTURE Customers;
ALTER MINING STRUCTURE Customers
CustomerKey
,Gender,Age
,Education PREDICT
,MaritalStatus)
USING Microsoft_Decision_Trees
WITH FILTER(AGE > 30);

INSERT INTO MINING STRUCTURE Customers (
CustomerKey
,CustomerName
,Gender
,Age
,MaritalStatus
,Education)
,’Select
CustomerKey
,LastName
,Gender
,DATEDIFF(YEAR,BirthDate,GETDATE())
,MaritalStatus
,EnglishEducation
FROM dbo.DimCustomer’);

SELECT
‘Histogram’ AS Label
,PredictHistogram(Education) AS Hist
FROM FilterByAge;

SELECT FLATTENED
(SELECT \$Probability
FROM PredictHistogram(Education)
WHERE Education = ‘Bachelors’)
FROM FilterByAge;

//RETURNS THE PROBABILITY ASSOCIATED W/
//THE VALUE ‘BACHELORS.’

SELECT FLATTENED
(SELECT Education,\$Probability
FROM TopCount(PredictHistogram(Education)
,\$Probability,5))
FROM FilterByAge;
//OF THE HISTOGRAM TABLE BASED ON PROBABILITY

DROP MINING MODEL FilterByAge;

This is the most comprehensive prediction function for case-level columns. It returns a table w/ all the information available about the prediction of a scalar column.

Target – the predicted value.  Bachelors, partial college, and so on.
\$Support – How many cases support this fact
\$Probability – computed probability of a categorical output.  For continuous values represents the likelihood of a value being present.
\$AdjustedProbability – modified probability used to boost likelihood of rare events, frequently used for predicting nested tables.
\$Variance – of a continuous prediction.  0 for discrete predictions.
\$StDev – of a continuous prediction.  0 for discrete.

When you’re dealing w/ functions like PredictHistogram that return tables, you can select rows from them like any other table.

DMX doesn’t allow TOP and ORDER BY clauses in sub-selects. TOPCOUNT AND BOTTOMCOUNT substitute for this functionality.

```SELECT PredictHistogram(MaritalStatus)
FROM PredictMaritalStatusBayes;

SELECT
MaritalStatus,PredictProbability(MaritalStatus)
FROM PredictMaritalStatusBayes;

//Shortcuts for returning values such as the
//probability and support:

//PredictProbability
//PredictSupport
//PredictVariance
//PredictStdDev

SELECT
VBA![Log](PredictProbability(MaritalStatus))
AS NaturalLog
FROM PredictMaritalStatusBayes;
//EXAMPLE SHOWING HOW TO ACCESS SCIENTIFIC
//FUNCTIONS FROM VBA SCRIPT LANGUAGE```

First let’s look at the predict histogram function against the predictMaritalStatusBayes model.

Results from the PredictProbability function:

All functions return their values from the most likely row of the predictHistrogram record set without requiring messy sub-select syntax.

All of these functions also let you extract appropriate value for any row of the histogram.

Predict functions have Polymorphic behavior dependent on whether supplied case-level or column reference.

You don’t even need to use function when predicting a case-level column value. SELECT education = SELECT PREDICT(education) as long as education is a predictable column.

Results from the VBA![Log](PredictProbability) query:

Additional functions are shortcuts that give additional information about the prediction (support, likelihood, or the part of the model used for prediction.)

A reference of all DMX functions appears in Appendix B from the WROX download site mentioned previously in regards to the AssProcs download.

PredictAssociation

This function takes a nested table and returns a ranked list of results.  It is useful for recommendation predictions.

```// PredictAssociation takes a nested table
// and returns a ranked list of results

SELECT
PredictAssociation(Purchases,3)
FROM PredictPurchasesTrees;
//top 3 items based on the default measure
//probability

SELECT
FROM PredictPurchasesTrees;
//same but uses adjusted probability to rank the items

SELECT(SELECT *
FROM PredictAssociation(Purchases
,INCLUDE_NODE_ID
,INCLUDE_STATISTICS)
WHERE \$Probability > .1)
FROM PredictPurchasesTrees;
//Shows all items that are recommended
//with a probability of 20 percent or greater.
//Also includes that statistics and node ID of
//each prediction```

Adjusted Probability is useful in associative predictions because it penalizes items that are too popular in favor of less-popular items. If 90% of your customers want eggs, but for a particular customer the engine recommends milk with an 80% likelihood. The engine is telling you that for this customer the chance of them wanting milk is less than average.

```//Predicting the values of columns in nested tables
//instead of the contents of the tables themselves

SELECT
(SELECT
Product
,Predict(Quantity)
FROM Purchases
)
FROM PredictOnlyTableQuantity;
//returns the predicted quantity for each
//item assuming each item is purchased.

SELECT
(SELECT
Product
FROM Purchases
WHERE Predict(Quantity) > .42
)
FROM PredictOnlyTableQuantity;
//predicted quantity of greater than .42

SELECT
(SELECT
Product
,(SELECT
\$Probability
FROM PredictHistogram(Quantity)
WHERE Quantity = NULL)
FROM Purchases
)
FROM PredictOnlyTableQuantity;
//returns a list of all possible products
//for each case and in doubly nested
//table returns the probability that a product
//will NOT be purchased.

SELECT
(SELECT
Product
,Predict(Quantity)
FROM Purchases
WHERE PredictProbability(Quantity, NULL) < .8)
FROM PredictOnlyTableQuantity;
//simplifies the previous example by using
//a scalar function instead of the table
//returning predictHistogram and returns the
//to be recommended with a probability of at least 80%.
//i.e. determining the likelihood of it NOT being
//recommended is less than 80%```

Predicting the values of columns in nested tables instead of the contents of the tables themselves is extremely useful; but, rare scenario.

```EXPORT MINING STRUCTURE Customers TO 'C:\c\Customers.abf'

RENAME MINING STRUCTURE Customers TO Customer1;
DELETE FROM MINING STRUCTURE Customer1.CASES;

DROP MINING STRUCTURE customers;
DROP MINING STRUCTURE customersNested;
DROP MINING STRUCTURE customersRandom;```

REFERENCE