My team and I recently had an issue with our SSAS instance in which users were receiving the following message from Excel:
Data Could not be retrieved from the external data source. Error message returned by the external data source:
Server: The operation has been cancelled due to memory pressure.
Another user reporting a different error from Excel when trying to create a new connection to the Analysis Services database:
An error has occurred in the transport layer.
Errors in the OLE DB provider. An error occurred while loading the connection dialog box components for prompting.
The following blog posts helped us identify and kill the MDX query and the user was executed:
The problem seemed to be caused by poorly constructed Excel pivot tables. Users can do some strange things. Things like dragging product numbers to the columns and product descriptions to the rows. Regardless, a single user shouldn’t be able to bring down an entire server making it useless to anyone else in the company, right?
For the short term we had to find a way to automatically kill queries like these automatically. Otherwise, our team could never go to lunch together. Someone would always have to stay behind to log the locked query and restart the SSAS service if the server spun out of control.
Luckily, we were able to reproduce the condition by simply executing one of the queries we had captured on our QA server. We tried a SSIS package that Chris Webb wrote to automatically kill locked sessions. Unfortunately, when the issue occurred, the SSIS package couldn’t get a connection to check for locks let alone kill them. I also played with the Server Timeout advanced property of the SSAS instance; but, that would only work reliably if we set it to a very small time period, such as 30 seconds. Otherwise, it would blow right past the time out and never get cancelled. I also got the PowerShell version of Chris Webb’s method to work; but, again it would only work reliably for short time periods.
*Make sure the user executing the Agent job is a SERVER administrator to get the PowerShell solution to work. Only a WINDOWS SERVER admin can issue cancel XMLA requests.
After more research and testing, I found this snippet:
Check the CancelCoordinatorCount property, you can decrease a bit from its default value, which will force the engine to check the IsCancelled property more often. As a result, the queries will be stopped faster. We had big problems with that, but after some post SP2 updated + with using that CancelCoordinatorCount thighs got much better.
I changed the CancelCoordinatorCount property from 1000 to 100 and the PowerShell solution would work with a slightly larger threshold. Coupled w/ a larger failsafe threshold on the Server Timeout property (which also started working) I was getting closer to what I wanted.
Something about that post triggered a memory and a promise to myself I had made years ago. The part about applying a service pack. Many years ago, I worked as a BI Developer for a company that specialized in software for the Casino industry. From time to time, I would be asked to help troubleshoot a problem with a client’s BI solution. The first thing I would always do would be to reboot the server. I can’t tell you how many times this would fix the immediate problem. Second, I would make sure all of their windows updates and service packs were applied. When they weren’t up to date, many times this fixed the problem. As a consultant for Pragmatic Works, one of the first things I would always do when starting a new BI project or troubleshooting an issue would be to make sure all updates and service packs were applied.
To make a long story short… applying SQL Server 2008 R2 Service Pack 1 completely fixed our problem. Two queries that would run for almost two hours on our QA server before finally bombing with a memory pressure error ran in less than five minutes and produced results.
Old lessons learned again.