If I am tasked with troubleshooting relational report performance, there are a number of processes I follow to identify exactly where the bottlenecks are so I can focus my performance tuning on the areas that will have the most impact. I'd like to share these processes with the community and invite others to contribute their own tips in the comments section. I'll try to update this blog post with the best suggestions as they become available. This post focuses on relational reports as opposed to OLAP reports.
Before I begin, please review the following MSDN article: Troubleshooting Reports: Report Performance. I am going to try not to repeat anything from that article because it is a great resource and should be your first stop.
The first suggestion I have is to use stored procedures instead of inline SQL. It makes sense from a modular programming point of view, and there are also some performance gains to be had from this practice. A great discussion of the pros and cons of this approach can be found in this Stack Overflow post: Are Stored Procedures more efficient, in general, than inline statements on modern RDBMS's?
So if you have a report backed by a large stored procedure, the next step is to get a baseline measurement of how long the stored procedure takes to execute with a given set of parameters. Using the same parameters, run the report and measure how long it takes to render. The difference between the stored procedure execution time and the total report execution time gives you a general idea of how long the Reporting Services engine takes to render the data. This approach assumes you do not have DBA privileges to query the ExecutionLog3 view on the Reporting Services instance and get exact values yourself.
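If you do happen to have access to the ReportServer catalog database, the ExecutionLog3 view already splits execution time into retrieval, processing, and rendering for you. A quick sketch (this assumes the default catalog database name of ReportServer; the Time* columns are reported in milliseconds):

```sql
-- Most recent executions first. TimeDataRetrieval covers the dataset queries,
-- TimeProcessing covers grouping/sorting/aggregation, TimeRendering covers
-- producing the output format.
SELECT TOP 20
     ItemPath
    ,TimeStart
    ,TimeDataRetrieval
    ,TimeProcessing
    ,TimeRendering
    ,[RowCount]
    ,Format
FROM ReportServer.dbo.ExecutionLog3
ORDER BY TimeStart DESC;
```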
If a report is taking thirty minutes to run and the stored procedure accounts for fifteen minutes of that time, then you need to look both at what you can do to improve data retrieval and at what you can do to improve report rendering. Count yourself lucky if you can identify one or the other as the bottleneck. You'll have half as much work to do!
Troubleshooting Stored Procedure Data Retrieval
I like to start by creating a table on the development server to measure the time it takes the SQL engine to perform each of the steps within the stored procedure. For example:
CREATE TABLE WORK.PerfTuning (
     TestNumber INT NOT NULL
    ,TestSystem VARCHAR(50) NULL
    ,StartProcedure DATETIME NULL
    ,EndProcedure DATETIME NULL
    ,ProcedureDuration DATETIME NULL
    ,Milestone1 DATETIME NULL
    ,Milestone2 DATETIME NULL
    ,Milestone3 DATETIME NULL
    ,Milestone4 DATETIME NULL
    ,Milestone5 DATETIME NULL
    ,Milestone6 DATETIME NULL
    ,Milestone7 DATETIME NULL
    ,Milestone8 DATETIME NULL
    ,Milestone9 DATETIME NULL
    ,Milestone10 DATETIME NULL
    ,Counter1 INT NULL
    ,Counter2 INT NULL
    ,Counter3 INT NULL
    ,PRIMARY KEY CLUSTERED (TestNumber ASC)
        WITH (
             PAD_INDEX = OFF
            ,STATISTICS_NORECOMPUTE = OFF
            ,IGNORE_DUP_KEY = OFF
            ,ALLOW_ROW_LOCKS = ON
            ,ALLOW_PAGE_LOCKS = ON
        ) ON CIA_SecretDatabase
) ON CIA_SecretDatabase;
Add the following code to the beginning of your stored procedure:
DECLARE @TestNumber INT = 1;

DELETE FROM WORK.PerfTuning
WHERE TestNumber = @TestNumber;

INSERT INTO WORK.PerfTuning (
     TestNumber
    ,TestSystem
    ,StartProcedure
)
VALUES (
     @TestNumber
    ,'QA'
    ,GETDATE()
);
Add an UPDATE statement after each major code block within your stored procedure, advancing the milestone column each time: Milestone1, then Milestone2, then Milestone3, and so on.
UPDATE WORK.PerfTuning
SET Milestone1 = GETDATE()
WHERE TestNumber = @TestNumber;
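At the very end of the procedure, close out the test row so the timing query has an EndProcedure value to work with; something like:

```sql
-- Record the finish time as the last statement in the procedure.
UPDATE WORK.PerfTuning
SET EndProcedure = GETDATE()
WHERE TestNumber = @TestNumber;
```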
After you execute the stored procedure, the following query will tell you how many seconds each milestone took to complete. The milestones with the greatest values are the ones to focus on.
SELECT
     TestNumber
    ,Duration   = CAST(DATEDIFF(SECOND, StartProcedure, EndProcedure) AS VARCHAR(5))
    ,Milestone1 = CAST(DATEDIFF(SECOND, StartProcedure, Milestone1) AS VARCHAR(5))
    ,Milestone2 = CAST(DATEDIFF(SECOND, Milestone1, Milestone2) AS VARCHAR(5))
    ,Milestone3 = CAST(DATEDIFF(SECOND, Milestone2, Milestone3) AS VARCHAR(5))
    ,Milestone4 = CAST(DATEDIFF(SECOND, Milestone3, Milestone4) AS VARCHAR(5))
    ,Milestone5 = CAST(DATEDIFF(SECOND, Milestone4, Milestone5) AS VARCHAR(5))
    ,Milestone6 = CAST(DATEDIFF(SECOND, Milestone5, Milestone6) AS VARCHAR(5))
    ,Milestone7 = CAST(DATEDIFF(SECOND, Milestone6, Milestone7) AS VARCHAR(5))
    ,Milestone8 = CAST(DATEDIFF(SECOND, Milestone7, EndProcedure) AS VARCHAR(5))
FROM WORK.PerfTuning;
Seems great, huh? As you work through the code for the heavy-hitting milestones and make changes, you should be able to see the impact of those changes on the execution time, right? Well, it depends. Hopefully you have the good fortune of working on a development server that no other developers are using. Usually, this is not the case. Second best would be to have the source databases on your local workstation. You can then issue a DBCC DROPCLEANBUFFERS command to test queries with a cold buffer cache without shutting down and restarting the server and without affecting other developers. (More info: SQL SERVER – Stored Procedure – Clean Cache and Clean Buffer)
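On an instance you have to yourself, the cache-clearing sequence looks like this (never run these on a shared or production server; they slow everyone down):

```sql
CHECKPOINT;             -- write dirty pages to disk so DROPCLEANBUFFERS can evict everything
DBCC DROPCLEANBUFFERS;  -- empty the buffer cache: the next run reads from disk (cold cache)
DBCC FREEPROCCACHE;     -- optionally also discard cached plans to force recompilation
```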
So what if you can’t do either? My suggestion would be to run every test twice and take the average. It’s really the best we can do in such a situation. Unless you have DBA privileges you’ll never have any insight into what else was happening on the server during the time you performed your test. Another developer could very well be running a resource intensive query at the same time.
Once you have identified the bottlenecks in your stored procedure, analyze the execution plan of each code block for your worst offenders (the ones that take the longest to complete). It will suggest indexes that can be added to improve performance. There are many resources out there on improving queries and adding indexes to support them. To be brief: make sure you have indexes to support your joins, WHERE clause filters, CASE statements, and GROUP BYs. Also make sure your joins are in the same order as your index columns, and include the clustered index key wherever possible. Try to avoid queries nested within queries within queries; consider using common table expressions instead. Test different variations of #temp tables vs. table variables. The rule of thumb is to use #temp tables for large amounts of data and table variables for small amounts. You can add indexes to your #temp tables where appropriate; you can't with table variables.
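To make the indexing advice concrete, here is a hypothetical sketch (all table and column names are made up for illustration): a covering index whose key columns support a join and a filter, with INCLUDE columns so the query avoids key lookups, plus an indexed #temp table for a large intermediate resultset.

```sql
-- Covering index: CustomerID supports the join, OrderDate supports the
-- WHERE filter, and the INCLUDE columns avoid key lookups.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate)
    INCLUDE (TotalDue, OrderStatus);

-- Large intermediate resultset: a #temp table can be indexed after creation;
-- a table variable cannot.
CREATE TABLE #OrderSummary (
     CustomerID INT NOT NULL
    ,OrderTotal MONEY NULL
);
CREATE CLUSTERED INDEX IX_OrderSummary ON #OrderSummary (CustomerID);
```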
Other things that can improve data retrieval performance: data compression, introduced in SQL Server 2008, is an easy win for query performance. Solid-state storage is another great option. Consider partitioning tables that are 50 GB or larger.
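A sketch of how you might evaluate and enable page compression on a large table (an Enterprise Edition feature in SQL Server 2008; the table name here is hypothetical):

```sql
-- Estimate the space savings first...
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo'
    ,@object_name      = 'Orders'
    ,@index_id         = NULL
    ,@partition_number = NULL
    ,@data_compression = 'PAGE';

-- ...then rebuild with compression if the estimate looks worthwhile.
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);
```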
Performance tuning is a time consuming task so I can’t emphasize this enough: Work from a subset of the data. If the stored procedure takes 15 minutes to run against a table with 5 years of data it should run much faster against a table with only 1 year of data. If it takes 5 minutes to run on 1 year of data and you improve it to run in only 1 minute, those performance gains will translate to the larger dataset as well; but, the development time will be much less.
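One way to build that subset, assuming a date column you can filter on (all names here are hypothetical):

```sql
-- One-year working copy to tune against; point the procedure at this
-- table while developing, then swap the real table back in.
SELECT *
INTO WORK.Sales_OneYear
FROM dbo.Sales
WHERE SaleDate >= DATEADD(YEAR, -1, GETDATE());
```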
Troubleshooting Report Rendering Bottlenecks
Once your stored procedure is running lean, mean, and fast, it's time to look at what we can do to improve the rendering of the report. The previously mentioned MSDN link, Troubleshooting Reports: Report Performance, has a lot of great suggestions. Check my recent blog entry on the Interactive Height setting not being set to 0; that one is a doozy.
Whenever possible, let the SQL engine do the work. Grouping and ordering are usually more efficient when done in the stored procedure. Is the report trying to do too much in one report? This is a common problem. Consider breaking it into several reports, with links to the other reports within each one. It may not be as fancy as a document map, but each report will render much more quickly, which should improve the user experience. Then again, it may not: if users always want to see the other sections, they may prefer to wait 30 minutes and have it all ready rather than wait 10 minutes, click a link, and wait another 10 minutes. Drill-down functionality can be very expensive during rendering. You can alternatively use linked "drillthrough" reports to support similar, albeit less fancy, functionality.
Finally, ask yourself: are you using the right tool for the job? Relational reports are best for operational reporting, the types of reports middle managers need on a daily basis. Highly aggregated, bird's-eye-view reports should use a data warehouse or, even better, an OLAP cube. Reports that render over one hundred pages of data only to be exported to Excel are inefficient. These are not reports, they are data dumps. Sure, scheduling those reports to run once a day can alleviate some of this grief, but why not just use an SSIS package to do the job?
Do you suspect that the report server itself is not configured correctly? Prove it! Temporarily modify your procedure to populate a physical table with the resultset, then modify the procedure again to select only the top 1000 rows from that table. Make sure to use different parameters so you're not hitting the cache. Run the report in BIDS and measure the time, then run it using Internet Exploder and measure the time. If your workstation renders the same report in much less time than the server, you may have a case for the DBA to investigate the Reporting Services configuration further. This is also a great way to eliminate data retrieval bottlenecks during report execution while you troubleshoot rendering performance and functionality issues.
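The capture-and-replay trick might look like this (table names are hypothetical; the second statement is what the temporarily modified procedure returns):

```sql
-- One-time capture: persist the procedure's final resultset to a work table.
SELECT *
INTO WORK.ReportSnapshot
FROM dbo.SomeFinalResultset;

-- While testing rendering, have the procedure return only a cheap slice,
-- so data retrieval time is effectively removed from the measurement.
SELECT TOP 1000 *
FROM WORK.ReportSnapshot;
```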
That’s all I have for now. I hope to add more to this post as I think of more great tips or encounter them in the future. Please share any tips, links, or suggestions in the comments.