Here’s What’s In My BI Tool Belt
1. BI xPress – Auditing, Notification frameworks and much, much more! Number one for a reason…
2. Task Factory – Extra SSIS components. SCD Merge Transform, Upsert, SharePoint Source/Destination and more
3. BI Documenter – Takes the pain out of documentation. The easy button.
4. BIDS Helper (does some things BI xPress does not)
5. Parent Child Naturalizer (CodePlex)
6. ResophNotes – code snippet organizer
7. Cube Player
8. Blitz! – SQL Server takeovers
9. PoorSQL – SQL code formatting
10. SSMS Tools Pack
11. Open ModelSphere – open source modeling tool
12. Database Browser
13. Data Space Analyzer – Analyze the space used by indexes visually
- Data Surf – Use FK relationships to navigate from rows in one table to the next
- Schema Surf – Navigate the object tree for a visualization of object dependencies
14. Data Inspector – SQL diff & database sync tool
15. Schema Inspector – Synchronization and scripting tool
16. ManicTime – Time tracking
Adds some nice features for working with SSAS in Excel.
This Works for SQL Server 2008 too. There is another download if you are still using Office 2007.
Complements the data mining add-in nicely. Easily bring external publicly available data into your forecasts.
Windows Live Writer (client)
WordPress (self-hosted or WordPress.com, although self-hosted gives you MUCH more flexibility in terms of plugins/themes/etc.)
Write or Die Desktop (Forces you to write constantly or it annoys you with sounds)
Compression: WinRAR, 7-Zip (alternate)
ZoomIt – Don’t have it yet and you’re a presenter? Go install it right now
Beyond Compare (REALLY cool comparison tool. Quickly compare file/folder contents, awesome for quick editing and syncing file contents like XML files side-by-side)
RDCMan – Remote desktop manager, great for managing multiple RDP sessions. Can group RDC connections and stuff. Nice and FREE from Microsoft
Snagit (image capture/editor)
Camtasia (video capture/editing)
FeedDemon (RSS reader)
Virtual CloneDrive (virtual CD/DVD drive so you can mount ISOs)
Foxit (PDF reader, lighter alternative to Adobe Reader)
Defraggler (free defrag utility)
Dropbox (sync files/folders between PCs, mobile app also lets you get to your folders from phone which is nice too)
Windows Live Mesh (same as above, alternate)
Evernote (Note taking app, syncs across multiple platforms including mobile apps on all major platforms)
DemoMate (paid product but discovered it through MVP program. Create interactive/portable demos.)
PowerGUI (Great PowerShell IDE and helpful for learning PowerShell in general)
Microsoft Script Explorer (helpful beta program they released that gives you TONS of resources in helping you learn PowerShell including scripts, snippets, guides, etc.)
Pomodairo (Adobe AIR-based app that tracks your time using Pomodoro technique)
Klok (Adobe AIR-based app that lets you track time across multiple clients/projects)
VMware Workstation (desktop virtualization but not free, still, awesome)
——————————————————————————- Robert’s List
Here are a few of my often-used tools. Technically they aren’t BI-only, but they’re things I use while doing my BI job. Maybe others will find them useful.
If you are a keyboard fanatic, you’ll love TouchCursor. It adds the ability to turn your letter keys into special keys. You hold down the action key (by default it’s the spacebar, but you can reassign it) and then press, for example, the letter I, and it becomes the up arrow (moves the cursor up). SPACE+J moves the cursor left, SPACE+L moves it right, etc. You can also add your own mappings; for example, I have SPACE+E mapped to F5 so I can quickly execute scripts in SSMS.
TeraCopy is great if you copy big files around a lot; it has a lot of nice features. It detects when a file already exists and gives you the option to only overwrite older files, a great option when you are backing up a lot of files but aren’t sure what has changed. It also copies much faster than the copy function built into Windows.
Security should be important to all of us, especially since we handle sensitive client data on an almost daily basis. To keep data safe I use TrueCrypt. You can use TrueCrypt to encrypt your entire hard drive and/or create encrypted file vaults, similar to a ZIP container only without the compression. For practical purposes, its encryption is unbreakable. There was a case a few years ago where the FBI was asked to break the TrueCrypt encryption of a (suspected) Brazilian crime boss. They reportedly tried for about a year before finally giving up.
Speaking of ZIP, I use 7-Zip. Very easy to use, it adds right-click options to quickly and easily zip and unzip files. Free, and much less overhead than WinZip.
For mounting ISOs I use Virtual CloneDrive. Easy to use and set up, and you can have multiple drives mounted at once, each with its own drive letter. To mount an ISO, just double-click it.
For text editing I use UltraEdit / UEStudio. Not a free program, but worth the money. It supports macros and customizable code highlighting, can open from / save to FTP, and can handle huge files. I once opened a 2 GB file and it flew through it with no issues. I’ve been using it for as long as I can remember. If you are looking for a free, easy-to-use editor, check out Notepad2. I think it’s easier to use and configure than Notepad++.
If you are doing PowerShell, and if you’re not you will be, I use the now free PowerSE app. Easy to use, fully featured, has become my favorite PowerShell editor even over the paid ones.
For video I use VLC, because it can play anything. Literally, I’ve yet to run across something it can’t play.
Sometimes I need a simple, easy-to-use image editor. For those times I use the free Paint.NET. Not a lot of bells and whistles, but for me that’s good: just a simple, easy-to-use editor.
I like to be able to keep my browser bookmarks in sync across all browsers, across all of my computers. For that I use an app called XMarks. To me, it’s what an app should be. It just sits in the background and works. Never pops up with a lot of questions or gets in my way.
For a password manager I’ve found none better than LastPass. From a security standpoint it’s rock solid. Even better, your password is never stored in their system, only an encrypted hash of it. It runs as a browser add-in for all browsers, or you can go to their website, log in, and get to your passwords that way. You can generate one-time-use passwords, so if you go somewhere where you might have some risk, say an internet cafe where there’s a danger of a keylogger, all the bad guys would get is your one-time password. It’s free, although consider making a donation. They also own XMarks.
We all need to read PDFs, but Adobe Reader is bloated and a huge security hole. It used to be that Flash was the number one vector for security breaches, but lately that has changed to Adobe’s PDF reader. I much prefer the free Foxit PDF Reader: lighter weight and more secure.
Now, if all of this seems a bit overwhelming, I can make it very easy for you to install. Just go to Ninite.com. It’s a very cool site. They list a ton of freeware apps in various categories. You simply go down the list and check the ones you want, then click the Get Installer button. Ninite will then generate an app which you download and run. It will install all of the apps you picked and will select all the defaults you would: No, I don’t want your stupid toolbar, Yahoo/Bing/Google/whoever. No, I don’t want you to track me. Yes, I agree to the license agreement I wouldn’t have read anyway. Those kinds of things. This is a great app when setting up a new PC or VM; many of the apps I listed here can be installed via Ninite.
Special thanks to the following Pragmatic Works consultants for their input for this blog post:
Robert C. Cain, MVP, MCTS
Jorge Segarra, MVP, DBA/BI Consultant
Devin Knight, Senior BI Consultant (MCTS, MCITP)
What’s in your tool belt? Got anything to contribute? Please sound off in the comments.
The new SSIS connectors for Oracle and Teradata are available now.
FYI: the 2.0 version is for SQL Server 2012 only… you want the 1.2 version for 2008.
—Thanks to Mike Davis & Devin Knight for the info!
Using the Microsoft Connector for Oracle by Attunity with SQL Server 2008 Integration Services
I was recently tasked with creating a test package to check the performance of SSIS loading data from MS SQL Server 2008 to an Oracle database and loading data from Oracle to MS SQL Server.
My tests have only been performed on a 32bit laptop running Windows XP (Yeah, I know…)
Here are the steps I took:
- Install Oracle Developer Suite 10g (10.1.2.0.2), including Forms and Designer: http://www.oracle.com/technetwork/developer-tools/developer-suite/downloads/index.html (for a 64-bit test, you would just need to install the 64-bit version.)
- Edit C:\DevSuiteHome_1\NETWORK\ADMIN\TNSNAMES.ORA by appending the TNS information provided by your Oracle DBA.
- Edit C:\WINDOWS\system32\drivers\etc\hosts using notepad. Add the host entries provided by your Oracle DBA to the file.
- Create a new SSIS package and add a connection manager of type Native OLE DB\Microsoft OLE DB Provider for Oracle. Enter SOMETEXT (see the example TNS entry below) as the Server name, along with the user name and password provided. Click Test Connection.
- Create an Execute SQL Task to truncate the Oracle table you will be loading. It must be coded like this: truncate table "SOME_TEST"."Movies" **NOTE: SOME_TEST is the schema name, and Oracle quoted identifiers are case sensitive, which is why the exact quoting and casing matter.
- Create another Execute SQL Task to truncate a SQL Server table called Movies2.
- Create a data flow to pull data from SQL Server to Oracle.
- Create a data flow to pull that data from Oracle into another new table in SQL Server (Movies2).
- Data Types are tricky. You may have to cast as nvarchar or numeric(38,0) when pulling from SQL Server and use derived column transformations when pulling from Oracle.
- For the Oracle Source & Destination components, right click and select Show Advanced Editor. Under component properties, set AlwaysUseDefaultCodePage to True.
These drivers took only 45 seconds to complete the same task. The connector adds new data flow components specifically for Oracle: an Oracle Source and an Oracle Destination (similar to the OLE DB Source and Destination components, but built for Oracle).
Example TNS entry:
SOMETEXT =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = SOMESERVERA)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = SOMESERVERB)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = APPGA)
    )
  )
Example Hosts entry:
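The original entry isn’t reproduced here, but a hosts entry is simply an IP address followed by one or more host names. A placeholder example (the addresses below are invented; use the values your Oracle DBA provides):

```
# C:\WINDOWS\system32\drivers\etc\hosts -- placeholder values
10.1.2.3    SOMESERVERA
10.1.2.4    SOMESERVERB
```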
The following are the instructions my buddy Stephen sent me. I didn’t have to follow these because my test was done only on a 32-bit laptop; I believe he was developing in a 64-bit environment. I will post them here in the event they might help someone else.
If the server is 64-bit, then we have to install both the 32-bit and 64-bit drivers for the Attunity connector utility.
Can be found at:
Once installed, you’ll have to install the correct Oracle client tools. This also requires installing both the 32-bit and 64-bit versions.
Next you have to set up the service names on the database box through Oracle using the Oracle Net Manager application. This has to be done for both the 32-bit and 64-bit installations. Name the service names the same. Remember, these are where you use the connection information of the Oracle server.
Make sure you can connect to the Oracle instance on the server through the SQL*Plus utility.
You’ll probably have to create a new folder called “ProgramFilesx86” at the root of the SQL Server installation path, then COPY everything from the “Microsoft Visual Studio 9.0” folder into it. Then all BIDS shortcuts need to reference that directory; otherwise the connector won’t allow BIDS to manually run the package, since BIDS is a 32-bit application.
“I do not expect that Microsoft will write an Oracle fast loader – currently it comes in around number 999 in my list of 1000 features for next version, just slightly ahead of recompiling for Linux.” – Donald Farmer
The Analyze Key Influencers tool is used to show how column values in a data set might determine the values of a specified target column. The process creates a temporary mining model in Microsoft SQL Server Analysis Services using the Naïve Bayes algorithm. It then produces a Main Influencers report listing the key influencers for each distinct value of the target column. You have the option of creating one or more additional Discrimination Reports that compare the influencers for any two distinct values of the target column. The Discrimination Reports are only useful if your target column contains more than two distinct states.
The Naïve Bayes algorithm is a simple probabilistic classifier based on applying Bayes’ theorem with strong independence assumptions. The naïve part of the name comes from the fact that it assumes all attributes are unrelated to each other, so each attribute contributes independently to the probability it predicts. For example, a fruit may be considered an orange if it is round, has the color orange, has seeds, grows on a tree, etc. Even if these features depend on the existence of one another, a Naïve Bayes classifier treats each property as contributing independently to the probability that the fruit is an orange. One advantage of this algorithm is that it requires only a small set of data to estimate the probabilities needed for classification.
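To make the fruit intuition concrete, here is a tiny Naïve Bayes classifier in Python. This is a toy sketch with made-up training data, not the Analysis Services implementation:

```python
from collections import Counter

# Toy training data: (attributes, label). All values are made up for illustration.
fruits = [
    ({"shape": "round", "color": "orange", "seeds": "yes"}, "orange"),
    ({"shape": "round", "color": "orange", "seeds": "yes"}, "orange"),
    ({"shape": "round", "color": "red",    "seeds": "yes"}, "apple"),
    ({"shape": "long",  "color": "yellow", "seeds": "no"},  "banana"),
]

def naive_bayes(sample, data):
    """Score each label by P(label) times the product of P(value | label).

    Add-one (Laplace) smoothing keeps unseen attribute values from zeroing
    out a label's score.
    """
    label_counts = Counter(label for _, label in data)
    scores = {}
    for label, count in label_counts.items():
        score = count / len(data)  # prior P(label)
        for attr, value in sample.items():
            matches = sum(1 for attrs, lbl in data
                          if lbl == label and attrs.get(attr) == value)
            score *= (matches + 1) / (count + 2)  # smoothed P(value | label)
        scores[label] = score
    return max(scores, key=scores.get)

# Round, orange-colored, and seeded: the classifier picks "orange".
print(naive_bayes({"shape": "round", "color": "orange", "seeds": "yes"}, fruits))
```

Note how each attribute multiplies into the score separately; that product is exactly the independence assumption described above.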
This blog post will work through two examples using the sample data provided with the Microsoft SQL Server 2012 Data Mining Add-ins and another example using data from the Contoso sample database.
Which properties of a customer in the sample data help to predict a customer’s level of education?
- Open the DMAddins_SampleData.xlsx file.
- Select the Table Analysis Tools sample sheet, highlight a cell within the table so the ribbon at the top displays the Table Tools, Analyze ribbon, and click the Analyze Key Influencers button.
- Select the column Education to analyze for key factors and click the link that says ‘Choose columns to be used for analysis.’
- Uncheck the ID column. This is just a sequential number that has nothing to do with anything other than the order the row was inserted into the table. We also want to uncheck any other columns that have nothing to do with the customer’s education level to streamline our analysis and improve our accuracy. Let’s also uncheck the purchased bike column. Click Ok, and then Run.
- Once it finishes thinking, move the Discrimination based on key influencers dialog out of the way for a moment.
The Key Influencers report for Education shows which columns, and which values of those columns, have a significant impact on the value of the Education column. According to this report, people between the ages of 37 and 46 who work in Management are very likely to have a bachelor’s degree. People who have only one car and work in a clerical profession are very likely to have attended only some college. People who have two cars, work in a manual occupation, and earn less than about 39K per year are likely to have attended only high school. Similar characteristics apply to those who received only a partial high school education. People who do not own an automobile are very likely to have completed a graduate degree.
Now, back to the Discrimination report dialog that we moved out of the way. Let’s run a discrimination report that compares those with graduate degrees with those who only attended some of High School.
We can add as many discrimination reports as we want.
The Table Analysis Tools Sample worksheet contains only 1,000 rows. When we go through the exact same steps on the Source Data sheet, which has 10,000 rows, we get remarkably similar results.
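The two sheets agree because both are large enough samples for the estimated conditional probabilities to settle near the true rates. A quick simulation illustrates the effect (the 60% figure below is made up, not taken from the sample data):

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Pretend 60% of customers with no car hold a graduate degree (made-up figure).
TRUE_RATE = 0.60

def estimated_rate(n):
    """Estimate the rate from n simulated customers."""
    hits = sum(1 for _ in range(n) if random.random() < TRUE_RATE)
    return hits / n

small = estimated_rate(1_000)    # like the sample sheet
large = estimated_rate(10_000)   # like the Source Data sheet
print(small, large)  # both estimates land close to 0.60
```

With 1,000 rows the estimate is already close; ten times the data tightens it only modestly, which is why the two reports look so similar.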
Next, I’ll run the tool to see what factors most strongly influence whether or not the customer is likely to purchase a bike.
- Give the Source Data worksheet focus. Click the Analyze Key Influencers button.
- Select BikeBuyer as the column to analyze. Uncheck ID from the columns to analyze and run the analysis.
- Go ahead and run a Discrimination report against the Yes/No values. This will demonstrate that this report is useless for target columns with only two values.
The Key Influencers report for BikeBuyer shows us that the strongest predictors that a customer is likely to purchase a bike are owning no cars and being between the ages of 36 and 46. The strongest predictors that they will not buy a bike are owning two cars and being 64 or older.
The discrimination report shows us essentially the same thing.
For the next example, I have imported the V_Customer view from the Contoso Retail demo database which you can download from Microsoft.
If you import the data using the Data ribbon’s From Other Sources button, it will automatically be formatted as a table, which is required. If you import your data from a CSV or copy and paste it into a spreadsheet, it may not be formatted as a table.
- Once the data is in Excel and formatted as a table, click the Analyze Key Influencers button and select HouseOwnerFlag as the column to analyze.
- Click the Choose columns to be used for analysis link, uncheck CustomerKey and Consumption, and run the analysis.
Here we see that MaritalStatus has the most impact on influencing the value of HouseOwnerFlag. We also see that not having any children is a strong indicator for not owning a home.
I hope this explains how to use the Analyze Key Influencers tool sufficiently. If you have any questions, please use the comments section below.
Here are some additional links:
In my experience, Data Source Views are not commonly used in SSIS packages. I can imagine a use for them. If I were the ETL architect leading a large team of developers on a data warehouse project, I might use the DSV to simplify the development efforts of members of my team. Using AdventureWorks as an example, I may have one team working on a star schema based around FactInternetSales and another team working on a star schema based around FactResellerSales. As I create my package templates for the team to use I can include the DSV to simplify their view of an otherwise complex dimensional model.
The MSDN article, Using Data Source Views in Packages, lists the advantages of using DSVs in packages as follows:
- A data source view can be defined one time and then used by multiple data flow components.
- A data source view can be refreshed to reflect changes in its underlying data sources.
- A data source view caches the metadata from the data sources on which it is built.
- A data source view can focus the lists of objects that appear in the user interface of data flow components.
It is important to note that there are not any dependencies between a DSV and the packages that reference it. When a package uses a DSV, the definition of the view is added to package source code in the data flow component that uses it. If you delete all of the DSVs from an existing solution, all of the packages that used those DSVs will continue to function.
So here is how you use a DSV in an Integration Services package:
- Start with a blank package: load BIDS, click File, New, Project, select the Integration Services Project template, and click OK.
- Right click the Data Sources folder in Solution Explorer and select New Data Source.
- Click the New button in the Data Source Wizard.
- Make sure the provider is the Native OLE DB\SQL Server Native Client 10.0 (assumes SQL Server 2008.)
- Select the server and the database name (localhost, AdventureWorksDW2008R2), Click Ok, and then Finish twice.
- Right click the Data Source Views folder and select New Data Source View. Click Next twice.
- Select FactInternetSales and click the right arrow to move it to the Included Objects pane.
- Click the Add Related Tables button. Your screen should look like this:
- Click Next, name your DSV InternetSales and then click Finish. Admire the pretty diagram.
- Now, double click the Package.dtsx file under the SSIS packages folder to open the blank package.
- Drag a Data Flow Task from the Toolbox onto the Control Flow.
- Double click the Data Flow Task
- Right click inside the Connection Managers section of the IDE and select New Connection From Data Source. Click Ok.
- Drag an OLE DB Source component onto the Data flow design canvas.
- Double click the OLE DB Source component.
- In the drop down for OLE DB Connection manager you will see the InternetSales DSV underneath the data source we defined.
- Select the InternetSales DSV and click Ok.
- Best practice is to never use Table or view as the Data access mode. Before we change it to SQL command, notice that the Name of the table or view dropdown contains only FactInternetSales and its related tables. Now change the Data access mode to SQL command.
- In the SQL command text box, type SELECT * FROM dbo.FactInternetSales WITH (NOLOCK);
- Click the Build Query button and then click Ok.
- Notice it changed my SQL command text to list the column names rather than SELECT *, complying with best-practice recommendations. I would normally delete any columns that I wasn’t actually inserting into my destination, to save memory during execution and prevent spilling to disk when the buffers fill up.
I hope this post will help someone. If you have any questions please add a comment below.
Recently I had the pleasure of pulling in a flat file encoded in UTF-8 format into a SQL Server 2008 table using SSIS. I’d like to share my experience in the hopes that it will help someone else with a similar issue. It might also help me in the event I come across the situation again and forget how to get past it.
The first indication of a problem was the following error message:
The data conversion for column “Column 37” returned status value 4 and status text “Text was truncated or one or more characters had no match in the target code page.”
The SSIS output also told me what row of data was causing the problem:
“…on data row 1415.” (Search the output for “on data row” if you need to find the row where your process failed.)
I verified that the column was wide enough, so I opened up EditPad Lite and loaded the offending file. I pressed Ctrl+G to go to the specific row and saw that it had a column with the text Düsseldorf. Without going into a bunch of detail about all the things I tried that didn’t work, I will explain what finally allowed me to bring Düsseldorf into the table with the umlaut preserved.
These are the types of errors I was getting as I struggled to EDIT the components:
Codepage in output column 0 is 1252, but has to be 65001.
1. EditPad Lite: I clicked Convert, Text Encoding from the menu. This told me the file’s encoding was Unicode UTF-8.
The next steps have to be followed exactly and in the correct order; editing the existing components will not work. If you already have ANY of these three objects, they MUST be deleted and re-created from scratch: the Flat File connection manager, the Flat File Source, and the OLE DB Destination. Seriously, no joke.
Note: Before you delete your destination, you may want to sort the columns by the destination column and take screenshots if your incoming columns are listed as Column 1, Column 2, etc., because you will have to remap them.
1. Create your Flat File connection manager using code page 65001 (UTF-8), and on the Advanced tab change the DataType property of any columns that contain umlauts to Unicode string [DT_WSTR] (or Unicode text stream [DT_NTEXT]).
2. Create your Flat File Source component.
3. Create your OLE DB Destination component. Change the DefaultCodePage property to 65001 and AlwaysUseDefaultCodePage to True. Hook your source to your destination and do your mappings. (Your target column needs to be nvarchar or another Unicode-capable data type.)
Sounds simple, but believe me, if one step is out of order you simply can’t edit your way out of it, and you will go out of your mind trying. You HAVE to do it in the exact order I have described.
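The underlying problem is easy to reproduce outside of SSIS. A couple of lines of Python (a sketch of the encoding mismatch, not of SSIS internals) show why the connection manager must use code page 65001: decoding UTF-8 bytes as code page 1252 splits the umlaut into two characters.

```python
city = "Düsseldorf"

# UTF-8 stores "ü" as two bytes: 0xC3 0xBC.
raw = city.encode("utf-8")

# Decoding those bytes as Windows-1252 (code page 1252) treats each byte
# as its own character, mangling the umlaut.
garbled = raw.decode("cp1252")
print(garbled)  # DÃ¼sseldorf

# Decoding with the correct code page (65001 = UTF-8) round-trips cleanly.
assert raw.decode("utf-8") == city
```

This is the same mismatch SSIS complains about with “Codepage in output column 0 is 1252, but has to be 65001.”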
I’ve joked in the past that the data miner is to the BI analyst what the weather person is to the news reporter. One reports on the past, and the other predicts the FUTURE! Pretty cool when you look at it that way. So let’s do something meaningful today that can help our entire nation. Let’s forecast the unemployment rate using the Microsoft SQL Server 2012 Data Mining Add-ins for Office 2010!
I started by downloading and installing the FRED Excel add-in, which you can find here. I pulled the unemployment rate by month into Excel and formatted it as a table. I set aside the data from 2012 to use as a gauge of how well my model does at predicting. I clicked the Forecast button on the Table Tools Analyze ribbon and was on my way.
I’ve got great news! The unemployment rate is going down!
Comparing my forecasted values to the actuals shows there is some room for improvement.
Next we’ll add the job openings figures to the mix. All I have to do is paste these values next to the unemployment rate numbers and delete the rows from my previous forecast.
Unfortunately, this doesn’t improve our forecast.
Next, I added the number of hires for each month. It didn’t improve my unemployment forecast, but it did improve my job openings forecast.
I went on to try many other data sets including gas prices, compensation, the dollar index relative to other currencies and more. Although I didn’t improve my predictions I hope I did manage to demonstrate the methodology one might use to see what factors influence the unemployment rate.
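For intuition about what the Forecast button is doing, here is a greatly simplified sketch in Python. The add-in actually builds a Microsoft Time Series mining model in Analysis Services; this toy version just fits a straight-line trend, and the monthly rates below are made-up numbers, not FRED data:

```python
def linear_trend_forecast(series, periods):
    """Fit a least-squares line to the series and extend it `periods` ahead."""
    n = len(series)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(series) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, series))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # Project the fitted line forward for the requested number of periods.
    return [intercept + slope * (n + i) for i in range(periods)]

# Made-up monthly unemployment rates trending down by 0.1 per month.
rates = [8.3, 8.2, 8.1, 8.0, 7.9, 7.8]
print(linear_trend_forecast(rates, 3))
```

Holding out the most recent months and comparing them to the projection, as I did with the 2012 data above, is how you gauge whether a forecast like this is any good.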