SSIS and Oracle connections

***UPDATE***

The new SSIS connectors for Oracle and Teradata are available now.

http://blogs.msdn.com/b/mattm/archive/2012/04/04/microsoft-connectors-v2-0-for-oracle-and-teradata-now-available.aspx

FYI the 2.0 version is for 2012 only… You want the 1.2 version for 2008

—Thanks to Mike Davis & Devin Knight for the info!

more help…

Using the Microsoft Connector for Oracle by Attunity with SQL Server 2008 Integration Services

http://msdn.microsoft.com/en-us/library/ee470675%28v=sql.100%29.aspx

——————————————————————————-

I was recently tasked with creating a test package to check the performance of SSIS loading data from MS SQL Server 2008 to an Oracle database and loading data from Oracle to MS SQL Server.

My tests have only been performed on a 32bit laptop running Windows XP (Yeah, I know…)

Here are the steps I took:

  1. Install Oracle Developer Suite 10g (10.1.2.0.2) (including Forms and Designer)  http://www.oracle.com/technetwork/developer-tools/developer-suite/downloads/index.html  (for 64 bit test, you would just need to install the 64 bit version.)
  2. Edit C:\DevSuiteHome_1\NETWORK\ADMIN\TNSNAMES.ORA by appending the TNS information provided by your Oracle DBA.
  3. Edit C:\WINDOWS\system32\drivers\etc\hosts using notepad.  Add the host entries provided by your Oracle DBA to the file.
  4. Create a new SSIS package, add a connection manager of type Native OLE DB\Microsoft OLE DB Provider for Oracle.  Enter SOMETEXT (see example TNS Entry below) in Server name, and the user name and password provided.  Click Test Connection.
  5. Create an Execute SQL Task to truncate the Oracle table you will be loading.  Must be coded like this:  truncate table “SOME_TEST”.”Movies”    **NOTE: SOME_TEST is the schema name.
  6. Create another Execute SQL Task to truncate a SQL Server table called Movies2.
  7. Create a data flow to pull data from SQL Server to Oracle.
  8. Create a data flow to pull that data from Oracle into another new table in SQL Server (Movies2).
  9. Data Types are tricky.  You may have to cast as nvarchar or numeric(38,0) when pulling from SQL Server and use derived column transformations when pulling from Oracle.
  10. For the Oracle Source & Destination components, right click and select Show Advanced Editor.  Under component properties, set AlwaysUseDefaultCodePage to True.
Now comes the fun part…
Using the Oracle Connection managers that come w/ SQL Server, it took approximately 10 minutes 45 seconds to pull about 45,000 rows of 2 columns full circle.   (That is slow!)
My good buddy, Stephen Bowley, told me about the Attunity drivers that you can download from Microsoft’s website.

http://www.microsoft.com/downloads/en/details.aspx?familyid=6732934C-2EEA-4A7F-85A8-8BA102E6B631&displaylang=en

These drivers only took 45 seconds to complete the same task.  It creates new data flow components specifically for Oracle source and Oracle destination (similar to the OLE DB Source / Destination components; but, specifically for Oracle.)

——————————————————————————-

ExSample TNS Entry:

SOMETEXT =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = SOMESERVERA)(PORT = 1521))

(ADDRESS = (PROTOCOL = TCP)(HOST = SOMESERVERB)(PORT = 1521))

(LOAD_BALANCE = yes)

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = APPGA)

)

)

——————————————————————————-

Example Hosts entry:

86.7.5.309     SOMESERVERA

12.34.56.78   SOMESERVERB

——————————————————————————-

The following are the instructions my buddy Stephen sent me.  I didn’t have to follow these b/c my test was only done on a 32 bit laptop.  I think he was developing on a 64bit environment.  I will post them here in the event they might help someone else.

If server is 64bit, Then we have to install both the

32bit and 64bit drivers for the attunity connector utility.

Can be found at:

http://www.microsoft.com/downloads/en/details.aspx?familyid=6732934C-2EEA-4A7F-85A8-8BA102E6B631&displaylang=en

Once installed, you’ll have to installed the correct Oracle Client tools.

This also requires you install the 32bit and 64bit versions.

next you have to setup the Service Names on the DB box through oracle

using the Oracle Net Manager application. This has to be done for both

32bit and 64bit installations. Name the service names the same. Remember

these are where you use the connection information of the oracle server.

Make sure you can connect to the Oracle instance on server

through the SQL-Plus untility.

You’ll probably have to create a new folder called “ProgramFilesx86” on the root of installation path of SQL Server.

Then COPY everything from the “Microsoft Visual Studio 9.0” folder in that. THEN ALL BIDS shortcuts needs to ref

that direcory otherwise the connector wont allow BIDS to manually run the package since BIDS is a 32 BIT application.

Helpful Links:

http://www.attunity.com/forums/microsoft-ssis-oracle-connector/error-failed-load-oci-dll-1308.html

http://sqlblog.com/blogs/jorg_klein/archive/2011/06/09/ssis-connect-to-oracle-on-a-64-bit-machine.aspx

Funny quote:

“I do not expect that Microsoft will write an Oracle fast loader – currently it  comes in around number 999 in my list of 1000 features for next version, just  slightly ahead of recompiling for Linux.” – Donald Farmer

http://gingerandsugar.blogspot.com/2007/06/export-sql-server-data-into-oracle.html

——————————————————————————-

Advertisements

SSIS: How to Make Use of the DSV

In my experience, Data Source Views are not commonly used in SSIS packages.  I can imagine a use for them.  If I were the ETL architect leading a large team of developers on a data warehouse project, I might use the DSV to simplify the development efforts of members of my team.  Using AdventureWorks as an example, I may have one team working on a star schema based around FactInternetSales and another team working on a star schema based around FactResellerSales.  As I create my package templates for the team to use I can include the DSV to simplify their view of an otherwise complex dimensional model.

The MSDN article, Using Data Source Views in Packageslists the advantages of using DSVs in packages as follows:

  • A data source view can be defined one time and then used by multiple data flow components.
  • A data source view can be refreshed to reflect changes in its underlying data sources.
  • A data source view caches the metadata from the data sources on which it is built.
  • A data source view can focus the lists of objects that appear in the user interface of data flow components.

It is important to note that there are not any dependencies between a DSV and the packages that reference it.  When a package uses a DSV, the definition of the view is added to package source code in the data flow component that uses it.  If you delete all of the DSVs from an existing solution, all of the packages that used those DSVs will continue to function.

So here is how you use a DSV in an Integration Services package:

  1. Start with a blank package by loading BIDS and clicking File, New, Project, and selecting the Integration Service Project template and clicking Ok.
  2. Right click the Data Sources folder in the solution exploder and select new data source.
  3. Click the New button in the Data Source Wizard.
  4. Make sure the provider is the Native OLE DB\SQL Server Native Client 10.0 (assumes SQL Server 2008.)
  5. Select the server and the database name (localhost, AdventureWorksDW2008R2), Click Ok, and then Finish twice.
  6. Right click the Data Source Views folder and select New Data Source View.  Click Next twice.
  7. Select FactInternetSales and click the right arrow to move it to the Included Objects pane.
  8. Click the Add Related Tables button.  Your screen should look like this:

image

  1. Click Next, name your DSV InternetSales and then click Finish.  Admire the pretty diagram.
  2. Now, double click the Package.dtsx file under the SSIS packages folder to open the blank package.
  3. Drag a Data Flow Task from the Toolbox onto the Control Flow.
  4. Double click the Data Flow Task
  5. Right click inside the Connection Managers section of the IDE and select New Connection From Data Source.  Click Ok.
  6. Drag an OLE DB Source component onto the Data flow design canvas.
  7. Double click the OLE DB Source component.
  8. In the drop down for OLE DB Connection manager you will see the InternetSales DSV underneath the data source we defined.

image

  1. Select the InternetSales DSV and click Ok.
  2. Best practice is to never use Table or view in the Data access mode so before we change that to SQL Command, notice the Name of the table or view dropdown contains only the tables related to and including FactInternetSales.  Now change it to SQL command.
  3. In the SQL command text box, type SELECT * FROM dbo.FactInternetSales WITH (NOLOCK);
  4. Click the Build Query button and then click Ok.
  5. Notice it changed my SQL command text to include the column names rather than SELECT * complying with best practice recommendations.  I would normally delete any columns that I didn’t actually insert into my destination to save memory during execution and prevent spillage to disk when the buffers fill up.

image

I hope this post will help someone.  If you have any questions please add a comment below.

SSIS–Um…. How To Preserve Umlauts from UTF-8 Flat File in Düssledorf? (or anywhere else for that matter…)

Recently I had the pleasure of pulling in a flat file encoded in UTF-8 format into a SQL Server 2008 table using SSIS.  I’d like to share my experience in the hopes that it will help someone else with a similar issue.  It might also help me in the event I come across the situation again and forget how to get past it.

The first indication of a problem was the following error message:

The data conversion for column “Column 37” returned status value 4 and status text “Text was truncated or one or more characters had no match in the target code page.”

The SSIS output also told me what row of data was causing the problem:

” on data row 1415.”   (search for “on data row” if you need to find out what row your process failed.)

I verified that the column was wide enough so I opened up EditPadLite and loaded the offending file.  I pressed control G to go to a specific row and saw that the row had a column with the text: Düssledorf.  Without going into a bunch of detail of all the things I tried; but, didn’t work, I will explain what finally allowed me to bring Düssledorf into the table with umlaut preserved.

These are the types of errors I was getting as I struggled to EDIT the components:

The column “Column 2” cannot be processed because more than one code page (65001 and 1252) are specified for it.

Codepage in output column 0 is 1252, but has to be 65001.

1.  EditPadLite: clicked Convert, Text encoding from menu.

image

This told me the file’s encoding was Unicode UTF-8.

image

The next steps have to be followed exactly and in the correct order.  There is no editing about it.  If you already have ANY of these three objects they MUST be deleted and re-created from scratch:  Flat File connection manager, flat file source, OLE DB Destination.  Seriously, no joke.

Note:  Before you delete your Destination you may want to sort the columns by the destination column and take screenshots if your incoming columns are listed as column 1, column 2, etc. because you will have to remap them.

1.  Create your flat file connection manager using code page 65001 (UTF-8) and on the advanced tab change the datatype property of any columns that contain umlauts to unicode string dt_wstr or dt_ntext for Unicode text stream.

2.  Create your Flat file source component

3.  Create your ole db destination component.  Change defaultCodePage property to 65001 and AlwaysUseDefaultCodePage to True.  Hook your source to destination and do your mappings.  (Your target column needs to be nvarchar or another unicode capable data type.)

Sounds simple; but, believe me if one step is out of order you simply can’t edit it and you will go out of your mind trying to.  You HAVE to do it in the exact order I have described.

More info

Best Fit in WideCharToMultiByte and System.Text.Encoding Should be Avoided.

BIDS 2008 R2 inserts phantom codepage(s) into SSIS components irrelevant to format/locales/codepages of data and software used (blocking the SSIS tasks

Flatfile Import: Persistent data conversion errors

SSISsy Spacek and FICO Blaze Advisor Business Rules Engine vs Extract Transform and Load

SSISsy Spacek and FICO Blaze Advisor Business Rules Engine vs Extract Transform and Load

SSISy

What is a business rules engine?

A business rules engine allows an ETL team to design their data flow processes in such a way that the business rules are segregated from the ETL process.

In an ideal world, nothing would ever be hard-coded in your packages. Imagine a solution where EVERYTHING is configurable.  From an ETL framework perspective we should be familiar with this concept.

What does this mean from a business rules perspective?  In my opinion it means that anything in your SQL query where clause filters, case statements, and join conditions (other than surrogate keys) are candidates for business rules segregation.  Many will initially cry overkill; but, having worked in large corporate environments allow me to put forth the proposition that it is far easier to submit a single SQL script to a DBA that will update a business rules configuration table than it is to make a change to an ETL package.  The latter typically requires: code checkout, unit testing, data validation, peer review, code check-in, versioning, documentation, request to the DBA to deploy to QA, QA Review, QA approval, request to DBA to deploy to production and perform production pre-install analysis, and finally production post install analysis.  Programming your SSIS packages in such a way that your business rules are removed from the package itself will provide a huge benefit to the ETL developer in that he or she will also be removed from the process as business rules change throughout the course of time.

When I was first introduced to the concept of a business rules engine I was asked to evaluate the potential of the FICO Blaze Advisor software to integrate with the Microsoft SQL Server Integration Services 2008 ETL processes.  My initial trepidations centered around how it would be able to handle large volumes of data.  Would it be able to perform set based operations or require row by rows calls out to some engine sitting on some other server?  The concept from the BRE champions is that all of your organizations business rules are centrally located in a single instance for all applications.

Googling provided very few leads.

http://www.sqldev.org/sql-server-integration-services/how-to-integrate-blaze-advisor-rule-engine-to-ssis-10588.shtml

and Sandeep never shared his results… (see comments.)

http://weblogs.asp.net/gsusx/archive/2007/02/23/ssic-components-for-salesforce-com.aspx

Posting on the Blaze Forums bore no fruit…

http://discuss.fico.com/t5/FICO-Blaze-Advisor-Discussion/Blaze-Advisor-and-SSIS-for-ETL-processes-any-luck/td-p/2854

To make a long story longer, this is what I think from an ETL development perspective…

Using Blaze in conjunction with your ETL process is like trying to fit a square peg in a star schema (LOL, couldn’t resist.)  Couple that analogy with hitting a nail with a sledgehammer and I think I made my point.

What it gives you is a web based interface for your end users to make changes to business rules which can be used to create dynamic SQL, populate lookup tables, and even perform as a sort of lookup component in themselves.  Complex business rules can be defined by the ETL developer in a variety of fashions including decision trees, matrices, case type logic, and complex “if then else” procedural logic.  Designed properly, there should be no need for any future involvement from the ETL team.  Pipe dream?  Perhaps… but that is the concept.

I would like to see a BRE designed specifically for MS SSIS.  I don’t think it would be hard to develop and it would serve a great purpose.  Recently, I’ve been looking at some of the excellent SSIS components available through Pragmatic Works in their revolutionary Task Factory product.  Perhaps they will be the company that brings such a product to light.

In the meantime, If you must have BRE solution that integrates with SSIS, I would suggest that you either roll your own, or download the evaluation version of Blaze Advisor.  Go through the tutorials.  They are a little buggy; but, you can get through them.

Finally, download the tutorial I created that goes hand in hand with the tutorials they created.  I even use the same examples.

http://www.bidn.com/Assets/Uploaded-CMS-Files/dff32413-a90c-4034-8171-1a87ecedcb88Integrating Blaze with SSIS by Example.docx

Once you get through that, explore passing an array to the Blaze engine, retrieving the results, and matching them up (via checksum I guess) w/ rows in the SSIS buffer to get better performance out of it (if you NEED it for a lookup.)  Or just use it for dynamic SQL and populating small lookup tables like I did (however, I still used arrays.)

Hit me up if you get stuck.  I’ll update this post as I get questions and have more time.