Deploying SSIS Packages with XML Configurations

SSIS trivia: What do admins detest most about SSIS deployment?
Answer: The Environment Variable configuration.
One of the features Microsoft included for flexibility & ease of deployment between environments. If your answer was registry files, you were close: Environment Variables come out on top, and Registry Files come in second.

Overview: This article covers XML configurations in SSIS: direct vs. indirect configurations, their pros and cons, and the deployment issues and concerns that come with each type. For the purpose of this article, wherever database connection strings come up I am assuming that AD authentication is used throughout. SQL authentication is out of scope, for now (otherwise I will start a geek riot).

XML Configurations: XML configurations in SSIS can be used in one of two ways:
1. Direct Configuration: The path of the XML configuration file is hardcoded in the package itself. When packages move to QA or Production the path has to change, and it is the responsibility of the deployer to change it, either by relying on the manifest (explained below) or by editing the dtsx manually. When this type of configuration is used, the path of the XML configuration file has to be the same on the machine of every developer working on the project.
For example, say the path of the XML config is C:\myXMLConfig.dtsconfig for 10 packages in a solution. When another developer joins the merry band, the new developer has to place the configuration file in the same path set by the previous developer. If the new developer instead places the file in a different location, say “D:\myXMLConfig.dtsconfig”, and commits packages pointing there, then the previous developer’s development environment will no longer load correctly (assuming both developers are working on gold code and checking out and committing to the same source repository).


Fig 1: Direct XML Configuration

The SSIS Manifest:
The SSIS Manifest file is a setup file, similar to the installers for software products where a few clicks of next-next-finish complete the installation. Opening the manifest file launches a wizard in which you can specify the new location of the configuration file, change configuration values, and set the new location of the packages when deploying to a different environment. From what I understand of how the manifest works, it basically does two important tasks (please comment if I missed any):

a. Copying the SSIS packages between environments (Same as using XCOPY or DTUtil. Bet DTUtil uses XCOPY behind the scenes)

b. Changing the XML configuration file path (a simple find-and-replace on the XML node whose ConfigurationType is 5 or 1; open the SSIS package in a text editor to see the configuration types in the package XML)

i. ConfigurationType = 5: Indirect configuration (described below)
ii. ConfigurationType = 1: Direct (hard-coded) XML file path configuration.
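
To check which configuration types a package carries without reading the XML by eye, a small script can list them. This is a minimal Python sketch, assuming the element-based layout I would expect in an SSIS 2005/2008 package (a DTS:Configuration node with DTS:Property children); other versions may store these values differently. The sample package text and the SSIS_CONFIG_PATH variable name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the DTS: prefix in package XML (assumed 2005/2008 layout).
DTS_NS = "www.microsoft.com/SqlServer/Dts"

# Only the two types discussed in this article are labeled.
TYPE_LABELS = {
    "1": "direct XML file path",
    "5": "indirect (environment variable)",
}


def list_configurations(dtsx_text):
    """Return (type, label, configuration string) for each configuration node."""
    root = ET.fromstring(dtsx_text)
    found = []
    for cfg in root.iter("{%s}Configuration" % DTS_NS):
        props = {}
        for prop in cfg.findall("{%s}Property" % DTS_NS):
            name = prop.get("{%s}Name" % DTS_NS)
            props[name] = (prop.text or "").strip()
        ctype = props.get("ConfigurationType", "")
        label = TYPE_LABELS.get(ctype, "other")
        found.append((ctype, label, props.get("ConfigurationString", "")))
    return found


# Hypothetical, heavily trimmed package used only to exercise the function.
SAMPLE_DTSX = r"""<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts">
  <DTS:Configuration>
    <DTS:Property DTS:Name="ConfigurationType">1</DTS:Property>
    <DTS:Property DTS:Name="ConfigurationString">C:\myXMLConfig.dtsconfig</DTS:Property>
  </DTS:Configuration>
  <DTS:Configuration>
    <DTS:Property DTS:Name="ConfigurationType">5</DTS:Property>
    <DTS:Property DTS:Name="ConfigurationString">SSIS_CONFIG_PATH</DTS:Property>
  </DTS:Configuration>
</DTS:Executable>"""

if __name__ == "__main__":
    for ctype, label, value in list_configurations(SAMPLE_DTSX):
        print(ctype, label, value)
```

Running something like this over every dtsx in a solution before a deployment gives a quick list of the hard-coded paths that would need changing.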

2. Indirect configuration: Indirect configurations in SSIS packages let you reference a configuration file through a virtual name, i.e. an environment variable. The configuration value embedded in the package(s) is the name of an environment variable, and the value of that variable is the actual path of the XML configuration file. With this kind of setting, the only thing that needs to be added at deployment time is the environment variable on each environment where the packages are deployed.
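
The indirection can be pictured as a two-step lookup: variable name to file path, then file path to settings. Here is a minimal Python sketch of the first step only. It illustrates the idea and is not how SSIS itself is implemented; the variable name SSIS_CONFIG_PATH and both paths are hypothetical.

```python
import os


def resolve_config_path(variable_name, environ=None):
    """Return the config file path stored in the named environment variable."""
    environ = os.environ if environ is None else environ
    path = environ.get(variable_name)
    if not path:
        # The package only knows the variable name, so a missing variable
        # means there is no way to locate the configuration file.
        raise LookupError("environment variable %r is not set" % variable_name)
    return path


# The package stores the same name everywhere; each server supplies its own value.
DEV_SERVER = {"SSIS_CONFIG_PATH": r"C:\myXMLConfig.dtsconfig"}
QA_SERVER = {"SSIS_CONFIG_PATH": r"D:\myXMLConfig.dtsconfig"}

if __name__ == "__main__":
    print(resolve_config_path("SSIS_CONFIG_PATH", DEV_SERVER))
    print(resolve_config_path("SSIS_CONFIG_PATH", QA_SERVER))
```

The package never changes between servers; only the value behind the name does.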


Fig 2.1: Indirect XML Configuration


Fig 2.2: Environment variable configuration

There are issues when implementing this type of configuration setting in some environments. Two main issues:

a. Machine reboot on environment variable add\modify: The common belief is that when an environment variable is added or modified, the machine needs to be restarted before the change takes effect, i.e. before the packages start recognizing it. This is not true. There have been numerous questions and concerns on this topic, and since the word ‘reboot’ hits an obvious panic button, people tend to stay away from this setting. The reality check is that only the process which runs the SSIS packages needs a restart. Typically this is the SQL Agent, which is the scheduler for the packages; in a development environment it is BIDS. This is one of the most important concerns a DBA has with SSIS configuration.
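
The reason a process restart matters is that a running process keeps the environment it was started with. The Python sketch below shows that general behaviour with a plain child process; it is an analogy under that assumption, not a test of SQL Agent or BIDS. The variable name SSIS_CONFIG_PATH and the paths are hypothetical.

```python
import os
import subprocess
import sys

VAR = "SSIS_CONFIG_PATH"  # hypothetical variable name

# The child waits for one line on stdin, then reports the value it sees.
CHILD_CODE = (
    "import os, sys; sys.stdin.readline(); "
    "print(os.environ.get('SSIS_CONFIG_PATH', ''))"
)


def running_process_sees(old_value, new_value):
    """Start a child with old_value, change the variable, ask the child."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        env=dict(os.environ, **{VAR: old_value}),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # The change happens after the child has already been started.
    os.environ[VAR] = new_value
    output, _ = child.communicate("report\n")
    return output.strip()


def restarted_process_sees():
    """Start a fresh child, which inherits the current environment."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input="report\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(running_process_sees(r"C:\old.dtsconfig", r"D:\new.dtsconfig"))
    print(restarted_process_sees())
```

The already-running child still reports the old path, while the freshly started one reports the new path, with no reboot involved.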

b. Development & QA on same server: This issue crops up when DBAs provision development and QA instances of SQL Server on the same machine. In such environments it becomes impossible to implement indirect configuration with environment variables, as one environment variable cannot hold a different value for each of the two environments.

Other than obstacle (b) above, the environment variable coupled with XML configurations is the simplest and most flexible way of deploying SSIS packages between environments (again, going back to the assumption of AD authentication). I have tried different ways of convincing the admins at various companies to implement environment variables; in some cases I was able to sell my pain, in other cases I had to walk away with explanations on the ‘complexity’ of ‘maintaining’ environment variables, and it was back to the drawing board to find another solution.

As a developer, I may not be totally in tune with the ‘complexity’, ‘security breaches’ and ‘challenges’ (stress on quotes) involved in maintaining environment variables. But really? After all the security ‘gizmos’ an organization has in place to stop Intrusions, Inc., one has to wonder about the need for an overdose of security. But such is life, and I don’t blame the admins for doing their job (maybe I would have done the same thing if I were a production DBA). DBAs want to minimize the loopholes, build better security practices and keep the servers clean of environment variables and registry files; a smart developer, on the other hand, wants best practices, coding standards, ease of deployment between environments, etc. Truly, it is difficult to see green on both sides of the fence.

In the domain of deploying packages between environments, the MS Integration Services team did a good job of making XML configurations, database configurations etc. available to us in SSIS; with them, migrating code between development, QA and production has become simple and efficient. Along those lines, they have also given us the (risky) capability of saving sensitive information as clear text in XML configurations. The packages themselves strip the passwords and anything else marked sensitive, so choosing to save such information in a configuration file is a risk we take on ourselves. This goes back to the deployment modes, msdb and file system, which I will not delve into.

If you think of SSIS configurations as a bane, try implementing continuous integration and deployment of SSIS projects\packages without configurations. Trust me: if you have not felt the pain already, you will (been there, felt that).

Summary:
In the end it comes down to selling your pain in that meeting room with quasi-knowledgeable management, the so-called technical managers and DBAs: security & maintainability vs. flexibility & ease of implementation. You know which scale weighs heavier, don’t you? It’s the DBA side of the scale, i.e. the security part of it. But you don’t see me frowning, coz someone once told me “A problem\setback is an opportunity in disguise” (or something like that). In my next article I will demonstrate how I really did find opportunity in this obstacle (hint: it has something to do with continuous integration with SSIS & C# and the next paragraph).

Personally, I like database configurations, and in my implementations of SSIS projects I use a mix of XML configuration (the road less travelled: indirect in the development environment, converted to direct in QA & Production) and database configuration elements. This keeps the deployers\admins at bay and gives me more power as a developer to enforce configurations at the database level for my application.
More about the road less travelled in my next article.
Sometimes I wonder if I should start another blog – SqlDeveloperRants.wordpress.com.
  

Thanks,

Vishal Gamji

MCITP – Database Developer

admin@expertsqltraining.com

SSIS Execute Task Failure

Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.

Every so often ETL developers working with the Execute SQL Task in SSIS encounter the error above. I have seen a few developers who try to ‘quick-fix’ this by changing type mappings without exactly knowing the differences between them and re-running the ETL task only to get to the next red light. The best example of this would be the type mismatch of Long and Numeric types when using the native OLEDB provider. I wouldn’t be surprised if there are developers out there looking up the precision error on the types. I will not delve too deep into all the providers, but will provide a reference to a very good resource that Microsoft has provided, and which I think should be on every ETL developer’s desk.

In this (first) blog I will point out the most common places in the Execute SQL Task where you should troubleshoot the error (above). I will also go over some SQL provider information at a high level.

Steps to troubleshoot the ‘Execute SQL Task’ error (order need not be followed):

1. Verify that, for the stored procedure\SQL statement used, the parameter count and each parameter's direction (input, output or return value) are set appropriately.

If using an OLEDB, ODBC or ADO connection, check that the number of “?” parameter markers in the statement equals the number of parameters mapped on the parameter mapping screen. If using an ADO.NET connection, parameters are written as @names instead, so check that the number of @ parameters equals the number of mappings.
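
The count can also be checked mechanically before the package runs. Below is a rough Python sketch, assuming “?” markers for every connection type except ADO.NET, where distinct @names are counted. The function names and the dbo.usp_Load procedure are invented for the example. It skips single-quoted string literals but knows nothing about comments or locally declared @variables, so treat its answer as a hint only.

```python
import re

# Matches a single-quoted SQL string literal, including doubled quotes inside it.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")

# Matches @name but not @@name (system functions such as @@ROWCOUNT).
NAMED_PARAMETER = re.compile(r"(?<!@)@(\w+)")


def count_markers(sql, provider):
    """Count parameter markers in sql for the given connection type."""
    stripped = STRING_LITERAL.sub("''", sql)
    if provider.upper() == "ADO.NET":
        # Named parameters: the same name used twice is still one parameter.
        names = {name.lower() for name in NAMED_PARAMETER.findall(stripped)}
        return len(names)
    # Positional markers: every "?" is one parameter.
    return stripped.count("?")


def mapping_count_matches(sql, provider, mapped_parameters):
    """True when the marker count equals the number of mapped parameters."""
    return count_markers(sql, provider) == mapped_parameters


if __name__ == "__main__":
    print(mapping_count_matches("EXEC dbo.usp_Load ?, ? OUTPUT", "OLEDB", 2))
    print(mapping_count_matches("EXEC dbo.usp_Load @BatchId, @RunDate", "ADO.NET", 1))
```

A False result means the statement and the parameter mapping screen disagree, which is worth fixing before looking anywhere else.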

2. Verify your data types.

Verify that the data types declared in the stored procedure or SQL are mapped to compatible data types in the parameter mapping. For example: when using the OLEDB provider, map integer (int) parameters to the Long data type.

Providers: Which data provider should you use when choosing between (Managed) ADO.NET and (Native) OLEDB? There are any number of websites and blogs (yes, this is another one of them) which will tell you to prefer the OLEDB provider over ADO.NET: ADO.NET is a managed provider, which adds another layer of code to connect to the data source, thus making ADO.NET slower than OLEDB. If you really want to find the exact runtime execution difference, I would suggest running Profiler with the two connection managers and then comparing them.

Helpful links for data type mapping and provider info:
http://msdn.microsoft.com/en-us/library/aa198346(SQL.80).aspx
http://msdn.microsoft.com/en-us/library/aa263420.aspx
http://www.carlprothman.net/Default.aspx?tabid=97#10

3. Verify the ResultSet property:

Verify that the ResultSet property is set appropriately for the SQL command being executed. For example: if the SQL command\stored procedure returns a full result set, map the result to a variable of the ‘Object’ data type, so that you can retrieve values from the object based on index position (0 = first column, 1 = second column and so on), as members of the ‘Object’ type are ordinal. Also make sure that all columns in the SQL result set have column names.

4. Do you trust your SQL command\stored procedure? There is a very good possibility of it being incorrect:

The Execute SQL Task can do nothing with bad\erroneous SQL code given to it for execution except fail. As a routine, test the SQL for all possible exceptions, giving more importance to situations where a value or a list of values is expected and nothing is returned.

For example: a subquery inside the stored procedure may return multiple values, a NULL\invalid value, or no values at all. Watch out for the tricky no-value situation.

Example: In this example, our objective is to retrieve object_id from sys.objects where the name meets a certain condition. The ‘ResultSet’ property of the execute sql task is being set to “Single Row” (as our objective is to retrieve a scalar value).

[Note: Each Case builds on the previous cases.]

Case 1: Simple Select to assign to a variable

SELECT object_id FROM sys.objects WHERE name = 'sysrowsets';

 

If the result set of the Execute SQL Task is set to “Single Row”, this will work only when the WHERE condition is satisfied. If it is not satisfied, the query does not return NULL; it returns nothing, i.e. an empty result set, which causes the task to fail with the same error, “ResultSet property not set correctly”. [A developer must keep this exception in mind when testing the Execute SQL Task.] This query will also raise an exception when it returns multiple values while our task is to assign a single value from its output to a variable.

Case 2: Handle multiple values.

DECLARE @Object_ID int;

SET @Object_ID = (SELECT TOP 1 object_id FROM sys.objects WHERE name = 'sysrowsets');

 

In this case, we have handled the multiple values problem from Case 1. Now, if the query never returned a NULL value or an empty result, we would be close to living in a perfect world, but as it turns out (someone told me), we don’t. If\when a NULL or an empty result is returned by the above query, an exception will be thrown. Yes, it is the same exception we are discussing in this article.

I lean more towards the ‘when’ than the ‘if’, because at least in the scenarios I have worked with it was bound to happen; then again, your case may be different.

Case 3: Split the simple select into ‘simpler’ select.

DECLARE @Object_ID int;

SET @Object_ID = (SELECT TOP 1 object_id FROM sys.objects WHERE name = 'sysrowsets_Bogus');

-- The _Bogus suffix makes the query return an empty result set.

SELECT ISNULL(@Object_ID, 0) AS [Object_ID];

 

The first Select statement here takes care of multiple values; the second Select statement converts any NULL value to 0. Because @Object_ID is selected separately in a different query (the 2nd query), the task always receives exactly one row, which also solves the ‘empty’ result set problem.
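
The difference between “no row” and “one row holding a default” is easy to see outside SSIS. The Python sketch below uses an in-memory SQLite database as a stand-in for SQL Server, so the syntax differs from the T-SQL above (COALESCE and LIMIT 1 instead of ISNULL and TOP 1). The objects table is a made-up substitute for sys.objects.

```python
import sqlite3

# Same shape as Case 1: a plain select that matches nothing.
PLAIN_SELECT = "SELECT object_id FROM objects WHERE name = 'sysrowsets_Bogus'"

# Same idea as Case 3: wrap the lookup so exactly one row always comes back.
GUARDED_SELECT = (
    "SELECT COALESCE("
    "(SELECT object_id FROM objects WHERE name = 'sysrowsets_Bogus' LIMIT 1), 0"
    ") AS object_id"
)


def fetch_rows(sql):
    """Run sql against a tiny throwaway table and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE objects (object_id INTEGER, name TEXT)")
        conn.execute("INSERT INTO objects VALUES (5, 'sysrowsets')")
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(len(fetch_rows(PLAIN_SELECT)))    # number of rows from the plain select
    print(len(fetch_rows(GUARDED_SELECT)))  # number of rows from the guarded select
```

The plain select hands back no rows at all, which is the situation a “Single Row” result set cannot handle, while the guarded version always hands back one row.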

Summary

To summarize, double check your SQL queries\stored procedures before you execute them in the Execute SQL Task. Don’t jump to hasty conclusions about the error message description Integration Services gives you. For instance, if you get an ‘Arithabort error’ [Ex: Update failed because the following SET options have incorrect settings: ‘ARITHABORT’] on a stored procedure that performs an insert\update\delete, check the database compatibility level (sp_dbcmptlevel): it may be a SQL Server 2000 database at level 80 that needs to be upgraded to level 90 for SQL Server 2005. Also verify whether any columns are computed columns performing calculations which may be throwing arithabort errors. Check for any external factors that affect your Execute SQL Task variables, for instance package configurations or dynamic expressions. Performing these checks will surely save you some time in the long run and also make your packages robust.

Hope this article\blog was helpful. Do leave feedback (as this is my first blog). My next blog is about enumerating Reporting Services metadata from models into a database repository. Hope to get it out soon.

Thanks,

Vishal Gamji

MCITP – Database Developer

admin@expertsqltraining.com