Best Practices: Staging Mainframe\DB2 data to SQL Server 2012

Introduction

The Extract, Transform & Load (ETL) process typically accounts for 70% of an entire data warehousing/data migration effort (according to The Data Warehouse ETL Toolkit). And this proportion represents a major element of the project in terms of risk: the implementation could make or break the project. This fact should be taken into consideration while designing and implementing an ETL solution.

Every staging project has two elements: the ETL itself and a staging data structure. So today I’d like to talk about best practices for standing up a staging area using SQL Server Integration Services [ETL] and hosting a staging database in SQL Server 2012 [DB]. The ‘best practices’ are across three areas: Architecture, Development, and Implementation & Maintenance of the solution. Each area represents key patterns and practices (not a comprehensive list) for the ETL component and the data structure of the staging database.

More …..

Categories: Uncategorized

Smith MBA Long Term Schedule Planning with PowerPivot

07/23/2012 1 comment
Introduction

In this post I am trying to solve a real world analytical problem using Microsoft PowerPivot. Before I get started, let me introduce a few terms here:

Long Term Plan: A customized package of R.H.Smith business school courses spread across five campuses offered in various combinations by area of expertise under specific departments.

Students at the Smith School of Business use the Long Term Schedule Planner to plan their long term schedule based on the courses that will be offered, the number of credits, the campus location, term, etc. and then select and register for those courses. And I am sure that there are many students out there who return to this page often to to see what other options are available and if there were any changes to the long term schedule.

The Problem

The current Long Term Schedule Planning page delivers its purpose only to an extent; it drives the user\student to pick courses based on a specific option set i.e. Term –> Campus –> Department. Only then can you look at the courses and proceed further. Personally I am not a big fan of analytical processes forcing me to think in one direction. It should be, as I see it, a flexible application where students can use any ‘perspective’ to look at their schedule, have a way of planning and saving their schedule, have the ability to drill down and apply filters by department or for that matter just a course and see which semesters is it offered and plan accordingly.

‘a’ Solution: Microsoft PowerPivot

So, as I was trying to build my customized long term plan and got tired of going back and forgetting where I started; I got an idea to get all the data required for long term planning into a Microsoft SQL Server 2012 database and create a PowerPivot model. For me not only was this was the simplest and the easiest way of analyzing data but also was a powerful way of visualizing the data. The solution was implemented in the cloud. I am making a copy of the Excel workbook with the data and making it available for download.

For those who are new to PowerPivot, “PowerPivot is a powerful data mashup and data exploration tool based on xVelocity in-memory technologies providing unmatched analytical performance to process billions of rows at the speed of thought” [MS PowerPivot]. It is available for free. Works as an add-in in MS Excel.

Software Requirements

To use the workbook the software required is:

1. Microsoft Excel 2010 (have not tried it with 07)

2. Microsoft PowerPivot (Free Download)

Screenshots of the developed PowerPivot worksheet
SmithLongTermPlanner     SmithLongTermPlanner
How to get the PowerPivot workbook ?

Download it here. (Make sure that the above mentioned software is installed before using this workbook).

Note: This is the first version of the workbook and may need some fine tuning. Please let me know how you want to improve this and I will make it happen. Any feedback is appreciated.

Concluding Comments

This was a very simple example of a “self service” BI (Business Intelligence) application. I would highly recommend trying out PowerPivot for your advanced analysis tasks at work. You will not be disappointed with this small but effective BI move. As a matter of fact, Microsoft BI has already made its way into many companies as the leading BI player in the industry. [Gartner Report]

If you are interested in learning more about Microsoft BI products or have a BI effort that you would like to solve or for that matter have any potential BI solutions that you think will give your company the strategic edge and competitive advantage, please do not hesitate to contact me.

About AIS

AIS is a privately held, technical services company headquartered in Reston, Virginia. Over the past 30 years, AIS has provided custom solutions to government agencies and businesses worldwide. We’ve built a reputation for delivering completed projects to our clients that are on time, within budget and exceed expectations. Today, AIS provides long-term career opportunities to more than 300 talented individuals and open additional offices in Maryland, North Carolina, Ohio, Texas and India.

Thanks for reading,

Vishal Gamji

Email: vishal.gamji@appliedis.com

LinkedIn: http://www.linkedin.com/in/vishalgamji

Automating SSIS 2012 ‘Project’ Deployment

Lets cut to the chase and talk about how one can deploy packages or enable CI with SSIS 2012 projects. This short blog is focused on deploying SSIS ‘projects’ as a whole rather than using DTUTIL, XCOPY etc.

Creating ‘tear down and rebuild-able’ artifacts

We all know that SSIS packages are nothing but mere XML files, right?. Technically yes, but they are actually way much more than that. But, since they don’t have properties of compiling as normal code, we need alternate ways of deploying these files or projects as a whole. The most obvious method would be XCOPY or DTUTIL. With the ‘project’ based deployment of SSIS we can do better. Better how? Deploy the project instead of individual files. When the project based deployment is selected; the build action on the project creates an “.ispac” file which can be used as an msi installer to deploy packages on other environments. But if this action of deployment needed to be automated or enabled for CI we need a way around the UI based aspect of the ispac file (deployment file).

I am assuming the reader is using project based deployment and is familiar with the ispac (integration services deployment file) file used for deployment with SSIS 2012. If not, please read this article http://sqlblog.com/blogs/jamie_thomson/archive/2010/11/10/introduction-to-ssis-projects-in-denali.aspx

The readiness steps needed to run this “.ispac” from the command line are

1. Recreating the integration services catalog: When projects get deployed using the UI based approach with the ispac file, there is an option to deploy the project to a specific integration services catalog (in a SQL Server 2012 instance) and one also gets the option to overwrite it. This needs to be in a script runnable in sqlcmd or powershell.

Tear Down:

IF EXISTS(SELECT 1 FROM SSISDB.internal.folders WHERE name = ‘CatalogName’)

BEGIN

       IF EXISTS(SELECT 1

                 FROM SSISDB.internal.projects

                 WHERE folder_id = (SELECT TOP 1 folder_id

                                   FROM SSISDB.internal.folders

                                   WHERE name = ‘CatalogName’))

       BEGIN

              EXEC [SSISDB].[catalog].[delete_project]

                     @project_name=N’SSIS’,

                     @folder_name=N’CatalogName’

       END

 

       EXEC [SSISDB].[catalog].[delete_folder]

              @folder_name=N’CatalogName’

END

Re-create:

Declare @folder_id bigint

EXEC [SSISDB].[catalog].[create_folder]

       @folder_name=N’CatalogName’,

       @folder_id=@folder_id OUTPUT

 

EXEC [SSISDB].[catalog].[set_folder_description]

       @folder_name=N’CatalogName’,

       @folder_description=N’OptionalDescription’

 

2. Deploying the project from the command line:  The next step is to prepare the command line executable of the deployment file (ispac). The ispac deployment file can be evoked from the command line using IsDeploymentWizard.exe with the following switches.

 

image

 

Ex: IsDeploymentWizard /S /SP:"C:\SSIS.ispac" /DS:"<myServerName>\<Instancename>"

/DP:"/SSISDB/SSIS"

Summary

As simple as that. Just have this two steps incorporated in your CI build steps or a batch file and you are all set. If you are looking for more information on IsDeploymentWizard, MSDN is a good resource. If you encounter any issues do let me know. More than happy to help.

Catching up – Starting with the common framework

Trying very hard to keep a balance between business school learning and timely tech updates. There is a lot of catching up to do ranging from MS SQL Server 2012 BI visualization capabilities like power view to the detail of using column store indexes. I did get an opportunity to work with Integration Services and certainly is a game changer in the area of ETL tools out there. I find it encouraging to see large organizations pursuing the use of SQL Server Integration Services not only for data warehousing related workloads, but also for day to day operational activities (tasks\processes).

Technical managers and decision makers should start thinking of a common platform for data movement across the enterprise; not just for governance reasons but more for the ability to trace and remove process redundancies from the organization. Need a place to start? Well, start by making an inventory of your processes. If you already have a project\process portfolio management tool use it to your advantage. Have a parallel development effort started on designing and implementing a common framework for maintaining these processes. This should have the ability to audit events, capture exceptions and errors, log process updates etc. More like a monitoring tool for all processes. Start simple. What do you want to know – “When did my process start\end?”, “Did it fail\why\Where do i see the error?”, “How many rows\files were transferred in the process”, “How do i get notified in the event of a failure?”. Let these types of questions drive your requirement. I would recommend to keep the implementation of the process framework as technology agnostic as possible as i have made this mistake by creating something close to the technology (script tasks for logging, auditing and event handling) and ‘when’ the technology upgrades, you are screwed. Just more work to redefine the framework.

Let me try to get a conceptual example with code as examples to use. Going back to the drawing board on trying to get something in place as a ‘common’ process hub, will be back soon. I hope …. with Managerial Accounting looming closer and sending shivers down my spine.

Categories: Uncategorized

Using BIDS 2008 with TFS 2010

Introduction

If you are working on BI projects and use Microsoft Business Intelligence Development Studio 2008 (BIDS\Visual Studio 2008) for SSIS, SSRS and SSAS and one fine day your application development team has decided to move on to Visual Studio 2010 and Team Foundation Server 2012 you might think initially that there was either a BIDS 2010 version out there that was released with Visual Studio 2010. If you are a SQL junkie you know better that you have to wait until SQL Server 2012. To be clear – You *can* host your current BIDS 2008 projects in TFS 2012, but unless you install the following components you will not be able to use Team Explorer for your BIDS projects, nor can source control actions (check-ins, check-outs etc.) be performed from the IDE. Without the additional components a BI developer is stuck to developing in BIDS 2008 and then make sure (as a separate task, post development) to regularly integrate code artifacts with TFS. Sure does sound painful, but lets make it easy on us.

Solution

The fix consists of installing Visual Studio 2008 Team explorer and the forward compatibility update for connecting to Team Foundation Server 2010 from Visual Studio 2008.

Installation steps:

1. Close any open instances of Visual Studio 2008 and Visual Studio 2010 before installation.

2. Download and Install VS 2008 Team Explorer using the following URL.

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=0ed12659-3d41-4420-bbb0-a46e51bfca86&displaylang=en

3. Install VS 2008 Service Pack 1: Download and install the following service pack

http://www.microsoft.com/downloads/en/details.aspx?FamilyId=FBEE1648-7106-44A7-9649-6D9F6D58056E&displaylang=en

4. Install the Visual Studio Team System 2008 Service Pack 1 Forward Compatibility Update for Team Foundation Server 2010 (VS90SP1-KB974558-x86.exe)

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=CF13EA45-D17B-4EDC-8E6C-6C5B208EC54D&displaylang=en

5. Open Business Intelligence Development Studio (Start\Programs\Microsoft SQL Server 2008 R2\ SQL Server Business Intelligence Development Studio)

  • Select “Tools” –> “Connect to Team Foundation Server”

TFS_Connect

  • Click “Servers” –> “Add” to connect to TFS 2012.

Note: Open the Visual Studio 2008\BIDS 2008 project from a Visual Studio 2008 solution *only*. If you open Visual Studio 2010 (using Team Explorer) and try to open the VS 2008\BIDS project from that  location it will corrupt your BIDS solution as VS 2010 will try to convert it\upgrade your 2008 solution to 2010. Make sure all the developers are aware of this and they don’t inadvertently upgrade the solution.

If all steps above finish smoothly, you are all set. Just make sure of the note above.

Summary

It certainly would have been nice if an upgrade to BIDS 2008 was provided when the VS 2010 release came along. All these tweaks would be avoided and the solutions could be managed cleanly. I take this as a small setback and not a major one. Because, in an agile environment where all the projects are set up with CI build and deploys, one could successfully manage VS 2010 projects (ex: Database Projects) and VS 2008 projects (ex: SSIS, SSRS projects) in tandem. The build configuration for BIDS 2008 projects will have to be performed as a custom build step as this would be another place where the build process in TFS 2012 may not support the older versions on Visual Studio. But still both projects versions are deployed under one build workflow. Hope this walkthrough was helpful.

Continuous Integration with Visual Studio 2010 Database Projects, Hudson & MSBuild (Part 3 of 3)

Introduction
This post is a continuation to the series on implementing a continuous integration environment for Visual Studio 2010 database projects (.dbproj) using Hudson, MSBuild and Subversion. Links to the previous two posts:

Part 1: Agile Data warehouse Planning & Implementation with Hudson, MSBuild, Subversion and Visual Studio Database Projects.
Part 2:
Managing Database Code for Continuous Integration

Part 1 was an overview of agile development practices for database\data warehouse development. Part 2 explained in detail on setting up the database project, adding code to change control (Subversion) and preparing artifacts for CI. Part 3 will make use of the deployable database script (sql) created as part of the Visual Studio deployment process (from part two) for setting up the CI environment for the database project.

Hudson can be used with NAnt or MSBuild to build and deploy database projects. I will perform a walkthrough of setting up the AdventureWorks database project (created in Part 2) on Hudson using NAnt\MSBuild and Subversion.

Installing Hudson
Hudson is a free continuous integration server mostly used by the dark side (a.k.a. the java community). I have worked with CruiseControl.Net and Bamboo but my personal favorite is Hudson for its management features, ability to extend plugins and a cool UI. Hudson is available for download
here. (Java is a prerequisite to install Hudson).

Install Hudson by running the command line statement (from the directory where Hudson.war was downloaded to):

java -jar hudson.war

By default Hudson is installed on Port 8080. If this port is taken by some other service, use the ‘-httpPort’ switch to change to port on which Hudson runs.

Ex: java -jar hudson.war –httpPort 7777

Hudson can also be configured to run as a windows service. The documentation already does a good job of explaining the steps for configuring hudson to run as a windows service so I will just redirect you to the right place.

Next, find and add MSBuild and Nant plugins from Manage Hudson –> Manage Plugins. The ‘Install’ button is at the very end of the page.

1

Create a project for the build process

Navigate to http://localhost:8080 (or to the port on which you have hudson running on). Start by creating a freestyle software project for our AdventureWorks sample. In this demo we will be creating a development hourly build of the AdventureWorks DB project. Click OK once done (ignore all the settings on next page, just hit save for now).

2

Adding the build step & Polling source control for auto updates
Hudson works with build files to execute against different targets, more like a service broker that works with different services to run different kinds of jobs. Targets can be a set of instructions grouped together under a common task umbrella. For example a ‘debug’ target for building the project in debug mode with configurations that are specific to development environments and having a ‘Deploy’ target for running the distributable product.

You could use either MSBuild or Nant to build and deploy the project.

Difference between using MSBuild & NAnt with Database Projects: With MSBuild the database project file AdventureWorks.dbproj works as the build file and one can make changes or customize it in the VS IDE itself which is really helpful. With NAnt and DBProjects,  one has to create a build file (.build) manually or use a free NAnt build file editor; whichever the case you would have to include the MSBuild task in the Nant build filee to build and deploy the database project. (Tasks are specific XML tags that are passed as instructions for a particular unit of work in a build file). The reason MSBuild is preferred for a database project .dbproj vs. a dbp project type lies in the way the deployable database file is created (see previous blog post) .

For example, if a new table script is added to the database project and you intend to include it in the deployment script (Adventureworks.sql) you need to rebuild the database project (Right click –> Rebuild). Only this action will recreate the deployable AdventureWorks.sql with all the database objects and include the new table that was recently added. And to do this action from the command line you need the MSBuild command. So… If you already have a provision for running the MSBuild task from Hudson directly, why would you use Nant. The answer – in this case since we are ‘only’ deploying the database project we are fine. But, if we had a different range of project types with different technologies, like a Java project for instance, then using Nant would make sense. It all depends on the composition of your build steps\events.

Before you can access the MSBuild tasks in NAnt you need to install the NAntContrib task libraries and include them in your NAnt build file. Since all of this sounds a bit complex, lets just use MSBuild for now and follow the steps below to add the build steps:

1. Click on the AdventureWorks-Dev-CI-Build project and click on Configure.

2. Assuming SVN is setup correctly and the database project is in source control (if not setup the repository from here and then access it from here), update the Subversion Repository URL as follows.

SourceControlSetup

3. Save and go back to the dashboard to test the changes. Hit the ‘Schedule a build’ button to start the project (Play button)

image

The second column ‘W’ (Weather Report) should go to sunny if everything was set up correctly.

image

Note: Click on the Build History –> Console Output. It should display all the files from AdventureWorks project being checked-out to a workspace location. In the event of a build failure, this would be the place to check for the failure error.

4. Make sure that the database project builds and deploys without errors from VS IDE.

5. Add build step: Go back to the project configuration and add an MSBuild step.

image

6. To add a trigger to poll for changes on source control, follow this final configuration or skip it.

image

7. Finally, save and go to dashboard. Start the build and verify the output.

This completes the setup of a database project with continuous integration using Hudson, Subversion and MSBuild (I know I promised using NAnt, but trust me – in our case it just would have been a bit more painful to go over. Helpful NAnt tutorial).

Summary
This was a very basic, barebones CI setup. Once you start delving into configurations, parameters, different environments it gets more complex and interesting. The database project is the core component of all of this CI setup process. The rest of the pieces like the CI server, source control, build tools and issue tracking systems are customizable. Depending on the environment and other influential factors you may decide to go with one product set versus another. This was just one way of getting started on CI on a database project with Visual Studio. There are numerous CI server products available as open source. A comparison is provided
here. Hope this helped. Thank you and keep reading.

Managing Database Code for Continuous Integration (Part 2 of 3)

Introduction

URL to Part 1: Agile Data warehouse Planning & Implementation with Hudson, MSBuild, Subversion and Visual Studio Database Projects (Part 1 of 4)

Part 2 (of 4 series) on Continuous Database Integration covers creating, managing and provisioning a database project for a continuous integration environment. I will be using Visual Studio 2010 to create a database project (.dbproj project) and for source control I will be using Subversion (open source). The usage of Visual Studio is mainly for the management of SQL scripts and (most importantly) for the deployment file that it generates, but you can do away with it and have a complete CDBI system setup in an open source environment. Most of the tools selected in this article to set up CI are free.
If your environment extensively uses Team Foundation Server (instead of Subversion), and Application Lifecycle Management (ALM), you should look into setting up continuous integration with these tools before starting to look for free ones. If you are an open source shop (two thumbs up) or want to try out standing a CI environment on your own, read on.

This article is focused on the use of the SQL Server database project type in Visual Studio 2010. The SQL Server project type has ‘build’ and ‘deploy’ capabilities (along with a bunch of other features), which is the pivotal component of the CI setup. If you do not want to use VS 2010 Database Projects, and instead use a file system management based structure of hosting the database scripts; then, you need to manually create the re-runnable\deployable script (an OSQL or SQLCMD command script that executes all your deployable .sql scripts). Make sure you test this against a local instance of your database to emulate the build and deploy features. I created a custom C# console application that looks into specific folders (Tables, Stored Procedures etc.) and creates a command line batch script with error handling embedded in it (a custom deploy file generator was created as my project involved both oracle and sql server and I was working with Visual Studio 2005 .dbp project type), but with VS 2010 you get all these benefits plus the database testing, refactoring and it just plainly makes managing a database project much simpler (.dbproj project type).

Difference between .dbp and .dbproj: http://blogs.msdn.com/b/gertd/archive/2009/03/25/dbproj-vs-dbp.aspx

Before we begin on the CI setup of the database project, download the following tools to set up the environment:
1.    Subversion – For source control of the database project. This is by far the best and most comfortable source control system I have worked with (Sorry TFS). After installing subversion and creating a repository, make a note of the repository location as you will need it to link your database project to that.

2.    Plugin for Subversion integration with Visual studio IDE (either one of the two)
a. Visual SVN (Free to try, $49 per license)
b. Ankh SVN (Free)

Visual SVN vs. Ankh: I would recommend Visual SVN to a database centric shop that has heavy duty database development, SSIS, Reporting and\or SSAS solutions. I have had problems with Ankh SVN plugin to work correctly with these project types. It does not recognize these project types from the IDE and you end up managing them from the windows explorer instead of the commit\revert operations from the IDE. Visual SVN is much simpler to use, works perfectly with all types of project types that a database developer needs to work with. Yes, it does come with a price tag, but a license of $49 is dirt cheap. This was when I was working with Visual Studio 2005 integration. Things may have changed with Ankh since then, try it out and see what works best for your scenario.

3.    SQL Server SSMS Tools Pack: This is more of a helper plugin than a requirement. Helps you generate your seed data, save custom snippets as hotkeys, CRUD generator and more. Once you start using it you will want to get more. Download it here.

Database Project Setup
Once the prerequisite software is installed, open Visual Studio 2010 and create a new SQL Server 2008 project.

clip_image001[6]
Database Project Type

For demo purposes I am creating (reverse engineering by importing an existing database during project setup wizard) a database project for AdventureWorks SQL Server 2008 database [If AdventureWorks sample database is not installed on your database server, it can be downloaded from CodePlex]. Complete the project setup by following the necessary steps as per your database configuration. Leaving them in their default settings is also fine for the moment.

Once the project setup wizard completes, the solution explorer should resemble the fig. below. Right click on the project name “AdventureWorks” and select Properties to bring up the project settings. Click on the ‘Build’ option to view the location of the deployment script. Select the ‘Deploy’ tab on the left to view the deployment settings.

clip_image002[6]
AdventureWorks Deployment Options

Now that we have the project ready, Right click the project and select ‘Build’. The status bar should go from ‘Build Started’ to ‘Build Succeeded’ status. After the Build succeeds, deploy the project by Right clicking the project and selecting ‘Deploy’. This will create a deployment script named ‘AdventureWorks.sql’ (in the Visual Studio\Projects\YourProjectFolder \sql\debug). With the database project deployment, two options are available (for now leave it in its default state).
1.    Create a deployment script (default)
2.    Create a deployment script and run it against a database.

clip_image003[6]
Location of AdventureWorks.sql deployment script

The deployment script is the most important artifact for a successful CI system setup. This is a compilation of all the database objects belonging to your database project (including seed scripts, security etc.). After first time deployments, when a change is made to the database project, a script with the same name will be generated which will include the changes.
The next step is to add your project to Subversion source control. To version control your project, right click on the project and select ‘Add solution to subversion’. Select the repository path and add the project. And finally right click and Add files and then commit\check-in the solution. The project is now ready to be shared by anyone who has the correct setup as listed earlier.

Preparing artifacts for CI
An isolated database instance of SQL Server needs to be provisioned for continuous build and deploy of database scripts (tear-down and reinstall). This database should not accessible to developers to use for development or testing purposes. The sole reason for the existence of this database is to test the continuous deployment of a database on either code commits or regular intervals of time. This also serves as a sanity check of your end product at any point in time.

Visual Studio 2010 database project provides the tear-down and install script (tear-down = recreate database) on the right click and select deploy action. But, in a continuous integration environment we would like to have this file created automatically on every build scenario. This can be implemented using the VSDBCMD command. You could use VSDBCMD for just creating the re-runnable deployable script (with the /dd- command line option) or use it for creating and running the deployable script (with /dd+ option at command line). If you plan to use it just for creating the re-runnable deployable script, then the script needs to be executed by either ‘sqlcmd’ or ‘OSQL’ in any environment separately.

Ideally, I would prefer using VSDBCMD just to create the deployment script and then handover the deployment script to the DBA specifying the parameters (documenting them in an implementation plan of the database). The DBAs are familiar with sqlcmd\OSQL than VSDBCMD, plus using the VSDBCMD to execute the deployment script requires a bunch of assemblies (dll files) to be copied on the database server. I am not sure as to how the production DBA of today will accept this change. Thinking on the likes of a developer; sure, VSDBCMD is cool and you should definitely use in qa and production environments. But, in the real world scenario DBAs run the show. By just creating the deployable file in development and then running the same on QA and Production using sqlcmd you standardize your deployments and make your deployments simpler and worry free. (Did I mention that VSDBCMD also requires a registry change on the machine if Visual Studio is not installed on the machine, which is the database server?).
Not always a smooth ride; enter the obstacle: the hardcoded variables in the deployment file.

Hardcoded variables in Visual Studio deployment file: Visual Studio deploy process creates three default parameters in the deployment file: DatabaseName, DefaultDataPath and DefaultLogPath. The ability to edit\override them is what makes the discussion of SqlCmd vs VSDBCMD interesting.
The main advantage with SqlCmd over VSDBCMD is the ability to pass variables as parameters to the deployment script from the command line. This is a big advantage as the VS DB project hardcodes the database name, data and log file path (mdf and ldf) in the deployment script (AdventureWorks.sql, see setvar commands below) and although there is a way to get around it, it is painful.

:setvar DatabaseName “AdventureWorks”
:setvar DefaultDataPath “C:\Program Files\…\DATA\”
:setvar DefaultLogPath “C:\Program Files\…\DATA\”

Note: The above variables can be suppressed by editing the project deployment configurations. (This option can be used at runtime via command line params also).

clip_image004[7]

At this point you have two options with sqlcmd: manually changing the DatabaseName and DefaultDataPath and DefaultLogPath variables in the deployment file, or use option two i.e. changing the variables on command line with ‘SqlCmd’ using the “-v” flag for variables.

Ex: sqlcmd –S -d master -v DatabaseName=“NewAdventureWorks” DefaultDataPath=“C:\Data\” DefaultLogPath=“C:\Data\”

If you decide to go with VSDBCMD for creating the deployment file and deploying to the database server, a workaround is required to make this work. Complete the following workaround steps (skip both steps if you are going to go with sqlcmd for qa & production deployments):
1.    Override the DatabaseName at runtime with the TargetDatabase command line option. Ex: /p:TargetDatabase=”NewAdventureWorks”. This will override the :setvar DatabaseName “AdventureWorks” to “NewAdventureWorks”.
2.    Overriding file path variables: Let’s get something straight first – ‘The variables DefaultDataPath and DefaultLogPath cannot be overwritten’. Microsoft has received requests for this and is planning to allow for overwriting in the next release of database projects. For now we will have to do with a workaround.
a. Right click on project ‘AdventureWorks’ and select ‘Deploy’. Edit the Sql command variables file by clicking the Edit button.

clip_image005[6]

b. Add two additional variables ‘myDataPath’ and ‘myLogPath’ as shown below.

clip_image006[6]

c.    In the database project, navigate to Schema Objects \Storage\Files and change the data file path variable in the AdventureWorks_Data.sqlfile.sql and the log file path variable in AdventureWorks_Log.sqlfile.sql to reference the newly created command variables.
–    Rename $(DefaultDataPath) to $(myDataPath)
–    Rename $(DefaultLogPath) to $(myLogPath)

clip_image007[6]

d.    Right click and Build the project. Navigate to .\AdventureWorks\sql\debug (location of your project) and open the AdventureWorks_Database.sqlcmdvars with Notepad. The new variables will be available to change in here.

clip_image008[6]

As you can observe from steps 1 & 2 above, the workaround for using VSDBCMD can be a bit painful. One other important thing to keep in mind is that VSDBCMD does not execute pre-prepared deployment files. This is also an item that the MS team is considering to change in the next iteration. To create a deployment package VSDBCMD needs the necessary assemblies, build files, manifest, sqlcommandvars file etc. to prepare the end product (deployment file) and run it. On the other hand sqlcmd is easier to run pre-prepared deployment files (like AdventureWorks.sql).

Creating the build package (build files) & executing the deployment output
For now, I am going to demonstrate creating the deployable file with VSDBCMD (minus steps 1 & 2 above) and deploying them on different environments with sqlcmd instead of using VSDBCMD.
The workflow of the continuous builds and deployments that we are trying to emulate is:

a. Clear existing deployable file: In the \sql\debug folder, delete the file AdventureWorks.sql.

b. Build the project: For now just right click and select “Build”. I will be using MSBuild to perform this task in the next article. Behind the scenes, when you right click and build, Visual Studio uses MSBuild for the build process to create the build files in \sql\debug folder.

c. Generate deployable file AdventureWorks.sql: Using the VSDBCMD command line tool on the integration machine and the database manifest file (AdventureWorks.deploymanifest) to generate the deployment file “AdventureWorks.sql”

Before trying out this step, make sure VSDBCMD is installed on your integration machine.
–    If Visual Studio is not already installed, then follow the instructions here to download and install VSDBCMD.
–    If Visual Studio is already installed (VSDBCMD is located in “C:\Program Files\Microsoft Visual Studio 10.0\VSTSDB\Deploy”), make a reference to it by adding it to your PATH variable in environment variables (instead of copying the files).

Execute the following command in the \debug\sql folder from command prompt:

VSDBCMD /dd:- /a:Deploy /manifest: AdventureWorks.deploymanifest
The /dd- ensures that a deployment script ‘AdventureWorks.sql’ is generated and is not executed against the database server.

d.    The output of step above (AdventureWorks.sql) is executed against the integration database instance. Execute AdventureWorks.sql using ‘sqlcmd’ to test out the deployment.

Sqlcmd –S -d -E –i -o -v DatabaseName=“” DefaultDataPath=“” DefaultLogPath=“”

Ex: SqlCmd –S myServer -d master -E –i C:\AdventureWorks.sql -o C:\LogOutput.txt -v DatabaseName=“NewAdventureWorks” DefaultDataPath=“C:\Data\” DefaultLogPath=“C:\Data\”

Note: Variables set with –v parameters overwrite hardcoded variables set in the deployment script.

Summary

This completes our preparation of build items needed for setting up the database project for Continuous Integration. Not to worry, the steps above are for understanding the working knowledge of how the CI product (Hudson) is going to orchestrate the above steps on the server for us in the next part of this series. As far as the options go with selecting VSDBCMD for build or using it for both build and deploy, it depends on your environment. If you are flexible enough knowing the changes you have to accommodate to get VSDBCMD working in your environment, then go for it, otherwise just use it for build purposes on the build server to create the deployable product and use that going into the next environment phases (QA and Production).
The next part of the series deals with orchestrating the steps a through d above in a repetitive manner based on either code commit\check-in or regular intervals of time. This will be set up using free tools – Hudson and NAnt\MSBuild. The next article will explain in detail setting up Hudson as a CI server for Database projects and configuring MSBuild\Nant tasks for the actual implementation. I will also take some time to discuss a proactive vs. a reactive CI setup and how that affects development. That’s all I have for now. Thanks for reading.

Agile Data warehouse Planning & Implementation with Hudson, MSBuild, Subversion and Visual Studio Database Projects (Part 1 of 3)

Overview

The notion of managing data warehouse projects with continuous integration with open source technologies is an uncommon practice or i guess is just unpopular in IT shops dealing with database code, SSIS and SSAS projects (from my experience). Excuses\opinions differed from company to company:

• “It doesn’t apply to database code projects”
• “What is Continuous Integration and how does it apply to data warehouse projects?”
• “Here at Acme Inc. change control is done by our architect\DBA who uses a tool called ‘AcmeErwin’ or AcmeVisio to generate code, so we don’t need the additional bells and whistles”
• “Automating testing & deployment for database projects, SSIS packages is not possible”
• “Is it worth the effort?”
• “We are special, we do things differently & we don’t like you.” – kidding about this one.

In this article I will try to justify the use of CI on data warehouse projects and try to address the concerns above. The subject matter of this article is geared towards planning and implementing data warehouse projects with agile development practices on the lines of iterative feature\perspective driven development. (Perspective = Subject Area = Star Schema) The article begins with an introduction to agile development practices, reviewing evolutionary database design, defining continuous integration in the context of database development, comparing viewpoints of waterfall and JAD methodologies to Agile, and demonstrating the coupling of the Kimball approach with Agile to establish a framework of planning long term project milestones comprised of short term visible deliverables for a data mart\warehouse project. I will do a detailed walkthrough of setting up a sample database project with the technologies (VS Database Projects for managing code, Hudson for Continuous database integration, MSBuidld for configuring builds, subversion for source control and OSQL for executing command line SQL) is included for demonstration .

Due to the verbose nature of my take on this, I am writing this article as a 3 part series. Trust me; the next 3 parts are going to be hands on cool stuff.
Part 1 – An introduction to agile data warehouse planning & development and an introduction to Continuous Database Integration (CDBI).
Part 2 – Create the database project with Visual Studio Database Projects & Subversion.
Part 3 – Prepare the build machine\environment with Hudson and MSBuild

Introduction
After shifting gears on different approaches to database development at various client sites and compiling the lessons learnt, I am close to applying a standardized methodology for the database development and management. One can apply this approach on projects regardless of size and complexity owing to its proven success.

Before starting the introduction to Continuous Integration and Agile, let me take a step back and give you a lesson learnt working with the waterfall model. While working on a new data warehouse project and adopting the waterfall SDLC approach for database project planning and implementation, over time, the implementation plan did not follow the estimated planning. Sure, it was only an estimate, but you don’t want these estimates changing forever. Here the implementation was almost always off track when compared to the initial plan. This observation was initially not visible during the initial planning phase, but over time the mismatch was more evident during the development cycle. The mismatch was due to ‘change’. These were changes in requirements or caused due external factors. When you approach a data warehouse development methodology it is either the Kimball approach or the Inmon approach; and for my project it started off with the waterfall + Kimball. But, due to the nature of the requirements from the business where changes were too frequent, the waterfall was proving to be a showstopper. The reaction to changes and turnaround time required by the development team was slowing down the project timeline. The requirements were changing plus this project was already a mammoth effort with more than ten subject areas with conformed dimensions to form a data mart.

The old school approach on starting a new database project (with the waterfall cycle) begins with initial requirements and then comes in the logical model and then the physical model. Usually In this approach you have an ‘architect’ or a ‘data modeler’ or an application DBA on the project who owns the schema and is responsible for making changes from inception to maturity. This methodology is almost perfect and everyone starts posting those ER diagrams on their walls and showing off, until Wham! Requirements start changing and for every change you need to go back, change the specification, the schema and then the actual code behind, and of course the time for testing. As the frequency of these changes goes up, this catapults the delivery dates and changes your project plan. In this approach, the turnaround time for delivering an end product with the changes identified is just not feasible. I am not debating that using a waterfall approach will determine success or failure; I am trying to juxtapose the effort involved (basically showing you the time spent in the spiral turnover of the waterfall model then will compare it with an agile approach). This is a classic example of how the traditional waterfall approach hinders the planning and implementation of your project.

This called for a need for a change in the development approach, one that could react quickly to any change that could affect the time-line of deliverables. The new approach adopted was an agile development practice + the Kimball method which resulted in a successful implantation of a large scale data mart for a health care company. By now you should have an idea on what I am trying to sell here.

What is Continuous Integration?
Continuous Integration is all about automating the activities involved in releasing a feature\component of software and be able to simulate the process in a repeatable manner to reduce manual intervention and thus improving quality of the product being built. This set of repeatable steps typically involves running builds (compiling source code), unit testing, integration testing, static code analysis, deploying code, analyzing code metrics (quality of code, frequency of errors) etc.
Continuous Integration for database development is the ability to build a database project (a set of files that make up your database) in a repeatable manner such that the repeatable action mimics the deployment of your database code. Depending on the database development structure database projects can successfully be set up to run scheduled builds, automate unit testing, start jobs and deploy code to different environments or stage it for deployment to reduce human error that a continuous and monotonous process can bring.

The CDBI (Continuous Database Integration) Environment
The best part about the tools I am going to set up the continuous integration environment is that they are all FREE. Almost all are free except Visual Studio Team Edition for Database Professionals. I would highly recommend using VS Team Edition for Database Professionals as a tool for developing and managing code when working with database projects. The others tools that I used for continuous integration is Hudson, Subversion and NANT. Yes, freeware used mostly by the othe other community. But after applying them side by side with MS technologies, that proved to be a good mix.

All of the above opinions when drilled down, point to the concept of ‘done’ or to ‘a deliverable’. The granularity of the deliverable is pivotal to incremental software development. It is just a matter of perspective – for incremental software development the granularity of your deliverable is much smaller and the visibility is much clear and concise when compared to a deliverable on the waterfall track. On the continuous sprints you know for sure what needs to be delivered by the next sprint\iteration. At this point you have a definition of ‘done’. This is the most important thing when we start getting into agile development practices – The concept of done. [TechEd Thanks]
The more tasks are granular, easier they become to control and complete. Once you start slacking on a few, they tend to pile up and when that happens on a larger scale, it fogs the plan even worse. Once you step in to the shoes of a project planner and also of a lead, this gap will become more evident and clear.

This is where CI helps in meeting deadlines, showing progress of work in regular sprints where the previous sprint progress is evaluated (to validate the concept of done) and requirement for the next sprint is defined.

Summary
To summarize, Continuous Integration in a DB environment is all about developing your database code in sprints (of two weeks or more, your choice), by a feature or perspective. Ex: A feature in the AdventureWorks database would be HR module or the Sales module. An example in a data warehouse environment could be the Inventory star schema. It is these short sprints (regular intervals of feature completion or deliverable, usually 2 weeks) of clear quantifiable requirements (definition of ‘done’) that helps gauge the status of work. Once the developers adapt to this rapid SDLC, visibility into the progress of the work goes up, results in accountability and ownership of work, building a more cohesive team and increased productivity (I can bet on this one) and most important of all, a quality product being delivered in chunks to form the big picture. The big picture being a collection of perspectives(start schemas) that plug together to form a data warehouse. That is all I have for now, more to follow in my part 2 and 3 on setting up the CI environment with a database project, using Subversion as source control system and MSBuild\NANT for creating build files.

Thanks for reading & stay tuned ….

Deploying SSIS Packages with XML Configurations

SSIS trivia: What do admins detest most about SSIS deployment?
Answer: The Environment Variable configuration.
One the features included by Microsoft for flexibility & ease of deployment between environments. If your answer was registry files, you were close. Environment Variables is on top (Registry Files comes in second).

Overview: This article is all about using XML configurations in SSIS, direct vs. indirect configurations, pros and cons and explaining the deployment issues and concerns with these types of configurations. For the purpose of this article, when relating to database connection strings I am assuming that AD authentication is used all over. SQL authentication is out of scope, for now (otherwise I will start a geek riot).

XML Configurations: XML configurations in SSIS can be used in one of two ways
1. Direct Configuration: This configuration setting takes the form of a hardcoded path in the package itself and changing this between different environments when moving packages to QA or Production; it is the responsibility of the deployer to rely on the manifest (explained below) or changing the dtsx manually. When this type of configuration is used, it becomes a mandate that the path of the XML Configuration file is going to remain the same for all developers (on their machines) working on the project.
For example the path of the XML config is C:\myXMLConfig.dtsconfig for say 10 packages in a solution. Now, when another developer joins the merry band, the new developer has to ensure that the configuration file used is in the same path set by the previous developer. If the new developer places his configuration file in a different location, say something like “D:\myXMLConfig.dtsconfig”, then the previous developer’s development environment will not load correctly (Assuming both developers are working on gold code and checking out and committing to the same source repository).


Fig 1: Direct XML Configuration

The SSIS Manifest:
The SSIS Manifest file is a setup file, similar to the Installers that you run for software products which on the click of the next-next-finish complete your software installation. The manifest file when opened launches a wizard where you can specify the new location for the configuration and change configurations and also the new location of the packages when deploying to different environments. From what I understood from the working of the manifest; it basically does two important tasks (please comment if I missed any):

a. Copying the SSIS packages between environments (Same as using XCOPY or DTUtil. Bet DTUtil uses XCOPY behind the scenes)

b. Change the XML configuration file path (a simple Find-Replace based on the XML node find on Configuration Type =5 or 1. Open the SSIS package in a text editor to view the SSIS package XML code for configuration types)

i. ConfigurationType =5: Indirect Configuration (described below)
ii. ConfigurationType =1: Direct (hard-coded) XML file path configuration.

2. Indirect configuration: Indirect configurations in SSIS packages allow you to reference a configuration file to a virtual name i.e. to an environment variable. This means that the SSIS configuration value that is embedded in the package(s) is an environment variable (key\name), the value of which is the actual path of the XML configuration file. When this kind of a setting is used for configurations, during deployment time, only the environment variable need to be added to the environments where the packages are deployed.


Fig 2.1: Indirect XML Configuration


Fig 2.2: Environment variable configuration

There are issues where implementing this type of configuration setting in some environments. Two main issues:

a. Machine reboot on environment variable add\modify: When an environment variable is added, in order for it to take effect, i.e. in order for the packages to start recognizing it, the machine needs to be restarted. This is not true. Same holds true when modifying the environment variable. There have been numerous questions and concerns on this topic and since an obvious panic button is hit with the word ‘reboot’, people tend to stay away from this setting. The reality check is that the process which is running the SSIS packages needs a restart. Typically this is the SQL agent, which is the scheduler for the packages. In a development environment the BIDS needs a restart. This is one of the most important concerns a DBA has with SSIS configuration.

b. Development & QA on same server: This issue crops up when DBAs provision development and QA instances of SQL Server on the same machine. In such environments it becomes impossible to implement indirect configuration settings with environment variables as one environment variable cannot be used for both environments development and QA.

Other than the obstacle (b) above, the environment variable coupled with XML configurations is the simplest and most flexible way of deploying SSIS packages between environments (again, going back to the assumption of AD authentication). I have gone down the path of trying to think of different ways of convincing the admins for implementing environment variables at various companies; in some cases I was able to sell my pain, in other cases had to walk away with explanations on the ‘complexity’ of ‘maintaining’ environment variables and it was back to the drawing board to find another solution.

As a developer, I may not be totally in tune with the ‘complexity’, ‘security breaches’ and the ‘challenges’ (stress on quotes) involved in maintaining environment variables, but really?, after all the security ‘gizmos’ an organization has in place to stop Intrusions, inc. one would really ponder on the need for an overdose of security, but, such is life … and I don’t blame the admins for doing their job (maybe I would have done the same thing if I were a production DBA). DBAs would want to minimize the loopholes, build better security practices and keep the servers clean of environment variables and registry files; and on the other hand a smart developer wants to incorporate similar best practices, coding standards, ease of deployments between environments, etc. and truly it is difficult to see green on both sides of the fence.

In the domain of deployment of packages between environments; the MS Integration Services team did a good job making the XML configurations, database configurations etc. available to us in SSIS, using which the migration of code between development, QA and production has become simple and efficient. Along those lines, they have given us the (risky) capability of saving sensitive information as clear text in XML configurations. They do have a neat way of letting us save the sensitive information and we take the risk of saving such information, i.e. the packages themselves strip the passwords and anything marked sensitive. This goes back to deployment modes msdb and file system which I will not delve into.

If one would think of SSIS configurations as a bane; try implementing continuous integration and deployment of SSIS projects\packages without configurations, and trust me; if you have not already, you will feel the pain (been there, felt that).

Summary:
At the end it comes to selling your pain in that meeting room with quasi-knowledgeable management, the so called technical managers and DBAs over security &maintainability vs. flexibility & ease of implementation. You know which scale weighs heavier, don’t you? It’s the DBA side of the scale, i.e. the security part of it. But, you don’t see me frowning, coz someone once told me “A problem\setback is an opportunity in disguise” (or something like that). In my next article I will demonstrate how I really did find opportunity in this obstacle (hint: has something to do with continuous integration with SSIS & C# and the next paragraph).

Personally, i like the database configurations and in my implementations of SSIS projects. I give it a mix of XML configuration (The road less travelled: indirect in development environment, convert it to direct in QA & Production.) and database configuration elements to it. This keeps the deployers\admins at bay and gives me more power as a developer to enforce configurations at the database level for my application.
More about the road less travelled in my next article.
Sometimes I wonder if I should start another blog – SqlDeveloperRants.wordpress.com.
  

Thanks,

Vishal Gamji

MCITP – Database Developer

admin@expertsqltraining.com