The Extract, Transform & Load (ETL) process typically accounts for 70% of an entire data warehousing/data migration effort (according to The Data Warehouse ETL Toolkit). That proportion makes it a major source of project risk: the ETL implementation can make or break the project. This should be taken into consideration while designing and implementing an ETL solution.
Every staging project has two elements: the ETL itself and a staging data structure. So today I’d like to talk about best practices for standing up a staging area using SQL Server Integration Services [ETL] and hosting a staging database in SQL Server 2012 [DB]. The ‘best practices’ are across three areas: Architecture, Development, and Implementation & Maintenance of the solution. Each area represents key patterns and practices (not a comprehensive list) for the ETL component and the data structure of the staging database.
In this post I am trying to solve a real world analytical problem using Microsoft PowerPivot. Before I get started, let me introduce a few terms here:
Long Term Plan: A customized package of R.H.Smith business school courses spread across five campuses offered in various combinations by area of expertise under specific departments.
Students at the Smith School of Business use the Long Term Schedule Planner to plan their long term schedule based on the courses that will be offered, the number of credits, the campus location, term, etc., and then select and register for those courses. And I am sure that there are many students out there who return to this page often to see what other options are available and whether there have been any changes to the long term schedule.
The current Long Term Schedule Planning page serves its purpose only to an extent; it drives the user\student to pick courses based on a specific option set, i.e. Term –> Campus –> Department. Only then can you look at the courses and proceed further. Personally, I am not a big fan of analytical processes forcing me to think in one direction. It should be, as I see it, a flexible application where students can use any ‘perspective’ to look at their schedule, have a way of planning and saving their schedule, and have the ability to drill down and apply filters by department, or for that matter pick just a course, see which semesters it is offered in, and plan accordingly.
‘a’ Solution: Microsoft PowerPivot
So, as I was trying to build my customized long term plan and got tired of going back and forgetting where I started, I got the idea to pull all the data required for long term planning into a Microsoft SQL Server 2012 database and create a PowerPivot model. For me, not only was this the simplest and easiest way of analyzing the data, it was also a powerful way of visualizing it. The solution was implemented in the cloud. I am making a copy of the Excel workbook with the data and making it available for download.
For those who are new to PowerPivot, “PowerPivot is a powerful data mashup and data exploration tool based on xVelocity in-memory technologies providing unmatched analytical performance to process billions of rows at the speed of thought” [MS PowerPivot]. It is available for free and works as an add-in for MS Excel.
To use the workbook the software required is:
1. Microsoft Excel 2010 (I have not tried it with Excel 2007)
2. Microsoft PowerPivot (Free Download)
Screenshots of the developed PowerPivot worksheet
How to get the PowerPivot workbook?
Download it here. (Make sure that the above mentioned software is installed before using this workbook).
Note: This is the first version of the workbook and may need some fine tuning. Please let me know how you want to improve this and I will make it happen. Any feedback is appreciated.
This was a very simple example of a “self service” BI (Business Intelligence) application. I would highly recommend trying out PowerPivot for your advanced analysis tasks at work. You will not be disappointed with this small but effective BI move. As a matter of fact, Microsoft BI has already made its way into many companies as the leading BI player in the industry. [Gartner Report]
If you are interested in learning more about Microsoft BI products or have a BI effort that you would like to solve or for that matter have any potential BI solutions that you think will give your company the strategic edge and competitive advantage, please do not hesitate to contact me.
AIS is a privately held, technical services company headquartered in Reston, Virginia. Over the past 30 years, AIS has provided custom solutions to government agencies and businesses worldwide. We’ve built a reputation for delivering completed projects to our clients that are on time, within budget and exceed expectations. Today, AIS provides long-term career opportunities to more than 300 talented individuals and has opened additional offices in Maryland, North Carolina, Ohio, Texas and India.
Thanks for reading,
Let’s cut to the chase and talk about how one can deploy packages or enable CI with SSIS 2012 projects. This short blog post focuses on deploying SSIS ‘projects’ as a whole rather than using DTUTIL, XCOPY, etc.
Creating ‘tear down and rebuild-able’ artifacts
We all know that SSIS packages are nothing but mere XML files, right? Technically yes, but they are actually much more than that. Since they cannot be compiled like normal code, we need alternate ways of deploying these files or projects as a whole. The most obvious methods would be XCOPY or DTUTIL. With the ‘project’ based deployment model of SSIS we can do better: deploy the project instead of individual files. When project based deployment is selected, the build action on the project creates an “.ispac” file which can be used like an MSI installer to deploy packages to other environments. But if this deployment needs to be automated or enabled for CI, we need a way around the UI based aspect of the ispac (deployment) file.
I am assuming the reader is using project based deployment and is familiar with the ispac (integration services deployment file) file used for deployment with SSIS 2012. If not, please read this article http://sqlblog.com/blogs/jamie_thomson/archive/2010/11/10/introduction-to-ssis-projects-in-denali.aspx
The readiness steps needed to run this “.ispac” from the command line are
1. Recreating the Integration Services catalog folder: When projects are deployed using the UI based approach with the ispac file, there is an option to deploy the project to a specific Integration Services catalog folder (in a SQL Server 2012 instance), and one also gets the option to overwrite it. For automation, this needs to be captured in a script runnable via sqlcmd or PowerShell:
DECLARE @folder_id BIGINT
IF EXISTS (SELECT 1 FROM SSISDB.internal.folders WHERE name = 'CatalogName')
BEGIN
    SET @folder_id = (SELECT TOP 1 folder_id FROM SSISDB.internal.folders WHERE name = 'CatalogName')
    IF EXISTS (SELECT 1 FROM SSISDB.internal.projects WHERE folder_id = @folder_id)
        EXEC SSISDB.catalog.delete_project @folder_name = 'CatalogName', @project_name = 'ProjectName' -- 'ProjectName' is a placeholder
    EXEC SSISDB.catalog.delete_folder @folder_name = 'CatalogName'
END
EXEC SSISDB.catalog.create_folder @folder_name = 'CatalogName'
2. Deploying the project from the command line: The next step is to prepare the command line execution of the deployment file (ispac). The ispac deployment file can be invoked from the command line using IsDeploymentWizard.exe with the following switches.
Ex: IsDeploymentWizard /S /SP:"C:\SSIS.ispac" /DS:"<myServerName>\<InstanceName>" /DP:"/SSISDB/<CatalogName>/<ProjectName>" (/S runs silently, /SP is the source path to the ispac, /DS the destination server, and /DP the destination path inside the catalog)
As simple as that. Just have these two steps incorporated in your CI build steps or a batch file and you are all set. If you are looking for more information on IsDeploymentWizard, MSDN is a good resource. If you encounter any issues, do let me know; more than happy to help.
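These two steps can be stitched together in a small batch file for the CI server to run. A rough sketch (the server, catalog, project and file names are all placeholders, and RecreateCatalog.sql is a hypothetical file holding the catalog recreation T-SQL):

```bat
@echo off
REM Step 1: recreate the Integration Services catalog folder
REM (RecreateCatalog.sql is a placeholder name for the T-SQL script)
sqlcmd -S "myServer\myInstance" -i "RecreateCatalog.sql"
IF ERRORLEVEL 1 EXIT /B 1

REM Step 2: silently deploy the .ispac into the recreated folder
IsDeploymentWizard /S /SP:"C:\SSIS.ispac" /DS:"myServer\myInstance" /DP:"/SSISDB/CatalogName/ProjectName"
IF ERRORLEVEL 1 EXIT /B 1
```

The ERRORLEVEL checks make the batch file fail fast, so a non-zero exit code from either step fails the CI build instead of silently continuing.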
Trying very hard to keep a balance between business school learning and timely tech updates. There is a lot of catching up to do, ranging from SQL Server 2012 BI visualization capabilities like Power View to the details of using columnstore indexes. I did get an opportunity to work with Integration Services, and it certainly is a game changer among the ETL tools out there. I find it encouraging to see large organizations pursuing SQL Server Integration Services not only for data warehousing related workloads, but also for day to day operational activities (tasks\processes).
Technical managers and decision makers should start thinking about a common platform for data movement across the enterprise; not just for governance reasons, but for the ability to trace and remove process redundancies from the organization. Need a place to start? Start by making an inventory of your processes. If you already have a project\process portfolio management tool, use it to your advantage. Then start a parallel development effort on designing and implementing a common framework for maintaining these processes. It should be able to audit events, capture exceptions and errors, log process updates, etc.; more like a monitoring tool for all processes. Start simple. What do you want to know? “When did my process start\end?”, “Did it fail? Why? Where do I see the error?”, “How many rows\files were transferred in the process?”, “How do I get notified in the event of a failure?”. Let these types of questions drive your requirements. I would also recommend keeping the implementation of the process framework as technology agnostic as possible. I made the mistake of building something too close to the technology (script tasks for logging, auditing and event handling), and when the technology upgrades, you are stuck with more work to redefine the framework.
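As a starting point, those questions map naturally onto a small audit schema. A minimal sketch (the table and column names here are my own, purely illustrative):

```sql
-- Hypothetical process audit table: one row per process execution
CREATE TABLE dbo.ProcessAudit (
    AuditId         BIGINT IDENTITY(1,1) PRIMARY KEY,
    ProcessName     NVARCHAR(256) NOT NULL,  -- which process ran
    StartTime       DATETIME2     NOT NULL,  -- "When did my process start?"
    EndTime         DATETIME2     NULL,      -- "When did it end?"
    RowsTransferred BIGINT        NULL,      -- "How many rows\files were transferred?"
    Status          NVARCHAR(20)  NOT NULL,  -- Running \ Succeeded \ Failed
    ErrorMessage    NVARCHAR(MAX) NULL       -- "Where do I see the error?"
);
```

Every process, regardless of the technology it is built in, writes a row at start and updates it at completion; notifications and reporting then hang off this one table rather than off each tool’s own logging.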
Let me try to put together a conceptual example, with code samples to use. Going back to the drawing board on getting a ‘common’ process hub in place; will be back soon. I hope …. with Managerial Accounting looming closer and sending shivers down my spine.
If you are working on BI projects and use Microsoft Business Intelligence Development Studio 2008 (BIDS\Visual Studio 2008) for SSIS, SSRS and SSAS, and one fine day your application development team decides to move on to Visual Studio 2010 and Team Foundation Server 2012, you might initially think that a BIDS 2010 version was released alongside Visual Studio 2010. If you are a SQL junkie you know better: you have to wait until SQL Server 2012. To be clear – you *can* host your current BIDS 2008 projects in TFS 2012, but unless you install the following components you will not be able to use Team Explorer for your BIDS projects, nor can source control actions (check-ins, check-outs, etc.) be performed from the IDE. Without the additional components a BI developer is stuck developing in BIDS 2008 and then making sure (as a separate task, post development) to regularly integrate code artifacts with TFS. That sure does sound painful, but let’s make it easy on ourselves.
The fix consists of installing Visual Studio 2008 Team explorer and the forward compatibility update for connecting to Team Foundation Server 2010 from Visual Studio 2008.
1. Close any open instances of Visual Studio 2008 and Visual Studio 2010 before installation.
2. Download and Install VS 2008 Team Explorer using the following URL.
3. Install VS 2008 Service Pack 1: Download and install the following service pack
4. Install the Visual Studio Team System 2008 Service Pack 1 Forward Compatibility Update for Team Foundation Server 2010 (VS90SP1-KB974558-x86.exe)
5. Open Business Intelligence Development Studio (Start\Programs\Microsoft SQL Server 2008 R2\ SQL Server Business Intelligence Development Studio)
- Select “Tools” –> “Connect to Team Foundation Server”
- Click “Servers” –> “Add” to connect to TFS 2012.
Note: Open the Visual Studio 2008\BIDS 2008 project from a Visual Studio 2008 solution *only*. If you open Visual Studio 2010 (using Team Explorer) and try to open the VS 2008\BIDS project from there, it will corrupt your BIDS solution, as VS 2010 will try to convert\upgrade your 2008 solution to 2010. Make sure all the developers are aware of this so they don’t inadvertently upgrade the solution.
If all steps above finish smoothly, you are all set. Just make sure of the note above.
It certainly would have been nice if an upgrade to BIDS 2008 had been provided when the VS 2010 release came along; all these tweaks would have been avoided and the solutions could be managed cleanly. I take this as a small setback and not a major one, because in an agile environment where all the projects are set up with CI builds and deploys, one can successfully manage VS 2010 projects (ex: database projects) and VS 2008 projects (ex: SSIS, SSRS projects) in tandem. The build configuration for BIDS 2008 projects will have to be performed as a custom build step, as this is another place where the build process in TFS 2012 may not support the older versions of Visual Studio. But still, both project versions are deployed under one build workflow. Hope this walkthrough was helpful.
This post is a continuation to the series on implementing a continuous integration environment for Visual Studio 2010 database projects (.dbproj) using Hudson, MSBuild and Subversion. Links to the previous two posts:
Part 1 was an overview of agile development practices for database\data warehouse development. Part 2 explained in detail how to set up the database project, add code to change control (Subversion) and prepare artifacts for CI. Part 3 will make use of the deployable database script (.sql) created as part of the Visual Studio deployment process (from Part 2) to set up the CI environment for the database project.
Hudson can be used with NAnt or MSBuild to build and deploy database projects. I will perform a walkthrough of setting up the AdventureWorks database project (created in Part 2) on Hudson using NAnt\MSBuild and Subversion.
Hudson is a free continuous integration server mostly used by the dark side (a.k.a. the Java community). I have worked with CruiseControl.NET and Bamboo, but my personal favorite is Hudson for its management features, extensibility through plugins and a cool UI. Hudson is available for download here. (Java is a prerequisite to install Hudson.)
Install Hudson by running the command line statement (from the directory where Hudson.war was downloaded to):
java -jar hudson.war
By default Hudson runs on port 8080. If this port is taken by another service, use the ‘--httpPort’ switch to change the port on which Hudson runs.
Ex: java -jar hudson.war --httpPort=7777
Hudson can also be configured to run as a windows service. The documentation already does a good job of explaining the steps for configuring hudson to run as a windows service so I will just redirect you to the right place.
Next, find and add MSBuild and Nant plugins from Manage Hudson –> Manage Plugins. The ‘Install’ button is at the very end of the page.
Create a project for the build process
Navigate to http://localhost:8080 (or the port on which you have Hudson running). Start by creating a freestyle software project for our AdventureWorks sample. In this demo we will be creating an hourly development build of the AdventureWorks DB project. Click OK once done (ignore all the settings on the next page; just hit Save for now).
Adding the build step & Polling source control for auto updates
Hudson works with build files to execute against different targets, acting more like a service broker that works with different services to run different kinds of jobs. Targets are sets of instructions grouped together under a common task umbrella; for example, a ‘Debug’ target for building the project in debug mode with configurations that are specific to development environments, and a ‘Deploy’ target for running the distributable product.
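To make the idea of targets concrete, a build file might look like the following. This is a simplified sketch; the target names and project path are illustrative, not part of the actual AdventureWorks setup:

```xml
<Project DefaultTargets="Debug" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <!-- Build the database project with development-environment settings -->
  <Target Name="Debug">
    <MSBuild Projects="AdventureWorks.dbproj" Targets="Build" Properties="Configuration=Debug" />
  </Target>
  <!-- Deploy the built output; depends on a successful Debug build -->
  <Target Name="Deploy" DependsOnTargets="Debug">
    <MSBuild Projects="AdventureWorks.dbproj" Targets="Deploy" Properties="Configuration=Debug" />
  </Target>
</Project>
```

The CI server then simply picks a target name (Debug, Deploy) per job; the build file owns the details of what each target actually does.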
You could use either MSBuild or Nant to build and deploy the project.
Difference between using MSBuild & NAnt with Database Projects: With MSBuild, the database project file AdventureWorks.dbproj works as the build file, and one can make changes or customize it in the VS IDE itself, which is really helpful. With NAnt and database projects, one has to create a build file (.build) manually or use a free NAnt build file editor; in either case you would have to include the MSBuild task in the NAnt build file to build and deploy the database project. (Tasks are specific XML tags that are passed as instructions for a particular unit of work in a build file.) The reason MSBuild is preferred for a .dbproj database project vs. a .dbp project type lies in the way the deployable database file is created (see the previous blog post).
For example, if a new table script is added to the database project and you intend to include it in the deployment script (AdventureWorks.sql), you need to rebuild the database project (Right click –> Rebuild). Only this action will recreate the deployable AdventureWorks.sql with all the database objects, including the newly added table. And to do this from the command line you need the MSBuild command. So… if you already have a provision for running the MSBuild task from Hudson directly, why would you use NAnt? The answer: in this case, since we are ‘only’ deploying the database project, we are fine with MSBuild. But if we had a range of project types built with different technologies, like a Java project for instance, then using NAnt would make sense. It all depends on the composition of your build steps\events.
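The command-line equivalent of that rebuild-and-deploy action looks something like this (a sketch; the project path and configuration name are assumptions based on the Part 2 setup):

```shell
REM Rebuild regenerates AdventureWorks.sql with all objects, then Deploy pushes it out
msbuild AdventureWorks.dbproj /t:Rebuild /t:Deploy /p:Configuration=Debug
```

This is exactly the command the Hudson MSBuild step will issue on every build, which is why the .dbproj file doubling as the build file is so convenient.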
Before you can access the MSBuild task in NAnt you need to install the NAntContrib task libraries and include them in your NAnt build file. Since all of this sounds a bit complex, let’s just use MSBuild for now and follow the steps below to add the build steps:
1. Click on the AdventureWorks-Dev-CI-Build project and click on Configure.
3. Save and go back to the dashboard to test the changes. Hit the ‘Schedule a build’ button to start the project (Play button)
The second column ‘W’ (Weather Report) should change to sunny if everything was set up correctly.
Note: Click on the Build History –> Console Output. It should display all the files from AdventureWorks project being checked-out to a workspace location. In the event of a build failure, this would be the place to check for the failure error.
4. Make sure that the database project builds and deploys without errors from VS IDE.
5. Add build step: Go back to the project configuration and add an MSBuild step.
6. To add a trigger to poll for changes on source control, follow this final configuration or skip it.
7. Finally, save and go to dashboard. Start the build and verify the output.
This completes the setup of a database project with continuous integration using Hudson, Subversion and MSBuild (I know I promised using NAnt, but trust me – in our case it just would have been a bit more painful to go over. Helpful NAnt tutorial).
This was a very basic, barebones CI setup. Once you start delving into configurations, parameters and different environments, it gets more complex and interesting. The database project is the core component of this whole CI setup process; the rest of the pieces, like the CI server, source control, build tools and issue tracking systems, are interchangeable. Depending on the environment and other influential factors you may decide to go with one product set versus another. This was just one way of getting started with CI on a database project with Visual Studio. There are numerous CI server products available as open source; a comparison is provided here. Hope this helped. Thank you and keep reading.