OverviewSQL Server Integration Services (SSIS) is a tool that we use to perform ETL operations; i.e. extract, transform and load data. While ETL processing is common in data warehousing (DW) applications, SSIS is by no means limited to just DW; e.g. when you create a Maintenance Plan using SQL Server Management Studio (SSMS) an SSIS package is created. At a high level, SSIS provides the ability to:
- retrieve data from just about any source
- perform various transformations on the data; e.g. convert from one type to another, convert to uppercase or lowercase, perform calculations, etc.
- load data into just about any source
- define a workflow
The first version of SSIS was released with SQL Server 2005. SSIS is a replacement for Data Transformation Services (DTS) which was available with SQL Server 7.0 and SQL Server 2000. SSIS builds on the capabilities introduced with DTS.
In this tutorial we will step through a number of topics that you need to understand in order to successfully build an SSIS package. Our high level outline is as follows:
- Creating SSIS packages with SQL Server Management Studio (SSMS)
- Business Intelligence Development Studio (BIDS)
- Creating a simple SSIS package in BIDS
- Deploying SSIS packages
- Executing SSIS packages
Overview
In this section we will walk through creating a simple SSIS package from scratch. As an example we will generate an Excel spreadsheet that can be used to manually enter a sales forecast by city and state. The city and state information will be extracted from the AdventureWorksDW database and output to an Excel spreadsheet. The sales department could use this spreadsheet to enter the forecast. Another SSIS package will import the data from the spreadsheet and update a table in the AdventureWorksDW database.
In this section we will walk through creating a simple SSIS package from scratch. As an example we will generate an Excel spreadsheet that can be used to manually enter a sales forecast by city and state. The city and state information will be extracted from the AdventureWorksDW database and output to an Excel spreadsheet. The sales department could use this spreadsheet to enter the forecast. Another SSIS package will import the data from the spreadsheet and update a table in the AdventureWorksDW database.
In the sections that follow we will walk through the following steps:
- Create a new SSIS package and discuss some of the package properties
- Add Connection Managers for our data source and destination
- Add tasks to the package Control Flow
- Add tasks to the package Data Flow
- Execute the package in BIDS
SQL Server Integration Services (SSIS) Data Flow
OverviewThis is our fourth step in creating a simple SSIS package from scratch. In this section we will take a look at the Data Flow designer which contains the various tasks that will be performed by a Data Flow task specified in the Control Flow.
ExplanationTo begin launch BIDS by selecting SQL Server Business Intelligence Development Studio from the Microsoft SQL Server program group. Click File, Open, Project / Solution on the top level menu to display the Open Project dialog. Navigate to the location of the solution as shown below then click Open:
Expand the SSIS Packages node under the Tutorial-Sample-1 project in Solution Explorer and you will see the following:
Double click on the SSIS package CreateSalesForecastInput.dtsx to open the package. Click on the Data Flow tab in the designer and you should see the empty Data Flow designer as shown below:
The Toolbox will now display the tasks that are available in the Data Flow as shown below:
As you can see there are quite a few built-in tasks available. For our current package we want to retrieve some data from the AdventureWorksDW database and output it to an Excel spreadsheet. To do this we need an OLE DB Source and an Excel Destination. In addition we need to add a column to the Data Flow; this column will be the numeric column in the spreadsheet where the user enters the forecast amount. Drag and drop an OLE DB Source, Derived Column, and Excel Destination from the Toolbox onto the Data Flow designer. After doing so the Data Flow designer should look like this:
The red icons inside of the tasks are an indicator that there is some configuration required. Right click on the OLE DB Source then select Edit from the popup menu. This will open the OLE DB Source Editor on the Connection Manager page. Fill in the dialog as shown below:
We setup the OLD DB connection manager in the Connection Managers step earlier. The SQL command text contains a simple query to retrieve the list of cities and states in the United States.
Click on Columns to display the column mappings as shown below:
By default all columns returned from the query are selected and the Output Column names are the same. The Output Column names can be edited; we will leave them as is. Click OK to complete the OLE DB Source Editor. You will now see a green and a red line dangling from the OLE DB Source task on the Data Flow designer. The green line is the normal data flow; the red line is the error flow. Drag the green line and connect it to the Derived Column task as shown below:
Notice that the red icon is now gone from the OLE DB Source task and it is now properly configured.
The Derived Column task allows us to add a column to the Data Flow. Right click on it, select Edit, then fill in the Derived Column transformation Editor as shown below:
We have now added a new currency column to the data flow with a name of Forecast, and a value of 0. Click on the Derived Column task and drag the green arrow to connect to the Excel Destination task.
Right click the Excel Destination task and select Edit from the popup menu. The Excel Destination Editor dialog will be displayed. Click the New button next to the Name of the Excel Sheet to display the Create Table dialog as shown below:
The Create Table dialog allows us to create a new table in the Excel spreadsheet. The columns and their types are determined by the data flow. We configured an OLE DB Source task that executes a query and a Derived Column task that added the Forecast column to the data flow. You can edit the CREATE TABLE script if you like. Click OK on the Create Table dialog and the Excel Destination Editor Connection Manager page will look like this:
We configured the Excel Connection Manager in step two of this tutorial. Click on Mappings to display the Mappings page as shown below:
The above dialog sets the Destination Column names to the Input Column names by default.
At this point we have completed the Data Flow. Note that there are no longer any red icons in the task rectangles. We will proceed to the final step in this section of the tutorial and execute the package.
SSIS Tutorial: Creating a Simple ETL Package
Microsoft SQL Server Integration Services (SSIS) is a platform for building high performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing. SSIS includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service for administering package execution and storage; and application programming interfaces (APIs) for programming the Integration Services object model.
In this tutorial, you will learn how to use SSIS Designer to create a simple Microsoft SQL Server Integration Services package. The package that you create takes data from a flat file, reformats the data, and then inserts the reformatted data into a fact table. In following lessons, the package is expanded to demonstrate looping, package configurations, logging and error flow.
When you install the sample data that the tutorial uses, you also install the completed versions of the packages that you will create in each lesson of the tutorial. By using the completed packages, you can skip ahead and begin the tutorial at a later lesson if you like. If this is your first time working with packages or the new development environment, we recommend that you begin with Lesson1.
No comments:
Post a Comment