Row Sampling Transformation in SSIS

Row Sampling Transformation in SSIS provides an option to specify the number of rows you would like to retrieve from the data source. SSIS Row Sampling Transformation Data is randomly selected unless you use this Use the following random seed option.

Row Sampling Transformation in SSIS will take the whole data from a source and randomly extract the selected number of rows. SSIS Row Sampling gives you two outputs. The first is the selected data, and the second one is unselected data. This Row Sampling transformation is useful in training the data mining model.

It is too early to understand data modeling, so let’s take a simple example — for instance, 100 employees working in a software company. The company has to select ten employees for an abroad tour. Then, the company can use SSIS Row Sampling Transformation to select ten employees randomly. If the company would likely choose the same ten employees every year, they can checkmark the Use the following random seed option.

Row Sampling Transformation in SSIS Example

STEP 1: Drag and drop the data flow task from the toolbox to control flow and name it Row Sampling Transformation.

Data Flow Task 1

Double click on it will open the data flow tab. For more Transformations >> Click Here.

STEP 2: Drag and drop OLE DB Source, Row Sampling transformation from the SSIS toolbox to data flow region

Row Sampling Transformation in SSIS 2

STEP 3: Double click on the OLE DB source in the data flow region. It will open the connection manager settings and provide space to write our SQL statement.

OLE DB Source Editor 3

Command Text

SELECT TOP 1000 [FirstName]
      ,[LastName]
      ,[BirthDate]
      ,[Gender]
      ,[EmailAddress]
      ,[YearlyIncome]
      ,[EnglishEducation]
      ,[Phone]
  FROM [AdventureWorksDW2014].[dbo].[DimCustomer]
  ORDER BY [YearlyIncome] DESC

For the time being, we selected the top 1000 rows from DimCustomers in the Adventure Works DW 2014 database, and we want to select 300 rows in the destination table.

STEP 4: Click on the columns tab to verify the columns. In this tab, we can uncheck the unwanted columns also.

Choose from available external columns 4

TIP: If we don’t want any columns, don’t add them to your SQL command.

STEP 5: Click ok and connect the OLE DB Source and Row Sampling Transformation using the arrow. Double-click on the SSIS Row Sampling Transformation to configure.

Here, we are selecting 300 rows as selected rows and the remaining 700 rows as unselected rows.

Row Sampling Transformation in SSIS 5

TIP: If you check the “Use the following random seed” option, it will select the same set of rows every time you run the package. If not, the Row Sampling Transformation will pick the rows randomly.

STEP 6: When we try to connect the SSIS Row Sampling Transformation to the OLE DB destination. A pop-up called the Input Output Selection window provides the choice of selecting output.

Row Sampling Transformation in SSIS 6

Here, we are selecting the SELECTED ROWS output. Click ok and drag and drop another arrow from SSIS Row Sampling to the second destination. Here, it will automatically choose the Remaining Rows option.

STEP 7: Now, we have to provide the Server, database, and table details of the destination. So, double-click on the OLE DB Destination for the selected rows and give the required information.

Row Sampling Transformation in SSIS 7

Here, we are choosing the [Row Sampling Selected Rows] table present in the [Adventure Works DW 2014] database

STEP 8: Click on the Mappings tab to see columns mapped precisely.

Input and Destination Columns Mapping 8

Click ok to finish the configuration of the OLE DB destination for the SELECTED ROWs.

NOTE: If your input column names and destination column names are the same, then the intelligence automatically maps. If there are changes in the column names, we must assign them manually.

STEP 9: We have to configure the OLE DB destination for the remaining rows. So, double-click on the OLE DB Destination and provide the required information.

Row Sampling Transformation in SSIS 9

Here, we are selecting the [Row Sampling Remaining Rows] table present in the [Adventure Works DW 2014] database

Click on the Mappings tab to check whether the source columns are correctly mapped to the destination columns, and then click ok to finish the row sampling transformation in SSIS package design.

Let us run the SSIS Row Sampling Transformation package.

Row Sampling Transformation in SSIS 10

Let us open Management Studio and check the SSIS Row Sampling Transformation result of both tables.

Selected Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
FROM [AdventureWorksDW2014].[dbo].[Row Sampling Selected Rows]

SELECT COUNT (*) FROM [AdventureWorksDW2014].[dbo].[Row Sampling Selected Rows]
Target Table 11

You can observe the number of rows in that table = 300

Remaining Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows]

SELECT COUNT (*) FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows]
Destination Table