Row Sampling Transformation in SSIS

Row Sampling Transformation in SSIS provides an option to specify the number of rows you would like to retrieve from the data source. SSIS Row Sampling Transformation Data is randomly selected unless you use this Use the following random seed option.

Row Sampling Transformation in SSIS will take the whole data from a source, and it randomly extracts the selected number of rows. SSIS Row Sampling gives you two outputs. The first is the selected data, and the second one is unselected data. This SSIS Row Sampling transformation is useful in training the data mining model.

It is too early to understand data modeling, so let’s take a simple example — for instance, 100 employees working in a software company. The company has to select ten employees for an abroad tour. Then the company can use SSIS Row Sampling Transformation to select ten employees randomly. If the company would likely to choose the same ten employees every year, they can checkmark the Use the following random seed option.

Row Sampling Transformation in SSIS Example

STEP 1: Drag and drop the data flow task from the toolbox to control flow and name it as Row Sampling Transformation.

Row Sampling Transformation in SSIS 1

Double click on it will open the SSIS data flow tab.

STEP 2: Drag and drop OLE DB Source, Row Sampling transformation from the SSIS toolbox to data flow region

Row Sampling Transformation in SSIS 2

STEP 3: Double click on OLE DB source in the data flow region will open the connection manager settings and provides space to write our SQL statement.

Row Sampling Transformation in SSIS 3

SQL Command Text

SELECT TOP 1000 [FirstName]
      ,[LastName]
      ,[BirthDate]
      ,[Gender]
      ,[EmailAddress]
      ,[YearlyIncome]
      ,[EnglishEducation]
      ,[Phone]
  FROM [AdventureWorksDW2014].[dbo].[DimCustomer]
  ORDER BY [YearlyIncome] DESC

For the time being we selected top 1000 rows from DimCustomers in Adventure Works DW 2014 data base and we want to select 300 number of rows into the destination table.

STEP 4: Click on the columns tab to verify the columns. In this tab, we can uncheck the unwanted columns also.

Row Sampling Transformation in SSIS 4

TIP: If we don’t want any column, then don’t add them to your SQL command.

STEP 5: Click ok and connect the OLE DB Source and Row Sampling Transformation using the arrow. Double click on the SSIS Row Sampling Transformation to configure.

Here we are selecting 300 rows as selected rows and remaining 700 rows as unselected rows.

Row Sampling Transformation in SSIS 5

TIP: If you check the “Use the following random seed” option, then it will select the same set of rows every time you run the package. If not, the Row Sampling Transformation will pick the rows randomly.

STEP 6: When we try to connect the SSIS Row Sampling Transformation to OLE DB destination. A pop up called Input Output Selection window to provide the choice of selecting output.

Row Sampling Transformation in SSIS 6

Here we are selecting the SELECTED ROWS output. Click ok and drag and drop another arrow from SSIS Row Sampling to the second destination. Here it will automatically choose the Remaining Rows option.

STEP 7: Now, we have to provide the Server, database, and table details of the destination. So double click on the OLE DB Destination for the selected rows and give the required information

Row Sampling Transformation in SSIS 7

Here we are choosing the [Row Sampling Selected Rows] table present in the [Adventure Works DW 2014] database

STEP 8: Click on the Mappings tab to see columns mapped precisely.

 Row Sampling Transformation in SSIS 8

Click ok to finish the configuration of the OLE DB destination for the SELECTED ROWs.

NOTE: If your input column names and destination column names are the same then, the intelligence automatically map. If there are changes in the column names, then we have to assign them manually.

STEP 9: Now, we have to configure the OLE DB destination for the remaining rows. So double click on the OLE DB Destination and provide the required information

Row Sampling Transformation in SSIS 9

Here we are selecting the [Row Sampling Remaining Rows] table present in the [Adventure Works DW 2014] database

Click on the Mappings tab to check whether the source columns are correctly mapped to the destination columns and then click ok to finish row sampling transformation in SSIS package design.

Let us run the SSIS Row Sampling Transformation package

Row Sampling Transformation in SSIS 10

Let us open SQL Server Management Studio and check the SSIS Row Sampling Transformation result of both the tables

Selected Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
FROM [AdventureWorksDW2014].[dbo].[Row Sampling Selected Rows]

SELECT COUNT (*) FROM [AdventureWorksDW2014].[dbo].[Row Sampling Selected Rows]
Row Sampling Transformation in SSIS 11

You can observe the number of rows in that table = 300

Remaining Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows]

SELECT COUNT (*) FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows]
SSIS Row Sampling Transformation