Percentage Sampling Transformation in SSIS

In general, the Percentage sampling transformation in SSIS is for data modeling. This SSIS Percentage Sampling transformation selects a given percentage of rows from the data source. Next, it transfers them to selected output rows option, and the remaining rows will move to unselected output rows.

For example, if the input has 1000 rows, and if we specify ten as a percentage sample, then the SSIS Percentage Sampling transformation returns approximately 10% of the RANDOM records from the input data.

The functionality of the Percentage Sampling Transformation in SSIS is the same as the Row Sampling Transformation. The only difference is, SSIS Percentage Sampling selects the percentage of rows rather than the number of rows (Row Sampling).

Percentage Sampling Transformation in SSIS Example

STEP 1: Drag and drop the data flow task from the toolbox, and rename it as Percentage Sampling Transformation

Percentage Sampling Transformation in SSIS 1

Double click on it will open the data flow tab.

STEP 2: Drag and drop OLE DB Source, Percentage Sampling Transformation from the toolbox to data flow region

Percentage Sampling Transformation in SSIS 2

STEP 3: Double click on OLE DB source in the data flow region will open the connection manager settings and provides space to write our SQL statement.

Percentage Sampling Transformation in SSIS 3

SQL command Text

The SQL Command that we used is:

SELECT TOP 1000 [FirstName]
      ,[LastName]
      ,[BirthDate]
      ,[Gender]
      ,[EmailAddress]
      ,[YearlyIncome]
      ,[EnglishEducation]
      ,[Phone]
  FROM [AdventureWorksDW2014].[dbo].[DimCustomer]
  ORDER BY [YearlyIncome] DESC

For the time being, we selected top 1000 rows from DimCustomers in Adventure Works DW 2014 database, and we want to send the 40% of rows to the destination table

Click on the columns tab to verify, and uncheck the unwanted columns

Percentage Sampling Transformation in SSIS 4

STEP 5: Click OK, and edit the SSIS Percentage Sampling Transformation

Percentage Sampling Transformation in SSIS 5

For this example, we are selecting 40% of rows from the input as selected rows, and remaining rows will consider as remaining rows.

TIP: If you check the “Use the following random seed” option, every time you run the package, it selects the same set of rows. If not, the Percentage Sampling Transformation in SSIS randomly picks the rows.

STEP 6: When we try to connect the Percentage sampling to OLE DB destination, an Input Output Selection pop up window to provide the choice of selecting output will open. First, select the 40 percent rows output as input for OLE DB destination. When you drag another arrow to other OLE DB Destination, it automatically selects the remaining rows output.

Percentage Sampling Transformation in SSIS 6

STEP 7: Now, we have to provide the Server, database, and table details of the destination. So double-click on the OLE DB Destination and provide the required information

Percentage Sampling Transformation in SSIS 7

Here we selected the [Percentage Sampling Selected Rows] table present in the SSIS Tutorials Database

STEP 8: Click on the Mappings tab to check the source columns mapped to the destination columns.

Percentage Sampling Transformation in SSIS 8

Click ok to finish the configuration of the OLE DB destination for the 40% ROWS.

STEP 9: Now, we have to configure the OLE DB destination for the remaining rows. So double-click on the OLE DB Destination and provide the required information

Percentage Sampling Transformation in SSIS 9

Here we selected the [Percentage Sampling Remaining Rows] table present in the SSIS Tutorials Database

STEP 10: Repeat Step 8 

Percentage Sampling Transformation in SSIS 10

Click OK to finish our Percentage Sampling Transformation in SSIS package design. Let us run the package

Percentage Sampling Transformation in SSIS 11

We configured the Percentage Sampling Transformation in SSIS to select 40 percent rows from the data source means 400 rows. From the above screenshot, 397 rows are selected and transferred to the [Percentage Sampling Selected Rows] table, which means approximately 40% rows. Remaining 603 records are transferred to [Percentage Sampling Remaining Rows] table

NOTE: Instead of selecting the exact percentage of rows, the Percentage Sampling Transformation will select the approximate percentage of rows. This estimated value differs every time you execute. For instance, If you run the above example multiple times, you may get results like 397, 401, 405, 395 rows, etc.

Let us open SQL Server Management Studio and check the result of both the tables

Selected Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
 FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows]

SELECT COUNT (*) FROM [SSIS Tutorials].[dbo].[Percentage Sampling Selected Rows]
Percentage Sampling Transformation in SSIS 12

Remaining Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
  FROM [AdventureWorksDW2014].[dbo].[Percentage Sampling Remaining Rows]

SELECT COUNT (*) FROM [SSIS Tutorials].[dbo].[Percentage Sampling Remaining Rows]
 Percentage Sampling Transformation in SSIS 13