Percentage Sampling Transformation in SSIS

In general, the Percentage sampling transformation in SSIS is for data modeling. This SSIS Percentage Sampling transformation selects a given percentage of rows from the data source. Next, it transfers them to the selected output rows option, and the remaining rows will move to unselected output rows.

For example, if the input has 1000 rows, and if we specify ten as a percentage sample, then the SSIS Percentage Sampling transformation returns approximately 10% of the RANDOM records from the input data.

The functionality of the Percentage Sampling Transformation in SSIS is the same as the Row Sampling Transformation. The only difference is that SSIS Percentage Sampling selects the percentage of rows rather than the number of rows (Row Sampling).

Percentage Sampling Transformation in SSIS Example

STEP 1: Drag and drop the data flow task from the toolbox and rename it as Percentage Sampling Transformation

Percentage Sampling Transformation in SSIS 1

Double click on it will open the data flow tab. For more Transformations >> Click Here.

STEP 2: Drag and drop OLE DB Source, Percentage Sampling Transformation from the toolbox to the data flow region

Percentage Sampling Transformation in SSIS 2

STEP 3: Double click on the OLE DB source in the data flow region. It will open the connection manager settings and provide space to write our SQL statement.

Percentage Sampling Transformation in SSIS 3

The SQL Command that we used is:

SELECT TOP 1000 [FirstName]
      ,[LastName]
      ,[BirthDate]
      ,[Gender]
      ,[EmailAddress]
      ,[YearlyIncome]
      ,[EnglishEducation]
      ,[Phone]
  FROM [AdventureWorksDW2014].[dbo].[DimCustomer]
  ORDER BY [YearlyIncome] DESC

For the time being, we selected the top 1000 rows from DimCustomers in the Adventure Works DW 2014 database, and we want to send 40% of the rows to the destination table. Click on the columns tab to verify and uncheck the unwanted columns

Percentage Sampling Transformation in SSIS 4

STEP 5: Click OK and edit the SSIS Percentage Sampling Transformation.

Percentage Sampling Transformation in SSIS 5

For this example, we are selecting 40% of rows from the input as selected rows, and the remaining rows will be considered as remaining rows.

TIP: If you check the “Use the following random seed” option, every time you run the package, it selects the same set of rows. If not, the Percentage Sampling Transformation in SSIS randomly picks the rows.

STEP 6: When we try to connect the Percentage sampling to the OLE DB destination, an Input Output Selection pop-up window to provide the choice of selecting output will open. First, select the 40 percent rows output as input for the OLE DB destination. When you drag another arrow to another OLE DB Destination, it automatically selects the remaining row’s output.

Percentage Sampling Transformation in SSIS 6

STEP 7: Now, we have to provide the Server, database, and table details of the destination. So, double-click on the OLE DB Destination and provide the required information.

Percentage Sampling Transformation in SSIS 7

Here, we selected the [Percentage Sampling Selected Rows] table present in the SSIS Tutorials Database

STEP 8: Click the Mappings tab to check the source columns mapped to the destination columns.

Percentage Sampling Transformation in SSIS 8

Click ok to finish the configuration of the OLE DB destination for the 40% ROWS.

STEP 9: Now, we have to configure the OLE DB destination for the remaining rows. So, double-click on the OLE DB Destination and provide the required information.

OLE DB Destination Editor configuration 9

Here, we selected the [Percentage Sampling Remaining Rows] table present in the SSIS Tutorials Database

STEP 10: Repeat Step 8

Check Input and Destination Columns Mapping 10

Click OK to finish our Percentage Sampling Transformation in SSIS package design. Let us run the package.

Percentage Sampling Transformation in SSIS 11

We configured the Percentage Sampling Transformation in SSIS to select 40 percent of rows from the data source, which means 400 rows. From the above screenshot, 397 rows are selected and transferred to the [Percentage Sampling Selected Rows] table, which means approximately 40% of rows. The remaining 603 records are transferred to the [Percentage Sampling Remaining Rows] table

NOTE: Instead of selecting the exact percentage of rows, the Percentage Sampling Transformation will select the approximate percentage of rows. This estimated value differs every time you execute. For instance, If you run the above example multiple times, you may get results like 397, 401, 405, 395 rows, etc.

Let us open Management Studio and check the results of both tables.

Selected Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
 FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows]

SELECT COUNT (*) FROM [Percentage Sampling Selected Rows]
Selected Rows Table 12

Remaining Rows Table

SELECT [FirstName],[LastName],[BirthDate]
      ,[Gender],[EmailAddress], [YearlyIncome]
      ,[EnglishEducation],[Phone]
  FROM [AdventureWorksDW2014].[dbo].[Percentage Sampling Remaining Rows]

SELECT COUNT (*) FROM [Percentage Sampling Remaining Rows]
Remaining Rows Table 13