Row Sampling Transformation in SSIS provides an option to specify the number of rows you would like to retrieve from the data source. SSIS Row Sampling Transformation Data is randomly selected unless you use this Use the following random seed option.
Row Sampling Transformation in SSIS will take the whole data from a source and randomly extract the selected number of rows. SSIS Row Sampling gives you two outputs. The first is the selected data, and the second one is unselected data. This Row Sampling transformation is useful in training the data mining model.
It is too early to understand data modeling, so let’s take a simple example — for instance, 100 employees working in a software company. The company has to select ten employees for an abroad tour. Then, the company can use SSIS Row Sampling Transformation to select ten employees randomly. If the company would likely choose the same ten employees every year, they can checkmark the Use the following random seed option.
Row Sampling Transformation in SSIS Example
STEP 1: Drag and drop the data flow task from the toolbox to control flow and name it Row Sampling Transformation.
Double-click on it will open the data flow tab. For more Transformations >> Click Here.
STEP 2: Drag and drop OLE DB Source, Row Sampling transformation from the SSIS toolbox to data flow region
STEP 3: Double-click on the OLE DB source in the data flow region. It will open the connection manager settings and provide space to write our SQL statement.
For the time being, we selected the top 1000 rows from DimCustomers in the Adventure Works DW 2014 database, and we want to select 300 rows in the destination table. The Command Text is shown below.
SELECT TOP 1000 [FirstName] ,[LastName],[BirthDate],[Gender],[EmailAddress]
,[YearlyIncome],[EnglishEducation],[Phone]
FROM [AdventureWorksDW2014].[dbo].[DimCustomer]
ORDER BY [YearlyIncome] DESC
STEP 4: Click on the columns tab in the OLE DB Source Editor to verify the columns. In this tab, we can uncheck the unwanted columns also.
TIP: If we don’t want any columns, don’t add them to your SQL command.
STEP 5: Click ok and connect the OLE DB Source and Row Sampling Transformation using the arrow—Double-click on the SSIS Row Sampling Transformation to configure.
Here, we select 300 rows as selected rows and the remaining 700 as unselected rows.
TIP: If you check the “Use the following random seed” option, it will select the same set of rows every time you run the package. If not, the Row Sampling Transformation will pick the rows randomly.
STEP 6: When we try to connect the SSIS Row Sampling Transformation to the OLE DB destination. A pop-up called the Input Output Selection window provides the choice of selecting output.
Here, we are selecting the SELECTED ROWS output. Click ok and drag and drop another arrow from SSIS Row Sampling to the second destination. Here, it will automatically choose the Remaining Rows option.
STEP 7: Now, we have to provide the Server, database, and table details of the destination. So, double-click on the OLE DB Destination for the selected rows and give the required information.
Here, we are choosing the [Row Sampling Selected Rows] table present in the [Adventure Works DW 2014] database
STEP 8: Click on the Mappings tab to see columns mapped precisely.
Click ok to finish the configuration of the OLE DB destination for the SELECTED ROWs.
NOTE: If your input column names and destination column names are the same, then the intelligence automatically maps. If there are changes in the column names, we must assign them manually.
STEP 9: We have to configure the OLE DB destination for the remaining rows. So, double-click on the OLE DB Destination and provide the required information.
Here, we are selecting the [Row Sampling Remaining Rows] table present in the [Adventure Works DW 2014] database
Click on the Mappings tab to check whether the source columns are correctly mapped to the destination columns, and then click ok to finish the row sampling transformation in the SSIS package design.
Let us run the SSIS Row Sampling Transformation package.
Let us open Management Studio and check the SSIS Row Sampling Transformation result of both tables.
Selected Rows Table
SELECT [FirstName],[LastName],[BirthDate] ,[Gender],[EmailAddress], [YearlyIncome] ,[EnglishEducation],[Phone] FROM [AdventureWorksDW2014].[dbo].[Row Sampling Selected Rows] SELECT COUNT (*) FROM [AdventureWorksDW2014].[dbo].[Row Sampling Selected Rows]
You can observe the number of rows in that table = 300
Remaining Rows Table
SELECT [FirstName],[LastName],[BirthDate] ,[Gender],[EmailAddress], [YearlyIncome] ,[EnglishEducation],[Phone] FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows] SELECT COUNT (*) FROM [AdventureWorksDW2014].[dbo].[Row Sampling Remaining Rows]