The Aggregate Transformation in SSIS performs two main tasks:
- Applying Aggregate Operations on your input data
- Allows you to group your data based on values in your input columns.
Basically, it performs a function similar to the SQL Server Aggregate functions. The operations include the following:
- GROUP BY: Just like the GROUP BY Clause in SQL SELECT queries.
- COUNT: It will Count the number of values in this column. Null values are included in the count if you select (*) as the input column. Otherwise, null values are ignored.
- COUNT DISTINCT: It will count the number of Distinct values in this column.
- SUM: Calculate the Sum of the Column values.
- AVERAGE: Calculate the Average of the Column values.
- MINIMUM: Calculate the Minimum of Column values.
- MAXIMUM: Calculate the Maximum of Column values.
The Aggregate transformation in SSIS can be configured in Basic mode or Advanced mode. In Basic mode, the Aggregate transformation has one output and in Advanced mode, it can have more than one output, each containing the result of a different aggregation. In this article we will show you, How to configure the Aggregate transformation in SSIS to produce single output with example.
TIP: Please refer Aggregate Transformation in Advanced Mode article to know, How to Configure multiple outputs in Aggregate Transformation
The Aggregation transformation in SSIS includes the IsBig property, which is used on the output columns to handle the big or high-precision numbers. If any of the column value may exceed 4 billion then IsBig should be set to 1. By setting the IsBig property to 1, Aggregation Transformation Will:
- Use DT_R8 data type instead of the DT_R4 data type.
- Use DT_UI8 data type to store the Count results.
- You can use DT_UI4 data type to store the Distinct count results.
Aggregate Transformation in SSIS Basic Mode Example
For instance, If we want to find the sum of the sales amount for each Color then we can use this Aggregate Transformation. In this example we are going to perform all the Aggregate Transformation operations on DimGeography and FactResellerSales tables present in AdventureworkDW2014.
STEP 1: Open BIDS and Drag and drop the data flow task from the toolbox to control flow and name it as Aggregate Transformation.
Double click on it will open the data flow tab.
STEP 2: Drag and drop OLE DB Source, Aggregate Transformation from toolbox to data flow region
STEP 3: Double click on OLE DB source in the data flow region will open the connection manager settings and provides space to write our SQL statement. Here we are using DimGeography and [FactResellerSales] present in the AdventureworkDW2014
SQL command we are using to retrieve data is:
-- Query for SSIS Aggregate Transformation Basic Mode USE [AdventureWorksDW2014] GO SELECT [EnglishCountryRegionName] ,[StateProvinceName] ,[City] ,[PostalCode] ,RESELLER.[UnitPrice] ,RESELLER.[ProductStandardCost] ,RESELLER.[TotalProductCost] ,RESELLER.[SalesAmount] ,RESELLER.[TaxAmt] FROM [DimGeography] INNER JOIN [FactResellerSales] AS RESELLER ON [DimGeography].[SalesTerritoryKey] = RESELLER.[SalesTerritoryKey]
STEP 4: Click on columns tab to verify the columns. In this tab we can uncheck the unwanted columns also.
Drag and drop the blue arrow from OLE DB Source to Aggregate Transformation to connect the data.
STEP 5: Double-click the Aggregate transform to open the editor. Next in the lower pane we select the required columns to perform Aggregate Transformation operations by checking them in Available Input Columns.
Once you check the required columns then those column names will be appeared in the Input Column located in lower pane as shown in below figure.
- Input Columns: Column we received from OLE DB source.
- Output Alias: This is same as SQL Alias. Here we can change the column name as per the project requirements.
- Operations: We already discussed above.
For this example we are going to use most of the Aggregate functions like below
From the above screenshot you can observe that, We are calculating the Sum of Unit Price, Sum Product Standard Cost, Minimum of Total Product Cost, Maximum of Sales Amount and Average of Tax Amount and Group by English Country Region Then by State Province Name and Then by City.
NOTE: If your input column is string type, you will not see Sum, Average and other aggregate functions but only Group by. Always make sure to convert data type of column to proper data type before providing as input to Aggregate Transformation.
STEP 6: Drag and drop the OLE DB Destination into the data flow region and drag the blue arrow from Aggregate Transformation to OLE DB Destination
STEP 7: Now we have to provide Server, database and table details of the destination. So double-click on the OLE DB Destination and provide the required information.
Here we pointer our destination database as [SSIS Tutorials] and for time being I used New button to create destination table for me and named it as Aggregate Transformation Basic Mode but in reality you should not design your table like this.
STEP 8: Click on Mappings tab to check whether the source columns are exactly mapped to the destination columns.
Click ok to finish our Aggregate Transformation in SSIS package design. Let us run the package
Let’s open the SQL Server Management Studio and write below T-SQL statement to check the results
SELECT [EnglishCountryRegionName] ,[StateProvinceName] ,[City] ,[UnitPrice] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] FROM [SSIS Tutorials].[dbo].[Aggrigate Transformation Basic Mode]
Aggregate Transformation Advanced Option
The Aggregate Transformation in SSIS includes a set of properties in the Advanced Tab present in the Aggregate Transformation Editor. By setting these properties we can increase the performance of the Aggregate Transformation.
Key Scale: In this option, specify the approximate number of keys an aggregation can write. By default, the value of this option is Unspecified.
|Unspecified||The Key Scale property is not used.|
|Low||Aggregation can write nearly 500,000 keys.|
|Medium||Aggregation can write nearly 5,000,000 keys.|
|High||Aggregation may write more than 25,000,000 keys.|
- Keys: Specifying the exact number of keys an aggregation can write. Keys refers to the number of groups that are expected to result from a Group by operation. If both the Key Scale and Keys properties are set, the value of Keys takes precedence.
- Count Distinct Scale: We can specify the approximate number of distinct values an aggregation can write.
- Count Distinct Keys: In this, We can specify the exact number of distinct values that the aggregation can write. If both CountDistinctScale and CountDistinctKeys are specified, CountDistinctKeys takes priority.
- Auto extend factor: In this option, We can specify the percentage of memory it can extend in the aggregation process. We can choose the values from 1 to 100. By default, the value of this option is 25%.
Thank you for Visiting Our Blog