The Talend Aggregate row or tAggregateRow field is used to perform Aggregations on numerical column data based on the given grouping column and returns the output. We can use this Talend Aggregate Row field to find the Sum of sales by Region, Total orders by State provinces, etc.
In this Talend Aggregate row example, we used the SQL Server database Customer table. The below screenshot shows the data inside that table.
Talend Aggregate Row Example
First, drag and drop the DBConnection, DBInput, and DBCommit to establish a SQL Server Connection. Next, we selected the Customers table from the metadata.
Next, drag the Talend tAggregateRow from the palette to the job design. As you can see from the below tooltip, tAggregateRow matches SQL GroupBy features. It receives a data flow and aggregates it based on the Group by column(s).
Please connect the DBInput main row to tAggregateRow. Generally, the input columns won’t sync with the tAggregateRow. So, click the Edit Schema button to add the required columns to the tAggregateRow.
Remember, you have to add only the Group by columns and the aggregated columns. It is because all the columns that are not part of the aggregation have to be in the Group by section. Here, we are adding Occupation, Yearly Income, and Sales columns.
Within the Talend tAggregateRow Components tab, there is a Group By section and an Operations section. Within the Group By section, you have to select the Group by columns and map them.
- Output Column: Please select the output column for grouping. It is the same column from the tAggregateRow output schema.
- Input Column Position: Here, you have to map or select the input column from the source, i.e., tDBInput.
Please use the Add (plus) button to add the Group by Columns. Here, we added Occupation columns and mapped with the input column as well. You can use the drop-down list to see the other columns as well.
Within the Talend Aggregate Row Operations section, we have
- Output Column: Please select the output column for aggregation. It is a column from the tAggregateRow output schema.
- Function: Please select the aggregation you want to perform. For instance, Sum, Avg, etc.
- Input Column Position: Here, you have to map or select the input column from the source, i.e., tDBInput.
- Ignore Null Values: Do you want to ignore the null values?
By default, when you click on the plus button, Talend will add the first numeric column from the tAggregateRow output schema, and select the count function.
The following screenshot shows the list of available aggregation functions in Talend. I already explained them in the SQL tutorial, so please refer to the same.
As you can see, we are grouping Customers by occupation and finding the Sum of Yearly Income and Average Sales.
After finishing the tAggregateRow configuration, we are using the DBOutput to save the Talend Aggregate Row output in the SQL Database. Here, we are creating the Talend_tAggregateRow table as the destination table.
Let us run the Talend aggregate row Job.
Let me open the Management studio to check the tAggregateRow result. To clarify, we used the T-SQL query on the Customers table to compare the result with tAggregateRow.
SELECT [Occupation] ,SUM([YearlyIncome]) AS [YearlyIncome] ,AVG([Sales]) AS [Sales] FROM [SQL Tutorial].[dbo].[Customer] GROUP BY [Occupation] ORDER BY YearlyIncome DESC
This time, we are the Education fields, so we added the same in the Talend aggregate row schema.
Next, we used the Education column in Group By. It means, first, customers will grouped by Education and then by occupation and perform the Sum and Average.
Please check the Talend Aggregate row result.
and the code we used for the above is
SELECT Education, [Occupation] ,SUM([YearlyIncome]) AS [YearlyIncome] ,AVG([Sales]) AS [Sales] FROM [SQL Tutorial].[dbo].[Customer] GROUP BY Education, [Occupation] ORDER BY YearlyIncome DESC
Let me select the Occupation column only and then create a few more new columns to store the output.
Next, we used most of the Talend aggregate functions as the output.
The result of the Talend Aggregate Row job
and the code is
SELECT [Occupation] ,SUM([YearlyIncome]) AS [IncomeSum] ,AVG([YearlyIncome]) AS [IncomeAvg] ,COUNT([YearlyIncome]) AS [IncomeCount] ,MIN([YearlyIncome]) AS [IncomeMin] ,MAX([YearlyIncome]) AS [IncomeMax] ,STDEV([YearlyIncome]) AS [IncomeStd] FROM [SQL Tutorial].[dbo].[Customer] GROUP BY [Occupation] ORDER BY IncomeSum DESC;