SQL GROUP BY Clause

The SQL Server GROUP BY Clause returns aggregated data by grouping one or more columns and performing the aggregated functions on the remaining columns. The GROUP BY statement groups the rows in a column that have the same (common) value. For example, combining all rows in a country column with the USA name into a single basket, row, or group. Based on those groups, you can perform computations, such as totals, averages, counting, etc.

The SQL GROUP BY statement is often used in conjunction with aggregate functions such as SUM, COUNT, AVG, MIN, MAX, etc. In general, aggregate functions work on the entire table by combining with the GROUP BY clause; calculations happen for partitions. Remember, the aggregated columns can be one or more columns.

For instance, if you want to determine the total number of sales by region or country, use the SQL Server GROUP BY Clause to categorise the Sales table by region or country. Then we can use the aggregate function SUM to calculate the total. Similarly, use AVG to calculate the sales average.

It is better to say that the SQL GROUP BY clause id used to summarise the data for high-level business reporting. To filter the normal rows before grouping, use the WHERE clause and for the grouped rows, we have to use the HAVING clause.

NOTE: If there are any NULL values when grouping, the server will consider all NULLs as one grouping and display them in the result set.

SQL GROUP BY Clause Syntax

The syntax of the GROUP BY clause, with the most common combinations for performing aggregations, is shown below.

SELECT [Column1], [Column2]...[ColumnN],
       Aggregate Function (Expression | Column_Name) 
FROM [Source]
WHERE [Conditions] -- Optional
GROUP BY [Column1], [Column2]...[ColumnN]
HAVING [Aggregate Condition]
ORDER BY Cols

From the above SQL Server GROUP BY clause syntax

The SELECT statement allows you to choose the required number of columns from the tables, which may be one or more, for grouping.
Aggregate Function: We can use aggregate functions such as MIN, AVG, MAX, and COUNT on any column. It can be a single column or an expression.
A Source can be one or more tables to get the required columns, and use JOINS to combine multiple tables.
WHERE: It is optional. However, it provides filters or conditions before the grouping happens. If the condition is TRUE, then only the SQL GROUP BY SELECT Statement returns the records.
GROUP BY (important): All Selected Columns that are not part of the Aggregate Functions should be placed in this clause. In the above syntax, [Column1], [Column2]…[ColumnN].
HAVING: It is optional to use the HAVING clause. However, it helps to filter the aggregated data.
ORDER BY: To sort the grouped data in ascending or descending order.

TIP: When performing grouping, the WHERE clause comes before GROUP BY, whereas the HAVING clause comes after the GROUP BY.

SQL GROUP BY Clause Examples

This article shows the use of the GROUP BY clause to categorise and aggregate the data while extracting it using the SELECT statement. To explain this, we use the AdventureWorksDW2022 database.

The SQL GROUP BY clause generally combines the common names within a given column and puts them into a single bucket. However, the real power comes when performing aggregations to view statistical information. For example, average sales by product category. If you want to display the aggregated information to the end user, use this clause.

The following list of examples explains how the GROUP BY clause works with the table column.

SQL GROUP BY Single and Multiple Columns

When you need high-level data for analysis purposes, you must use a single column. For instance, total sales by country.

To demonstrate the same, we use a single column to perform grouping in this example. Here, we group the Customers based on their Gender and use the STDEV function to find the Standard Deviation of the Yearly Income.

SELECT [Gender]
      ,STDEV([YearlyIncome]) StandardDeviation
FROM [DimCustomer]
GROUP BY Gender

In the above query:

GROUP BY Gender: It groups the DimCustomers into two groups, M (Male) and F (Female).
STDEV([YearlyIncome]): The STDEV function finds the Standard Deviation for the Male and Female.

Gender StandardDeviation
------ ----------------------
F      31861.8947666044
M      32696.1812176709

As we mentioned earlier, you must include the non-aggregated column in the SQL GROUP BY clause. Otherwise, the equerry will produce an error. For example, if you remove the last line ‘GROUP BY Gender’ from the above query, the following error will occur.

Message

Msg 8120, Level 16, State 1, Line 1
Column 'DimCustomer.Gender' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

When grouping similar products or rows, it is not mandatory to use a single column. The SQL Server GROUP BY clause allows you to use multiple columns. By adding more columns, you can see the aggregated data at various levels. For instance, total sales by Country and State. By adding City, you can further drill down.

To demonstrate this, we use both the Gender and Marital Status columns. The query below calculates the total yearly income of customers by their gender and marital status.

SELECT [Gender], MaritalStatus
      ,SUM([YearlyIncome]) TotalIncome
FROM [DimCustomer]
GROUP BY Gender, MaritalStatus
ORDER BY Gender

Gender MaritalStatus TotalIncome
------ ------------- ---------------------
F      S             237080000.00
F      M             286880000.00
M      S             217640000.00
M      M             317640000.00

If you examine the SQL Server result set of this example, it displays four records. As per the table, there are four combinations, i.e., Male and Female with Single and Married Status. So, the GROUP BY clause first groups the DimCustomer by their Gender and then by MaritalStatus. Next, it calculates the sum of the yearly Income for:

Male Married
Male Single.
Female Married.
Female Single.

SQL GROUP BY Multiple Tables

In a normalised database, it is not possible to perform calculations based on one table. Please use any of the available JOIN to combine multiple tables.

In this example, we will show how to use the SQL Server GROUP BY clause with INNER JOIN to perform aggregations on multiple tables. Suppose our job is to calculate orders and sales information in each region. In our database, there are two tables: SalesTerritoryGroup, which contains the region, and FactInternetSales contains sales information. They are both connected using the SalesTerritoryKey. Here, we use the INNER JOIN to connect those two tables and group the data by SalesTerritoryGroup. Next, the query calculates the sum of the total number of orders, product cost, sales amount, and the sales average.

SELECT st.SalesTerritoryGroup AS Territory
    ,SUM(f.[OrderQuantity]) Orders
    ,SUM(f.TotalProductCost) ProductCost
    ,SUM(f.[SalesAmount]) Sales
    ,AVG(f.[SalesAmount]) SalesAverage
FROM [DimSalesTerritory] st INNER JOIN [FactInternetSales] f
ON st.SalesTerritoryKey = f.SalesTerritoryKey
GROUP BY SalesTerritoryGroup

Territory	Orders	ProductCost	Sales	SalesAverage
Europe	18089	5265915.9986	8930042.2634	493.6725
North America	28964	6636732.069	11367634.3729	392.4746
Pacific	13345	5375145.5081	9061000.5844	678.9809

NOTE: You can experiment with LEFT, RIGHT, and FULL JOIN to see the combinations.

SQL GROUP BY Date and Time Column

Apart from using the traditional VARCHAR or string columns in the GROUP BY clause, you can use the DATE, DATETIME, and TIME fields. For instance, calculating sales by year, month, etc.

In this example, we use the YEAR function to extract the year number from the OrderDate column. Next, we will group the FactInternetSales table by that YEAR value. Then, the SUM and AVG functions will calculate the total number of orders, total sales value, and the sales average for each from 2010 to 2014.

NOTE: Here, we used the YEAR() function. However, you can use any expression inside the GROUP BY clause. However, it should match the expression in the SELECT statement.

SELECT YEAR([OrderDate]) AS Year
    ,SUM([OrderQuantity]) TotalOrders
    ,SUM([SalesAmount]) TotalSales
    ,AVG([SalesAmount]) SalesAverage
FROM [FactInternetSales]
GROUP BY YEAR(OrderDate)
ORDER BY Year

Year        TotalOrders TotalSales            SalesAverage
----------- ----------- ------------------ ---------------------
2010        14          43421.0364            3101.5026
2011        2216        7075525.9291          3192.9268
2012        3397        5842485.1952          1719.8955
2013        52801       16351550.34           309.6825
2014        1970        45694.72              23.1952

TIP: Please replace the YEAR function with the MONTH function for monthly Sales information.

SQL GROUP BY SUM Function Example

When performing the grouping, finding the total is the most common scenario. For example, total sales by region, country, continent, department, etc. In all these scenarios, we must use the GROUP BY clause to combine the similar regions, countries, etc. Next, the SUM function helps to calculate the total sales for each group.

To demonstrate the same, we use the SUM function in a single column. The following aggregate transact structured query will calculate the total income earned by each customer education background group.

SELECT EnglishEducation
      ,SUM([YearlyIncome]) Income
FROM [DimCustomer]
GROUP BY EnglishEducation

EnglishEducation                         Income
-------------------------------- ---------------------
High School                              161570000.00
Partial High School                      62400000.00
Partial College                          279590000.00
Graduate Degree                          210780000.00
Bachelors                                344900000.00

In the previous example, to make the query simple, we used a single table to find the total income. However, in real-time, we mostly combine multiple tables to calculate total sales, orders, etc. The query below uses the SQL Server GROUP BY clause with multiple columns to combine the Territory group and country, and the SUM function to calculate the sum of the sales amount.

SELECT st.SalesTerritoryGroup AS Territory
    ,[SalesTerritoryCountry] AS Country
    ,SUM(f.[SalesAmount]) Sales
FROM [DimSalesTerritory] st INNER JOIN [FactInternetSales] f
ON st.SalesTerritoryKey = f.SalesTerritoryKey
GROUP BY SalesTerritoryGroup, SalesTerritoryCountry
ORDER BY Territory, Country

SQL GROUP BY COUNT

Similar to the above SUM function, we can use the COUNT Function with the GROUP BY clause to count the total records in each group. For example, the query below counts the total number of Male and Female customers in the DimCustomer table.

SELECT [Gender]
      ,COUNT([CustomerKey]) TotalCustomers
FROM [DimCustomer]
GROUP BY Gender

Gender TotalCustomers
------ --------------
F      9133
M      9351

Use the same technique to count the total employees in each department, products in each category, etc. If you want to count the employees at the lower level, use multiple columns. The query below counts the Female and Male customers with a combination of Single and Married status.

SELECT [Gender], MaritalStatus
      ,COUNT([CustomerKey]) TotalCustomers
FROM [DimCustomer]
GROUP BY Gender, MaritalStatus

Gender MaritalStatus TotalCustomers
------ ------------- --------------
M      M             5266
F      S             4388
F      M             4745
M      S             4085

SQL GROUP BY MAX, MIN, and AVG

It is not limited to using one aggregate function inside a SELECT statement. However, you can use a required number of aggregate functions along with the GROUP BY clause. This example shows the same.

When we use the GROUP BY clause to group similar items, finding the minimum, maximum value, and the average is one of the most common scenarios. For example, finding the Postal Codes, placing the Minimum and Maximum orders. Next, calculating the average sales value by country, etc. In all these scenarios, and not limited to, use the SQL Server GROUP BY with MIN, MAX, and AVG functions.

SELECT [SalesTerritoryCountry] AS Country
    ,MIN(f.[SalesAmount]) MinimumSale
    ,MAX(f.[SalesAmount]) MaximumSale
    ,AVG(f.[SalesAmount]) AverageSale
FROM [DimSalesTerritory] st INNER JOIN [FactInternetSales] f
ON st.SalesTerritoryKey = f.SalesTerritoryKey
GROUP BY SalesTerritoryCountry
ORDER BY Country

Country	MinimumSale	MaximumSale	AverageSale
Australia	2.29	3578.27	678.9809
Canada	2.29	3578.27	259.5596
France	2.29	3578.27	475.7138
Germany	2.29	3578.27	514.5444
United Kingdom	2.29	3578.27	491.1254
United States	2.29	3578.27	439.9264

SQL GROUP BY With Aggregate Functions

In many of the examples, we used the aggregate function to calculate the total sales by country, region, etc. This example uses the SUM, AVG, MIN, VAR, and MAX aggregate functions in a single SELECT Statement and ORDER BY. It groups customers by occupation and calculates the total income, average, minimum, maximum, standard deviation, and variance of each group.

SELECT [EnglishOccupation]
      ,SUM([YearlyIncome]) AS [Total Income]
      ,AVG([YearlyIncome]) AS [Average Income]
      ,MIN([YearlyIncome]) AS [Minimum Salary]
      ,MAX([YearlyIncome]) AS [Maximum Salary]
      ,STDEV([YearlyIncome]) AS [Standard Salary]
      ,VAR([YearlyIncome]) AS [Salary Variance]
FROM [DimCustomer]
GROUP BY [EnglishOccupation]
ORDER BY [EnglishOccupation]

NOTE: If you forgot the ALIAS Column Names, then [No Column Name] will be displayed as header text.

Using SQL GROUP BY with TOP Clause

In SQL Server, you can use the GROUP BY with the TOP clause to limit the total records displayed as the result set. Instead of selecting the first 5 or 10 records from the table, the TOP clause selects rows from the groups.

In the following statement, the GROUP BY clause groups the DimCustomers by their Occupation (5). Next, the SUM function calculates the total income by their occupation. Then, the TOP Clause selects the first 3 groups from the total of 5.

SELECT TOP 3 [EnglishOccupation]
      ,SUM([YearlyIncome]) Income
FROM [DimCustomer]
GROUP BY [EnglishOccupation]
--ORDER BY EnglishOccupation

EnglishOccupation	Income
Professional	409500000.00
Clerical	89920000.00
Manual	39220000.00

TIP: If you uncomment the ORDER BY clause, the above query returns Clerical, Manual, and Management sales information.

Using SQL GROUP BY With ORDER BY Clause

The SELECT statement with the GROUP BY clause will not follow any particular order while returning the result set. It simply uses the table order and returns it. However, if you use the ORDER BY clause and the GROUP BY combination, you can sort the result set in ascending or descending order based on the numeric or string columns.

Remember, the ORDER BY clause executes after the GROUP BY. Here, it sorts the customers by the total income in descending order.

SELECT [EnglishOccupation]
      ,SUM([YearlyIncome]) Income
FROM [DimCustomer]
GROUP BY [EnglishOccupation]
ORDER BY Income DESC

EnglishOccupation	Income
Professional	409500000.00
Management	283900000.00
Skilled Manual	236700000.00
Clerical	89920000.00
Manual	39220000.00

NOTE: The ORDER BY clause accepts the ALIAS column names, whereas the GROUP BY clause won’t allow an ALIAS name.

Using SQL GROUP BY with WHERE and HAVING Clause

As you already know, both the WHERE clause and the HAVING clause are useful to filter the selected data. When grouping data, the order of execution of WHERE and HAVING is different.

WHERE Clause

The SQL Server WHERE clause will execute before the GROUP BY clause. So, the initial filtering of the data happens well before the grouping.

In this example, we will use the GROUP BY along with the WHERE clause. Here, the WHERE clause removes the customers whose Education qualification is ‘Partial High School’. Next, it groups the customers by occupation and finds the total income.

TIP: To see the difference, please execute the query below with and without the WHERE clause.

SELECT [EnglishOccupation]
      ,SUM([YearlyIncome]) AS [Total Income]
FROM [DimCustomer]
WHERE EnglishEducation <> 'Partial High School'
GROUP BY [EnglishOccupation]

EnglishOccupation	Total Income
Professional	393610000.00
Clerical	78680000.00
Manual	30150000.00
Management	281410000.00
Skilled Manual	212990000.00

As it is one of the most important concepts, let me provide another example by joining multiple tables. Here, we will find the sales average and the total orders by region. However, we don’t need the Canada and Germany information. In such a case, we use the WHERE clause to remove the Canada and Germany information before the grouping.

SELECT st.SalesTerritoryGroup AS Territory
    ,SUM(f.[OrderQuantity]) AverageOrders
    ,AVG(f.[SalesAmount]) AverageSales
FROM [DimSalesTerritory] st INNER JOIN [FactInternetSales] f
ON st.SalesTerritoryKey = f.SalesTerritoryKey
WHERE st.SalesTerritoryCountry != 'Canada' AND st.SalesTerritoryCountry != 'Germany'
GROUP BY SalesTerritoryGroup

Territory	AverageOrders	AverageSales
Europe	12464	484.253
North America	21344	439.9264
Pacific	13345	678.9809

If you remove the WHERE clause from the above example and compare it with the result set below, you can see the sales difference in both the North America and Europe regions.

HAVING Clause

The SQL HAVING clause will execute after the GROUP BY clause. Once the grouping happens and the calculations for each group are finished, the HAVING clause will execute to filter the aggregated data.

It is the same as the above query, but we added the HAVING clause to it. This statement will combine the Customers by Occupation. Next, the SUM and AVG calculate the sum and average of the yearly income. Then, the HAVING Clause will return the result whose sum of Yearly Income is greater than 250000000.

SELECT [EnglishOccupation]
      ,SUM([YearlyIncome]) AS [Total Income]
      ,AVG([YearlyIncome]) AS [AverageIncome]
FROM [DimCustomer]
GROUP BY [EnglishOccupation]
HAVING SUM([YearlyIncome]) > 250000000

EnglishOccupation	Total Income	AverageIncome
Professional	409500000.00	74184.7826
Management	283900000.00	92325.2032

Combine the WHERE and HAVING clauses

Combining all the possibilities, such as the WHERE, HAVING, AVG, and SUM that you can use along with this SQL Server GROUP BY clause. The query execution is:

WHERE: It filters the Northeast and Southwest regions.
GROUP BY: It combines tables by Country.
SUM and AVG calculate the total order, sales and the average sales by country.
HAVING: It removes the country names whose average sales value is less than or equal to 500.

SELECT st.[SalesTerritoryCountry] AS Country
    ,SUM(f.[OrderQuantity]) AverageOrders
    ,SUM(f.[SalesAmount]) Sales
    ,AVG(f.[SalesAmount]) AverageSales
FROM [DimSalesTerritory] st INNER JOIN [FactInternetSales] f
ON st.SalesTerritoryKey = f.SalesTerritoryKey
WHERE st.SalesTerritoryRegion != 'Northeast' AND 
    st.SalesTerritoryRegion != 'Southwest'
GROUP BY st.[SalesTerritoryCountry]
HAVING AVG(f.[SalesAmount]) > 500

Country	AverageOrders	Sales	AverageSales
Australia	13345	9061000.5844	678.9809
Germany	5625	2894312.3382	514.5444

TIP: One of the common mistakes people often make is using the WHERE clause to filter the aggregated data. It’s not possible, and you must use HAVING instead of WHERE.

CTE

Apart from using the WHERE clause, you can use the CTE to perform the complex filtering and other things first. Next, use the SQL GROUP BY clause on the CTE filter data to reduce the CPU usage and improve the query performance.

The CTE query below selects the Territory groups whose orders are greater than or equal to the year 2011. Using this filtered data (>= 2011), the GROUP BY clause groups them and calculates the sum of the order quantity and the average sales.

WITH Above2011 AS (
    SELECT ds.SalesTerritoryGroup, f.SalesAmount, f.OrderQuantity 
    FROM [DimSalesTerritory] ds INNER JOIN [FactInternetSales] f
    ON ds.SalesTerritoryKey = f.SalesTerritoryKey
    WHERE YEAR(f.OrderDate) >= '2011'
)
SELECT SalesTerritoryGroup AS Territory, SUM([OrderQuantity]) AverageOrders
    ,AVG([SalesAmount]) AverageSales
FROM Above2011
GROUP BY SalesTerritoryGroup

Territory	AverageOrders	AverageSales
Europe	18087	493.5004
North America	28958	391.92
Pacific	13339	677.7187

Using GROUP BY and subqueries

The SQL Server GROUP BY clause also allows you to use it inside a subquery to perform complex operations. The subquery inside the example below groups customers by occupation and finds the average salary. The outer query checks whether the subquery result is greater than the average salary of the total customers. If true, return that row.

SELECT Occupation, Avg_Income
FROM (SELECT [EnglishOccupation] Occupation
      ,AVG([YearlyIncome]) Avg_Income
FROM [DimCustomer]
GROUP BY [EnglishOccupation] ) New
WHERE Avg_Income > (SELECT AVG([YearlyIncome]) FROM [DimCustomer])

Occupation	Avg_Income
Professional	74184.7826
Management	92325.2032

Using the CASE WHEN Statement

The SQL Server GROUP BY clause, combined with the CASE statement, provides greater flexibility for performing more advanced operations. For example, finding the total number of employees with the highest sales and the lowest sales.

The query below uses the CASE statement to check whether the SalesAmount is greater than 2500 or less than or equal to 2500. Additionally, we used the COUNT function to count the total sales in each country that exceeded 2500. Next, count the users less than or equal to 2500. Please check the first image to see the output.

SELECT st.[SalesTerritoryCountry] AS Country
    ,COUNT(CASE WHEN f.SalesAmount > 2500 THEN 1 END) AS HighSale
    ,COUNT(CASE WHEN f.SalesAmount <= 2500 THEN 1 END) AS LowSale
FROM [DimSalesTerritory] st INNER JOIN [FactInternetSales] f
ON st.SalesTerritoryKey = f.SalesTerritoryKey
GROUP BY st.[SalesTerritoryCountry]

Handling NULL values

When working with the SQL GROUP BY clause, it considers NULL values as a separate group and displays them in the result set. To avoid this, use the COALESCE function. If you would like to address the NULL values in one of the above queries (there are none, if any), please use the code below.

SELECT COALESCE([Gender], 'Unknown') AS Gender
      ,SUM([YearlyIncome]) TotalIncome
FROM [DimCustomer]
GROUP BY COALESCE([Gender], 'Unknown')

SQL GROUP BY Extensions: GROUPING SETS, ROLLUP, and CUBE

The following are the available three GROUP BY extensions that you can use to see the summarised aggregated data.

GROUPING SETS: It displays the aggregated data for individual items.
ROLLUP: It displays the aggregated data for each combination, plus subtotals and grand totals.
CUBE: It is the same as the ROLLUP, but it shows the subtotals for all combinations.

GROUPING SETS

To demonstrate this functionality, we use the same example that we explained in the multiple columns section. Here, we group the customers by Gender and MaritalStatus.

SELECT [Gender], MaritalStatus
      ,SUM([YearlyIncome]) TotalIncome
FROM [DimCustomer]
GROUP BY GROUPING SETS(Gender, MaritalStatus)
ORDER BY Gender

If you compare the result set of GROUPING SETS against the multiple columns example, they are entirely different. The multiple columns example calculates the total income for each combination, whereas the GROUPING SETS calculates the total income for individuals. It is not a combination of two columns but a common grouping of the same data within a single column.

SET 1: It groups by MaritalStatus and calculates the total income for single and married persons. Next, set the Gender as NULL.
SET 2: It groups by Gender and calculates the total income for males and females. Next, set the MaritalStatus as NULL.

ROLLUP

Like the regular one, if you use the SQL GROUP BY ROLLUP, it groups the table by the given item and performs the aggregations. It also provides subtotals and grand totals based on the columns. To explain this, we will use two examples, one with a single column and the other with two columns.

The query below groups customers by gender and calculates the total income. As there are Male and Female customers, it should return two rows. However, the GROUP BY ROLLUP adds one more row with NULL as Gender, and the total income is the sum of Male and Female customers.

SELECT [Gender]
      ,SUM([YearlyIncome]) TotalIncome
FROM [DimCustomer]
GROUP BY ROLLUP(Gender)
ORDER BY TotalIncome

If you add multiple (two) columns in the SQL GROUP BY ROLLUP, the output will produce the subtotals and grand totals.

Female (F)- Married (M).
Female (F) – Single (S).
Female Subtotal (M + S)
Male (M) – Married (M).
Male (M) – Single (S).
Male (M) Subtotal ( M + S)
Grand Total (All combinations of Male + Female)

SELECT [Gender], MaritalStatus
      ,SUM([YearlyIncome]) TotalIncome
FROM [DimCustomer]
GROUP BY ROLLUP(Gender, MaritalStatus)
--ORDER BY TotalIncome

CUBE

If you use the SQL GROUP BY CUBE with a single column, the output will be the same as the above example. However, if you use the CUBE with multiple columns, it displays all combinations. The ROLLUP will consider the first column (Gender) as the main and display the sub and grand totals. On the other hand, the CUBE will display subtotals and grand totals for the MaritalStatus.

To demonstrate the same, we use the same example and replace the ROOLUP with GROUP BY CUBE. When you look at the result set, it displays 9 records, which include all seven records that we explained previously, plus the two below.

Subtotal of Single Persons.
Subtotal of Married Persons.

SELECT [Gender], MaritalStatus
      ,SUM([YearlyIncome]) TotalIncome
FROM [DimCustomer]
GROUP BY CUBE(Gender, MaritalStatus)

TIP: You can also use the combination of two or three extensions. For instance, ROLLUP and GROUPING SETS, or combining them with CUBE.