The SQL Server LAG function is one of the Analytic functions that acts precisely opposite to LEAD to perform advanced data analytics. It allows you to access the data or value from a previous row without using any SELF JOIN. This LAG function is beneficial in comparing the current row value with its previous row so that you can calculate the difference between two periods or track down the sales trend in the dataset, etc.
The values returned by the LAG Function depend on the ORDER BY clause. The ORDER BY clause decides the sorting of numeric values, and the LAG function retrieves the previous row in the same order. This article will explore the SQL Server LAG function with or without the PARTITION BY clause, Default and Offset Values.
SQL LAG Function Syntax
The basic syntax of the LAG function to access the previous row is as shown below:
SELECT LAG([Scalar Expression], [Offset_Value], [Default_Value]) OVER ( PARTITION_BY_Clause ORDER_BY_Clause ) FROM [Source]
Let us explore the Parameters of the SQL Server LAG Function.
- Scalar Expression: It can be a Column Name, expression, or Subquery from which you want to retrieve the previous value. It returns a single value.
- Offset_Value (Optional): Please define the number of rows you want to move or look backward from the current row. The default value = 1, so the SQL LAG function retrieves the previous row value. For example, if the Offset is 3, it looks back at three rows and will select the 3rd previous row (-3) value as a result set. Use a Column, Subquery, or any expression that returns a single value.
- Default_Value (Optional): It allows you to define the default value and accepts a constant value or an expression. This default value will return if the previous row value is NULL or the Offset_Value is not in the range. If you ignore it, NULL will return.
- Order_By_Clause: Please specify the column name to sort the partitioned data in ascending or descending order. Please refer to the ORDER BY Clause in SQL Server.
- Partition_By_Clause: It separates the selected records result set into partitions based on the given column. For instance, partition employees by department name.
- If you specify the Partition By Clause, the SQL Server LAG function works independently within each partition. It starts picking the previous rows in each partition and does the same for other partitions.
- If the partition By clause is not specified, the LAG function will consider all table rows as a single partition.
NOTE: If the scalar_expression is NULL or its default value is set to NULL, the LAG function returns NULL as an output.
SQL Server LAG function Examples
The LAG function is very useful to compare the current value against the previous value. For instance, comparing the customer’s current spending with last time side by side. The other one is finding the sales of a product this year vs the previous period or previous month, day, etc.
The following list of examples will explore the working functionality of the SQL LAG function to retrieve previous rows. It starts with no default values, then what happens when we ignore the Partition By Clause. Next, examples with default and offset values.
SQL LAG Function without a Default Value
The query below is a simple and basic example of the LAG function. We did not use the Offset, Default Value, and PARTITION BY Clause to retrieve the previous sales. As we didn’t use the PARTITION BY, the LAG function considers the whole table a single partition.
The ORDER BY [YearlyIncome] statement sorts the Employee records in ascending order based on their Yearly Income. Next, LAG([Sales]) will return the previous Sales value of the row (record before) as the output. If there are no rows to return, the SQL Server LAG function will return a NULL value because we haven’t set any default value.
SELECT [EmpID],[Name],[Education],[Occupation]
,[YearlyIncome],[Year],[Sales]
,LAG([Sales]) OVER (ORDER BY [YearlyIncome]) AS Out1
FROM [emp]
As you can see, the LAG function returned NULL as output for the 1st record. It’s because there is no previous row for that record (it is the first record, indeed). The second record of the Out1 column is the first-row value of the Sales column.
SQL Server LAG Function with a default value
The LAG function provides flexibility in specifying a default value that is returned when no data is present in the previous row. If you observe the image below, the first row of the Out1 column is NULL. So, let me define the Offset value = 1 and the default value = 0.
SELECT [EmpID],[Name],[Education],[Occupation]
,[YearlyIncome],[Year],[Sales]
,LAG([Sales], 1, 0) OVER (ORDER BY [YearlyIncome]) AS Out2
FROM [emp]
You can see that the LAG function replaces the NULL value in the Out2 column with 0. Please refer to the “LEAD” and “SELF JOIN” articles.
Handling NULL Values
The SQL Server LAG function later introduced two keywords, IGNORE NULLS and RESPECT NULLS, to handle the null values in the given column.
The RESPECT NULLS is the default one, so it is optional to use this keyword. With or without this keyword, the LAG function considers NULL as the undefined value and returns NULL as the output. To demonstrate the same, edit the above table and add one or two NULLs in the sales column and try the below codes.
LAG([Sales]) IGNORE NULLS OVER (ORDER BY [Year]) AS SaleCompare
Whereas the IGNORE NULLS ignores the previous row NULL and returns the most recent NOT NULL value.
LAG([Sales]) RESPECT NULLS OVER (ORDER BY [Year]) AS SaleCompare
SQL LAG Function with PARTITION BY Clause
How to write the previous values from the partitioned records is one of the common questions you may face in interviews. The PARTITION BY clause divides the table into multiple partitions based on a specified column. It basically groups the similar items.
The following query partitions the Employee table data by Occupation, which means each unique occupation type is a separate partition. The ORDER BY [Year] line sorts each partition using their employee’s joining year. Then, the LAG function returns previous Sales values in each partition (employee occupation). The server will reset the LAG value when it goes to the subsequent Occupation.
SELECT [Name],[Education],[Occupation]
,[YearlyIncome],Year,[Sales]
,LAG([Sales], 1, 0) OVER (
PARTITION BY [Occupation] ORDER BY [Year]
) AS Out1
FROM [emp]
If you see, for the 1st, 5th, 9th, and 13th records, the LAG function is assigned 0 because there are no previous records within that partition. If you remove the default value, those 0s will be replaced by NULLs.
NOTE: You can also use multiple columns in the PARTITION BY clause and the ORDER BY clause. For instance, LAG([Sales], 1, 0) OVER ( PARTITION BY Education, Occupation ORDER BY [Year], INCOME ) AS Out1. It helps to compare a large set of data by further dividing the partitions.
SQL LAG Function with Offset Value
In the above statement, we used the LAG function with a PARTITION BY clause, and it selects the previous rows within each partition. It is because, by default, the LAG function returns the value from the previous row. However, you can also specify the offset value.
Let us see what happens when we change the LAG Function offset value from 1 to 2 and the default value to 100.
SELECT [Name],[Education],[Occupation]
,[YearlyIncome],Year,[Sales]
,LAG([Sales], 2, 100) OVER (
PARTITION BY [Occupation] ORDER BY [Year]
) AS Out2
FROM [emp]
As you can see, we used an offset of 2, and this LAG function selects the 2nd row (2 rows ahead) rather than the previous row. Modifying the offset value to 3 will look three rows back and select the last third row (-3rd row) by jumping the last two rows, etc.
How to Find the Sales Difference?
Instead of simply displaying the current sales against the previous ones, you can use the SQL Server LAG function to find the sales difference between two periods (here, the current year and last year). To achieve the same, you must subtract the LAG(Sales) output from the actual sales. So, please add the code below to the first example to see the result.
Sales - LAG([Sales]) OVER (
PARTITION BY [Occupation] ORDER BY [Year]) AS sales_difference
How to Calculate Year-on-Year Growth?
We can also achieve the same result using the subquery. Here, the subquery will use the LAG function to find the previous sales grouped by occupation. The main query finds the difference.
SELECT e.*, (e.Sales - PreSale) AS Growth
FROM (
SELECT [Name],[Education],[Occupation]
,[YearlyIncome],Year,[Sales],
LAG([Sales]) OVER (
PARTITION BY [Occupation] ORDER BY [Year]) AS PreSale FROM emp
) e
ORDER BY Occupation, Year
To get the percentage of difference, add the code below before the FROM clause. I mean, as a second line.
CONCAT(ROUND(( e.Sales - PreSale) * 100 /PreSale,0),'%') 'Sales%'
Using the SQL LAG function with the CASE statement
Apart from the above subquery and the below-mentioned CTE, you can use the CASE statement along with the LAG function. You can directly use the LAG function within the CASE WHEN, but for the sake of readability and reusability, we inherited the subquery concept from the example above. Here, when the current sale is greater than the previous sale, increase. Otherwise, decrease.
SELECT e.*, (e.Sales - PreSale) AS Growth,
CASE
WHEN e.Sales > PreSale THEN 'Sales Increase'
WHEN e.Sales < PreSale THEN 'Sales Drop'
ELSE 'No Change'
END AS SalesChange
FROM (
SELECT [Name],[Education],[Occupation]
,[YearlyIncome],Year,[Sales],
LAG([Sales]) OVER (
PARTITION BY [Occupation] ORDER BY [Year]) AS PreSale FROM emp
) e
ORDER BY Occupation, Year
Combining the LAG Function with Other Window Functions
The SQL Server allows you to combine the LAG function with the remaining window functions and aggregate functions to perform complex calculations. To demonstrate the same, we use the Common Table Expression (CTE).
WITH SaleData AS
(
SELECT Year,
SUM(Sales) AS Sales,
LAG(SUM(Sales)) OVER(ORDER BY Year) AS Previous_Sales
FROM emp
GROUP BY Year
)
SELECT Year, Sales, Previous_Sales, Sales - Previous_Sales AS SalesDiff,
CONCAT(ROUND(100 * (Sales - Previous_Sales) / Previous_Sales, 0), '%') AS Sales_Percent
FROM SaleData
ORDER BY Year
Comments are closed.