How to write a query to Delete Duplicate Rows in SQL Server is one of the common Interview Question that you might face in the interviews.
For this SQL Server delete duplicate records example, We are going to use the below-shown data (few columns from Fact Internet Sales in Adventure Works DW)
USE [AdventureWorksDW2014] GO SELECT [ProductKey] ,[OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] FROM [FactInternetSales] ORDER BY [ProductKey]
In order to keep this SQL example as simple as possible, we created a new table, and then inserted the above data into new table
USE [SQL Tutorial] GO CREATE TABLE [dbo].[DupFactInternetSales]( [FactID] [int] IDENTITY(1,1) NOT NULL, [ProductKey] [int] NOT NULL, [OrderQuantity] [smallint] NOT NULL, [UnitPrice] [money] NOT NULL, [ExtendedAmount] [money] NOT NULL, [DiscountAmount] [float] NOT NULL, [ProductStandardCost] [money] NOT NULL, [TotalProductCost] [money] NOT NULL, [SalesAmount] [money] NOT NULL, [TaxAmt] [money] NOT NULL ) ON [PRIMARY] GO
The below screenshot will show you the data that we inserted into the DupFactInternetSales table present in the SQL Tutorial database.
Let me show you the unique records present in the SQL Server table. As you can see, we used the SELECT DISTINCT keyword
Delete Duplicate Rows in SQL Server Example 1
In this example, we show you how to delete duplicate rows in SQL Server using the ROW_NUMBER function and the Common Table Expression.
-- Query to Remove Duplicate Rows in SQL Server USE [SQL Tutorial] GO WITH RemoveDuplicate AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY [ProductKey] ,[OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] ORDER BY [ProductKey] ) UniqueRowNumber FROM [DupFactInternetSales]) DELETE FROM RemoveDuplicate WHERE UniqueRowNumber > 1;
Within the CTE, we are using the Rank function called ROW_NUMBER. It will assign a unique rank number from 1 to n. Next, we are deleting all the records whose rank number is greater than 1.
OUTPUT: Let me show you the output of the statement
Let us see the data present in the DupFactInternetsales, after the Delete Operation
Remove Duplicate Rows in SQL Server Example 2
In this Frequently asked Question, we show how to remove duplicate rows in SQL Server using the SELF JOIN.
-- Query to Remove Duplicate Rows in SQL Server USE [SQL Tutorial] GO DELETE InternetSales FROM [DupFactInternetSales] InternetSales, [DupFactInternetSales] FactInternetSales WHERE InternetSales.[ProductKey] = FactInternetSales.[ProductKey] AND InternetSales.[OrderQuantity] = FactInternetSales.[OrderQuantity] AND InternetSales.[UnitPrice] = FactInternetSales.[UnitPrice] AND InternetSales.[ExtendedAmount] = FactInternetSales.[ExtendedAmount] AND InternetSales.[DiscountAmount] = FactInternetSales.[DiscountAmount] AND InternetSales.[ProductStandardCost]= FactInternetSales.[ProductStandardCost] AND InternetSales.[SalesAmount] = FactInternetSales.[SalesAmount] AND InternetSales.[TaxAmt] = FactInternetSales.[TaxAmt] AND InternetSales.FactID > FactInternetSales.FactID
Let me show you the output of the statement
OUTPUT
Remove Duplicate Rows in SQL Server Example 3
This example shows how to delete Duplicate rows in SQL Server using the LEFT JOIN, MIN Function, GROUP BY, and HAVING.
-- Query Remove Duplicate Rows in SQL Server USE [SQL Tutorial] GO DELETE DupFactInternetSales FROM DupFactInternetSales LEFT JOIN ( SELECT MIN([FactID]) AS [FactID] ,[ProductKey] ,[OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] FROM [DupFactInternetSales] GROUP BY [ProductKey], [OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] ) AS InternetSales ON [DupFactInternetSales].[FactID] = InternetSales.[FactID] WHERE InternetSales.[FactID] IS NULL
Let me show you the output of the statement