Delete Duplicate Rows in SQL Server

How to write a query to Delete Duplicate Rows in SQL Server is one of the common Interview Question that you might face in the interviews.

For this SQL Server delete duplicate records example, We are going to use the below-shown data (few columns from Fact Internet Sales in Adventure Works DW)

USE [AdventureWorksDW2014]
GO
SELECT [ProductKey]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
FROM [FactInternetSales]
ORDER BY [ProductKey]

In order to keep this SQL example as simple as possible, we created a new table, and then inserted the above data into new table

USE [SQL Tutorial]
GO

CREATE TABLE [dbo].[DupFactInternetSales](
	[FactID] [int] IDENTITY(1,1) NOT NULL,
	[ProductKey] [int] NOT NULL,
	[OrderQuantity] [smallint] NOT NULL,
	[UnitPrice] [money] NOT NULL,
	[ExtendedAmount] [money] NOT NULL,
	[DiscountAmount] [float] NOT NULL,
	[ProductStandardCost] [money] NOT NULL,
	[TotalProductCost] [money] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[TaxAmt] [money] NOT NULL
) ON [PRIMARY]

GO

The below screenshot will show you the data that we inserted into the DupFactInternetSales table present in the SQL Tutorial database.

Delete Duplicate Rows in SQL Server 1

Let me show you the unique records present in the SQL Server table. As you can see, we used the SELECT DISTINCT keyword

Delete Duplicate Rows in SQL Server 2

Delete Duplicate Rows in SQL Server Example 1

In this example, we show you how to delete duplicate rows in SQL Server using the ROW_NUMBER function and the Common Table Expression.

-- Query to Remove Duplicate Rows in SQL Server
USE [SQL Tutorial]
GO
WITH RemoveDuplicate
AS (
     SELECT ROW_NUMBER() 
      OVER (
	    PARTITION BY [ProductKey]
			,[OrderQuantity]
			,[UnitPrice]
			,[ExtendedAmount]
			,[DiscountAmount]
			,[ProductStandardCost]
			,[TotalProductCost]
			,[SalesAmount]
			,[TaxAmt]
	   ORDER BY [ProductKey]
          ) UniqueRowNumber
    FROM [DupFactInternetSales])

DELETE FROM RemoveDuplicate
WHERE  UniqueRowNumber > 1;

Within the CTE, we are using the Rank function called ROW_NUMBER. It will assign a unique rank number from 1 to n. Next, we are deleting all the records whose rank number is greater than 1.

OUTPUT: Let me show you the output of the statement

Messages
--------
(60240 row(s) affected)

Let us see the data present in the DupFactInternetsales, after the Delete Operation

Delete Duplicate Rows in SQL Server 4

Remove Duplicate Rows in SQL Server Example 2

In this Frequently asked Question, we show how to remove duplicate rows in SQL Server using the SELF JOIN.

-- Query to Remove Duplicate Rows in SQL Server
USE [SQL Tutorial]
GO
DELETE InternetSales
FROM   [DupFactInternetSales] InternetSales,
       [DupFactInternetSales] FactInternetSales
WHERE   InternetSales.[ProductKey] = FactInternetSales.[ProductKey] AND
	InternetSales.[OrderQuantity] = FactInternetSales.[OrderQuantity] AND
	InternetSales.[UnitPrice] = FactInternetSales.[UnitPrice] AND
	InternetSales.[ExtendedAmount] = FactInternetSales.[ExtendedAmount] AND 
	InternetSales.[DiscountAmount] = FactInternetSales.[DiscountAmount] AND 
	InternetSales.[ProductStandardCost]= FactInternetSales.[ProductStandardCost] AND 
	InternetSales.[SalesAmount] = FactInternetSales.[SalesAmount] AND 
	InternetSales.[TaxAmt] = FactInternetSales.[TaxAmt] AND 
	InternetSales.FactID > FactInternetSales.FactID

Let me show you the output of the statement

Messages
--------
(60240 row(s) affected)

OUTPUT

Delete Duplicate Rows in SQL Server Example 6

Remove Duplicate Rows in SQL Server Example 3

This example shows how to delete Duplicate rows in SQL Server using the LEFT JOIN, MIN Function, GROUP BY, and HAVING.

-- Query Remove Duplicate Rows in SQL Server
USE [SQL Tutorial]
GO
DELETE DupFactInternetSales
 FROM DupFactInternetSales
      LEFT JOIN (
               SELECT MIN([FactID]) AS [FactID] 
                    ,[ProductKey]
		    ,[OrderQuantity]
                    ,[UnitPrice]
                    ,[ExtendedAmount]
                    ,[DiscountAmount]
                    ,[ProductStandardCost]
                    ,[TotalProductCost]
                    ,[SalesAmount]
                    ,[TaxAmt]
               FROM [DupFactInternetSales]
               GROUP BY [ProductKey],
		        [OrderQuantity]
                       ,[UnitPrice]
                       ,[ExtendedAmount]
                       ,[DiscountAmount]
                       ,[ProductStandardCost]
                       ,[TotalProductCost]
                       ,[SalesAmount]
                       ,[TaxAmt]
        ) AS InternetSales ON
        [DupFactInternetSales].[FactID] = InternetSales.[FactID]
        WHERE InternetSales.[FactID] IS NULL

Let me show you the output of the statement

Messages
--------
(60240 row(s) affected)