Delete Duplicate Rows in SQL Server

How to write a query to remove or delete Duplicate Rows in SQL Server is one of the common interview questions you might face.

For this SQL Server delete duplicate rows or records example, we are going to use the below-shown data (a few columns from Fact Internet Sales in Adventure Works DW to remove)

USE [AdventureWorksDW2014]
GO
SELECT [ProductKey]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
FROM [FactInternetSales]
ORDER BY [ProductKey]

In order to keep this example as simple as possible, we created a new table and then inserted the above data into the new table.

CREATE TABLE [dbo].[DupFactInternetSales](
	[FactID] [int] IDENTITY(1,1) NOT NULL,
	[ProductKey] [int] NOT NULL,
	[OrderQuantity] [smallint] NOT NULL,
	[UnitPrice] [money] NOT NULL,
	[ExtendedAmount] [money] NOT NULL,
	[DiscountAmount] [float] NOT NULL,
	[ProductStandardCost] [money] NOT NULL,
	[TotalProductCost] [money] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[TaxAmt] [money] NOT NULL
) ON [PRIMARY]

GO

The screenshot below will show you the data that we inserted into the DupFactInternetSales table in the database.

Insert records 1

Let me show you the unique records present in the SQL Server table. As you can see, we used the SELECT DISTINCT keyword.

Select Statement to View Records 2

Delete Duplicate Rows in SQL Server using ROW_NUMER and CTE

In this example, we show you how to remove duplicate rows using the ROW_NUMBER function and the Common Table Expression.

WITH RemoveDuplicate
AS (
     SELECT ROW_NUMBER() 
      OVER (
	    PARTITION BY [ProductKey]
			,[OrderQuantity]
			,[UnitPrice]
			,[ExtendedAmount]
			,[DiscountAmount]
			,[ProductStandardCost]
			,[TotalProductCost]
			,[SalesAmount]
			,[TaxAmt]
	   ORDER BY [ProductKey]
          ) UniqueRowNumber
    FROM [DupFactInternetSales])

DELETE FROM RemoveDuplicate
WHERE  UniqueRowNumber > 1;

Within the CTE, we are using the Rank function called ROW_NUMBER. It will assign a unique rank number from 1 to n. Next, we delete all the records whose rank number is greater than 1.

OUTPUT: Let me show you the output of the statement

Messages
--------
(60240 row(s) affected)

Let us see the data present in the DupFactInternetsales, after the Delete Operation

remove or Delete Duplicate Rows in SQL Server 4

Delete duplicate rows in SQL Server using self Join

In this Frequently Asked Question, we show how to remove duplicate rows using the SELF JOIN.

DELETE InternetSales
FROM   [DupFactInternetSales] InternetSales,
       [DupFactInternetSales] FactInternetSales
WHERE   InternetSales.[ProductKey] = FactInternetSales.[ProductKey] AND
	InternetSales.[OrderQuantity] = FactInternetSales.[OrderQuantity] AND
	InternetSales.[UnitPrice] = FactInternetSales.[UnitPrice] AND
	InternetSales.[ExtendedAmount] = FactInternetSales.[ExtendedAmount] AND 
	InternetSales.[DiscountAmount] = FactInternetSales.[DiscountAmount] AND 
	InternetSales.[ProductStandardCost]= FactInternetSales.[ProductStandardCost] AND 
	InternetSales.[SalesAmount] = FactInternetSales.[SalesAmount] AND 
	InternetSales.[TaxAmt] = FactInternetSales.[TaxAmt] AND 
	InternetSales.FactID > FactInternetSales.FactID

Let me show you the output of the statement

Messages
--------
(60240 row(s) affected)

OUTPUT

Delete Duplicate Rows in SQL Server Example 6

Delete duplicate rows in SQL Server using Group By Having Clause

This example shows how to delete Duplicate rows using the LEFT JOIN, MIN Function, GROUP BY, and HAVING.

DELETE DupFactInternetSales
 FROM DupFactInternetSales
      LEFT JOIN (
               SELECT MIN([FactID]) AS [FactID] 
                    ,[ProductKey]
		    ,[OrderQuantity]
                    ,[UnitPrice]
                    ,[ExtendedAmount]
                    ,[DiscountAmount]
                    ,[ProductStandardCost]
                    ,[TotalProductCost]
                    ,[SalesAmount]
                    ,[TaxAmt]
               FROM [DupFactInternetSales]
               GROUP BY [ProductKey],
		        [OrderQuantity]
                       ,[UnitPrice]
                       ,[ExtendedAmount]
                       ,[DiscountAmount]
                       ,[ProductStandardCost]
                       ,[TotalProductCost]
                       ,[SalesAmount]
                       ,[TaxAmt]
        ) AS InternetSales ON
        [DupFactInternetSales].[FactID] = InternetSales.[FactID]
        WHERE InternetSales.[FactID] IS NULL

Let me show you the output of the statement.

Messages
--------
(60240 row(s) affected)
Categories SQL