Hi Tom,

 

Possible example:

 

DROP TABLE IF EXISTS #TestData;

GO

 

CREATE TABLE #TestData

(

    ID int NOT NULL,

    [Name] varchar(20) NOT NULL,

    [Description] varchar(30) NOT NULL,

    SomeDate date NOT NULL,

    Etc varchar(20) NOT NULL

);

GO

 

INSERT #TestData

(

    ID, [Name], [Description], SomeDate, Etc

)

SELECT ID, [Name], [Description], SomeDate, Etc

FROM (VALUES

      (1,'abc','abc abc','20221117','a'),

      (2,'abc','abc abc','20221117','a'),

      (5,'def','def def','20221117','a'),

      (4,'abc','abc abc','20221117','a'),

      (3,'def','def def','20221117','a'),

      (6,'xyz','def def','20221117','a')

     ) AS v(ID, [Name], [Description], SomeDate, Etc);

GO

 

SELECT * FROM #TestData;

 

WITH OrderedRows

AS

(

    SELECT ID, [Name], [Description], SomeDate, Etc,

           ROW_NUMBER() OVER(PARTITION BY [Name], [Description], SomeDate, Etc ORDER BY ID DESC) AS CopyNumber

    FROM #TestData

)

SELECT ID, [Name], [Description], SomeDate, Etc

FROM OrderedRows

WHERE CopyNumber > 1

ORDER BY ID;

 

DROP TABLE IF EXISTS #TestData;

 

Regards,

 

Greg

 

Dr Greg Low

 

1300SQLSQL (1300 775 775) office | +61 419201410 mobile

SQL Down Under | Web: https://sqldownunder.com | About Greg:  https://about.me/greg.low

 

From: Tom P via ozdotnet <ozdotnet@ozdotnet.com>
Sent: Thursday, 17 November 2022 5:01 PM
To: ozDotNet <ozdotnet@ozdotnet.com>
Cc: Tom P <tompbiz93@gmail.com>
Subject: [OT] Finding duplicate rows in Sql Server

 

Apologies if this is basic for probably most of you but I just can't get my head around it.

 

I have a flat table in sql server which contains lots of duplicates, differing only by one column.

 

Id,Name,Desc,Date,Etc

1,abc,abc abc,2022-11-17,a

2,abc,abc abc,2022-11-17,a

5,def,def def,2022-11-17,a

4,abc,abc abc,2022-11-17,a

3,def,def def,2022-11-17,a

6,xyz,def def,2022-11-17,a

 

I'm trying to write a query that finds all duplicates excluding the ones with the highest Id. So for the above example it would return the following:

 

Id,Name,Desc,Date,Etc

1,abc,abc abc,2022-11-17,a

2,abc,abc abc,2022-11-17,a

3,def,def def,2022-11-17,a

 

There are many millions of rows to process so looking for something efficient. Any advice would be appreciated.

 

Regards

Tom