[OT] Finding duplicate rows in Sql Server

Apologies if this is basic for probably most of you but I just can't get my head around it. I have a flat table in sql server which contains lots of duplicates, differing only by one column. Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 5,def,def def,2022-11-17,a 4,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a 6,xyz,def def,2022-11-17,a I'm trying to write a query that finds all duplicates *excluding the ones with the highest Id*. So for the above example it would return the following: Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a There are many millions of rows to process so looking for something efficient. Any advice would be appreciated. Regards Tom

Hi Tom, Possible example: DROP TABLE IF EXISTS #TestData; GO CREATE TABLE #TestData ( ID int NOT NULL, [Name] varchar(20) NOT NULL, [Description] varchar(30) NOT NULL, SomeDate date NOT NULL, Etc varchar(20) NOT NULL ); GO INSERT #TestData ( ID, [Name], [Description], SomeDate, Etc ) SELECT ID, [Name], [Description], SomeDate, Etc FROM (VALUES (1,'abc','abc abc','20221117','a'), (2,'abc','abc abc','20221117','a'), (5,'def','def def','20221117','a'), (4,'abc','abc abc','20221117','a'), (3,'def','def def','20221117','a'), (6,'xyz','def def','20221117','a') ) AS v(ID, [Name], [Description], SomeDate, Etc); GO SELECT * FROM #TestData; WITH OrderedRows AS ( SELECT ID, [Name], [Description], SomeDate, Etc, ROW_NUMBER() OVER(PARTITION BY [Name], [Description], SomeDate, Etc ORDER BY ID DESC) AS CopyNumber FROM #TestData ) SELECT ID, [Name], [Description], SomeDate, Etc FROM OrderedRows WHERE CopyNumber > 1 ORDER BY ID; DROP TABLE IF EXISTS #TestData; Regards, Greg Dr Greg Low 1300SQLSQL (1300 775 775) office | +61 419201410 mobile SQL Down Under | Web: https://sqldownunder.com<https://sqldownunder.com/> | About Greg: https://about.me/greg.low From: Tom P via ozdotnet <ozdotnet@ozdotnet.com> Sent: Thursday, 17 November 2022 5:01 PM To: ozDotNet <ozdotnet@ozdotnet.com> Cc: Tom P <tompbiz93@gmail.com> Subject: [OT] Finding duplicate rows in Sql Server Apologies if this is basic for probably most of you but I just can't get my head around it. I have a flat table in sql server which contains lots of duplicates, differing only by one column. Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 5,def,def def,2022-11-17,a 4,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a 6,xyz,def def,2022-11-17,a I'm trying to write a query that finds all duplicates excluding the ones with the highest Id. So for the above example it would return the following: Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a There are many millions of rows to process so looking for something efficient. Any advice would be appreciated. Regards Tom

If the ID is unique across all records, SELECT * FROM <tablename> WHERE ID NOT IN (SELECT MAX(ID) FROM <tablename>GROUP BY NAme,Desc,Date,Etc) On Thu, 17 Nov 2022 at 16:02, Tom P via ozdotnet <ozdotnet@ozdotnet.com> wrote:
Apologies if this is basic for probably most of you but I just can't get my head around it.
I have a flat table in sql server which contains lots of duplicates, differing only by one column.
Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 5,def,def def,2022-11-17,a 4,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a 6,xyz,def def,2022-11-17,a
I'm trying to write a query that finds all duplicates *excluding the ones with the highest Id*. So for the above example it would return the following:
Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a
There are many millions of rows to process so looking for something efficient. Any advice would be appreciated.
Regards Tom
-- ozdotnet mailing list To manage your subscription, access archives: https://codify.mailman3.com/
-- Alan Ingleby

Yikes looks so simple once the answer is here haha. I’ll check in the morning if the Id is unique across the board. Thanks I appreciate the help. Regards Tom On Thu, 17 Nov 2022 at 17:27, Alan Ingleby via ozdotnet < ozdotnet@ozdotnet.com> wrote:
If the ID is unique across all records,
SELECT * FROM <tablename> WHERE ID NOT IN (SELECT MAX(ID) FROM <tablename>GROUP BY NAme,Desc,Date,Etc)
On Thu, 17 Nov 2022 at 16:02, Tom P via ozdotnet <ozdotnet@ozdotnet.com> wrote:
Apologies if this is basic for probably most of you but I just can't get my head around it.
I have a flat table in sql server which contains lots of duplicates, differing only by one column.
Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 5,def,def def,2022-11-17,a 4,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a 6,xyz,def def,2022-11-17,a
I'm trying to write a query that finds all duplicates *excluding the ones with the highest Id*. So for the above example it would return the following:
Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a
There are many millions of rows to process so looking for something efficient. Any advice would be appreciated.
Regards Tom
-- ozdotnet mailing list To manage your subscription, access archives: https://codify.mailman3.com/
-- Alan Ingleby -- ozdotnet mailing list To manage your subscription, access archives: https://codify.mailman3.com/
-- Thanks Tom

Also be aware that there's a gotcha when using WHERE abc NOT IN (SELECT xyz FROM ...) that has bitten me a few times in the past... 😁 Perhaps not an issue in this scenario because it's an ID and likely NOT NULL, but if the select subquery ever returns a NULL value for xyz the condition evaluates to false and may give you unexpected results. WHERE abc IN (SELECT xyz FROM ...) doesn't have the same problem, it effectively ignores the NULLs. cheers, Tony On 17/11/2022 16:24, Alan Ingleby via ozdotnet wrote:
If the ID is unique across all records,
SELECT * FROM <tablename> WHERE ID NOT IN (SELECT MAX(ID) FROM <tablename>GROUP BY NAme,Desc,Date,Etc)
On Thu, 17 Nov 2022 at 16:02, Tom P via ozdotnet <ozdotnet@ozdotnet.com> wrote:
Apologies if this is basic for probably most of you but I just can't get my head around it.
I have a flat table in sql server which contains lots of duplicates, differing only by one column.
Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 5,def,def def,2022-11-17,a 4,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a 6,xyz,def def,2022-11-17,a
I'm trying to write a query that finds all duplicates _excluding the ones with the highest Id_. So for the above example it would return the following:
Id,Name,Desc,Date,Etc 1,abc,abc abc,2022-11-17,a 2,abc,abc abc,2022-11-17,a 3,def,def def,2022-11-17,a
There are many millions of rows to process so looking for something efficient. Any advice would be appreciated.
Regards Tom
-- ozdotnet mailing list To manage your subscription, access archives: https://codify.mailman3.com/
-- Alan Ingleby
participants (4)
-
Alan Ingleby
-
Dr Greg Low
-
Tom P
-
Tony McGee