I was just recently working with a client that was moving a few 100k rows of data on a daily basis from one system to another and all this data was incremental in the sense that the source data was a ticket system and there was an incrementing ID on all the sales records. Now the problem was that for various reasons not all data had been transferred in the past, a few sales records had been deleted because of void transactions (these were not so many), others were simply missing (this was a substantial bunch). All these missing data were amounting to quite a bit of money in the sales reports so I had to find a way to identify and transfer them on a regular basis so that this problem was kept at a bare minimum.
The tables that needed to be synced had in excess of 6 mill rows and even though it’s not huge it’s more than enough to cause problems if you write some bad code. SO! With some excellent help from all my geek friends over at http://www.sqlteam.com/forums I was able to come down with a script that did the trick…and mind you; it took less than 30 seconds to execute on the production box identifying over 500 gaps! Now since we are only talking 100k rows here the query will probably have sub-second execution time but it scales very well (the concept behind the query originally created by MVP Steve Kass, http://www.stevekass.com/)
-- Make sure insert runs a little faster SET NOCOUNT ON GO -- Prepare a sample table CREATE TABLE GapsTest ( ID int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED, Name varchar(50) NULL ) GO -- Insert 100k rows of dummy data INSERT INTO GapsTest (Name) VALUES ('TheFirstSQL') GO 100000 -- Delete some rows to create gaps in the identity column DELETE GapsTest WHERE ID IN (263, 264, 37392, 44, 98712, 66219) DELETE GapsTest WHERE ID BETWEEN 35000 AND 36000 GO -- Run the "magic" query ;WITH cte AS ( SELECT ID, RowNum = ROW_NUMBER() OVER (ORDER BY ID) FROM dbo.GapsTest), cte2 AS ( SELECT *, DENSE_RANK() OVER (ORDER BY ID - RowNum) As Series FROM cte), cte3 AS ( SELECT *, COUNT(*) OVER (PARTITION BY Series) AS SCount FROM cte2), cte4 AS ( SELECT MinID = MIN(ID), MaxID = MAX(ID), Series FROM cte3 GROUP BY Series) SELECT GapStart = a.MaxID, GapEnd = b.MinID FROM cte4 a INNER JOIN cte4 b ON a.Series+1 = b.Series ORDER BY GapStart