How to find gaps in identity columns at the speed of light

Posted: 2011.07.08 in 2008, Scripts, SQL Server 2008

I was just recently working with a client that was moving a few 100k rows of data on a daily basis from one system to another and all this data was incremental in the sense that the source data was a ticket system and there was an incrementing ID on all the sales records. Now the problem was that for various reasons not all data had been transferred in the past, a few sales records had been deleted because of void transactions (these were not so many), others were simply missing (this was a substantial bunch). All these missing data  were amounting to quite a bit of money in the sales reports so I had to find a way to identify and transfer them on a regular basis so that this problem was kept at a bare minimum.

The tables that needed to be synced had in excess of 6 mill rows and even though it’s not huge it’s more than enough to cause problems if you write some bad code. SO! With some excellent help from all my geek friends over at http://www.sqlteam.com/forums  I was able to come down with a script that did the trick…and mind you; it took less than 30 seconds to execute on the production box identifying over 500 gaps! Now since we are only talking 100k rows here the query will probably have sub-second execution time but it scales very well (the concept behind the query originally created by MVP Steve Kass, http://www.stevekass.com/)

-- Make sure insert runs a little faster
SET NOCOUNT ON
GO

-- Prepare a sample table
CREATE TABLE GapsTest (
	ID int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
	Name varchar(50) NULL
	)
GO

-- Insert 100k rows of dummy data
INSERT INTO GapsTest (Name) VALUES ('TheFirstSQL')
GO 100000

-- Delete some rows to create gaps in the identity column
DELETE GapsTest WHERE ID IN (263, 264, 37392, 44, 98712, 66219)
DELETE GapsTest WHERE ID BETWEEN 35000 AND 36000
GO

-- Run the "magic" query
;WITH 
cte AS (
	SELECT 
		ID, 
		RowNum = ROW_NUMBER() OVER (ORDER BY ID) 
	FROM dbo.GapsTest),
cte2 AS (
	SELECT *, DENSE_RANK() OVER (ORDER BY ID - RowNum) As Series 
	FROM cte),
cte3 AS (
	SELECT *, COUNT(*) OVER (PARTITION BY Series) AS SCount 
	FROM cte2),
cte4 AS (
	SELECT 
		MinID = MIN(ID), 
		MaxID = MAX(ID), 
		Series
	FROM cte3
	GROUP BY Series)
SELECT GapStart = a.MaxID, GapEnd = b.MinID
FROM cte4 a
	INNER JOIN cte4 b
		ON a.Series+1 = b.Series
ORDER BY GapStart
About these ads
Comments
  1. Thanks for this interesting post. It was informative. Please check the alternate methods I have blogged relating to this @ http://vadivel.blogspot.com/2011/10/how-to-find-missing-gaps-sequence.html

  2. SwePeso says:

    ;WITH cteSource(ID, Lead)
    AS (
    SELECT ID,
    LEAD(ID) OVER (ORDER BY ID) AS Lead
    FROM dbo.GapsTest
    )
    SELECT ID,
    Lead
    FROM cteSource
    WHERE ID + 1 < Lead

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s