Monday, March 26, 2012

Optimize this query?

I have a nightly job that aggregates data into a summary table. Each night, an additional item may or may not be added, so the table is cleared and repopulated. I've got this working fine. Now I am being asked to track when an item first appears in the summary table. Here is the code I am using to do this. I realize it performs very poorly, but I don't know how to make it better.

MyTable Columns:
id INT PK
,v1 VARCHAR(4)
,v2 VARCHAR(10)
,v3 DECIMAL
,item1 BIT
,item2 BIT
,item3 BIT

MyTable2 Columns:
v1 VARCHAR(4)
,v2 VARCHAR(10)
,v3 DECIMAL
,item INT
,FirstKnownDate DATETIME

All columns are NOT NULL.

DECLARE @iNextRowID INT
    ,@iCurrentRowID INT
    ,@iLoopControl INT
    ,@v1 VARCHAR(4)
    ,@v2 VARCHAR(10)
    ,@v3 DECIMAL

/* This is a loop that executes for a dynamic number of items, e.g. item1=1, then item2=1, etc. */
SET @iLoopControl = 1
SELECT @iNextRowID = MIN(id) FROM MyTable WHERE item1 = 1

IF ISNULL(@iNextRowID, 0) <> 0
BEGIN
    SELECT @iCurrentRowID = id
        ,@v1 = v1
        ,@v2 = v2
        ,@v3 = v3
    FROM MyTable
    WHERE id = @iNextRowID

    WHILE @iLoopControl = 1
    BEGIN
        IF NOT EXISTS (SELECT * FROM MyTable2 WHERE v1 = @v2 AND v2 = @v2 AND v3 = @v3 AND item = 1)
        BEGIN
            INSERT INTO MyTable2
                /* COLUMN */ (v1, v2, v3, item, FirstKnownDate)
            SELECT @v1, @v2, @v3, 1, GETDATE()
        END

        SELECT @iNextRowID = NULL
        SELECT @iNextRowID = MIN(id) FROM MyTable WHERE id > @iCurrentRowID AND item1 = 1

        IF ISNULL(@iNextRowID, 0) = 0
        BEGIN
            BREAK
        END

        SELECT @iCurrentRowID = id
            ,@v1 = v1
            ,@v2 = v2
            ,@v3 = v3
        FROM MyTable
        WHERE id = @iNextRowID
    END
END
/* This is the end of the items loop */
This query takes 5 minutes to do straight inserts for 3 items without the EXISTS test in the loop. MyTable has just under 96,000 rows in it. MyTable2 has 121,857 rows in it. It is now at about 55 minutes and still running with the EXISTS test so that a row isn't inserted twice. Is there anything I can do to optimize this?
EDIT: The query just finished running again with the EXISTS test on each row. It shouldn't have inserted any new rows, but it took 57 minutes and inserted another 114,115 rows. Not good. Anyone see what is wrong with the query?

Have a look at the execution plan and see what costs are associated with it.
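For instance, you can surface the plan and per-statement costs from a query window with standard session options (a sketch; in SSMS the "Include Actual Execution Plan" toolbar button shows the same thing graphically):

```sql
-- Emit the actual execution plan as text rows after each statement runs,
-- plus logical-read and CPU/elapsed-time figures per statement.
SET STATISTICS PROFILE ON;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the looping batch (or one pass of it) here ...

SET STATISTICS PROFILE OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The IO output in particular will show how many logical reads each EXISTS probe against MyTable2 is costing.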

|||

This should be obvious just by looking at the query. There are so many SELECT statements, and at least 125,000 EXISTS checks. Obviously it won't finish in a few seconds. I just mean, is there a better way to loop through each row and insert or do nothing based on whether that row exists in the second table?

|||

I just meant that it would show you exactly where the bottlenecks exist. As for improving the query, you could probably just join to the second table and perform inserts for all the missing rows.

|||

Code sample? Eliminating the exists is really what is needed.
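|||

Something along these lines (an untested sketch; it assumes (v1, v2, v3, item) identifies a row in MyTable2, and it covers only item1, so you'd repeat it, or generate it dynamically, for each item column):

```sql
-- One set-based INSERT per item column replaces the whole row-by-row loop:
-- insert every qualifying MyTable row that has no match in MyTable2 yet.
INSERT INTO MyTable2 (v1, v2, v3, item, FirstKnownDate)
SELECT t.v1, t.v2, t.v3, 1, GETDATE()
FROM MyTable AS t
WHERE t.item1 = 1
  AND NOT EXISTS (SELECT 1
                  FROM MyTable2 AS t2
                  WHERE t2.v1 = t.v1
                    AND t2.v2 = t.v2
                    AND t2.v3 = t.v3
                    AND t2.item = 1);
```

One statement like this per item column does in a single pass what the loop does 96,000 times, and an index on MyTable2 (v1, v2, v3, item) would make each NOT EXISTS probe a cheap seek.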
