Wednesday, March 28, 2012

Optimizing datediff

Hey all,

I run a monthly delete that's recurring; delete data > 2 months old. For December, I thought up two ways to do this:

Query 1:
delete from daily_statements where daily_statements < '2006-10-01'

Query 2:
delete from daily_statements where datediff(month, statement_date, getdate()) <= 3

I've plugged in Query 2 into the job step because it can be run at any time without having to change the value, but it took 22 minutes to delete 2.5 million rows versus 4 minutes to delete 5.3 million rows for the other one.

Is query 2 the best it can be? Thanks!

If the statement_date column is indexed, the 2nd query search argument is not efficient since the use of the DATEDIFF function on the column will eliminate the use of the index. The index contains the actual date values so the query optimizer needs search arguments that doesn't change the column in any way. You can do something like below instead:

delete from daily_statements

where statement_date < dateadd(month, -2, convert(varchar, dateadd(day, 1-day(current_timestamp), current_timestamp), 112))

The expression using CURRENT_TIMESTAMP gives the start of the current month and then goes past 2 months. You can then use it to generate the value needed for the statement_date column comparison. In general, expression involving the column directly is more efficient than expressions that call functions or user-defined functions on columns.

|||Use a variable to compare the dates:

DECLARE @.enddate datetime

SET @.enddate = DATEADD(month, -2, GETDATE())
SET @.enddate = CAST(CAST(MONTH(@.enddate) AS VARCHAR(2)) + '/01/' + CAST(YEAR(@.enddate) AS VARCHAR(4)) AS DATETIME)

delete from daily_statements where daily_statements < @.enddate

|||Please don't use the local variable approach. It does not allow the optimizer to do parameter sniffing for example. And since the batch is optimized as a whole an estimate will be used for the variable and it will result in sub-optimal plan. Using the CURRENT_TIMESTAMP expression inline provides the best performance since the query optimizer has built-in rules for such expressions. Also, using a string format for datetime/smalldatetime that is dependent on regional settings or language of the session can result in run-time errors.|||It should treat the local var like a "static" like if I say startdate < '10/1/2006'.

You are correct about the date format. I used the US date, because that was easy.

No comments:

Post a Comment