Friday, March 30, 2012

Optimizing Store Procedure and Avoiding Cursors....

I'm trying to optimize an ETL batch procedure which takes an extremely long
time to run. I was using cursors for most part of the batch script on
different Store Procedures. Now I'm trying to remove the cursors and replace
them with alternative methods such as using temp tables.
I'm really puzzled as to why the cursor procedure runs much faster than
using temp tables... can anyone point out what is wrong to me?
Below is a snippet of 2 test SPs I've compared against.
TestTable holds 200000 records with 3 columns
1. Using Cursors (This procedure took 4 - 8 secs to run)
declare @.test as numeric
declare @.header as nvarchar(50)
declare @.body as nvarchar(50)
declare @.footer as nvarchar(50)
declare test_cursor cursor for select header, body, footer from testTable
order by header
open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
set @.test = 0
while @.@.fetch_status = 0
begin
set @.test = @.test + 1
fetch next from test_cursor into @.header, @.body, @.footer
end
select @.test
close test_cursor
deallocate test_cursor
2. Using Temp Tables (This procedure took > 1 min)
declare @.test as numeric
declare @.header as nvarchar(50)
declare @.body as nvarchar(50)
declare @.footer as nvarchar(50)
declare @.count int, @.loopcount int
select identity(int,1,1) id, header, body, footer into #temp from testtable
select @.count = @.@.rowcount
set @.loopcount = 1
while @.loopcount <= @.count
begin
select @.header = header, @.body = body, @.footer = footer from
#temp where id = @.loopcount
set @.loopcount = @.loopcount + 1
end
drop table #tempLooking at the code I think it maybe the the where clause in the tempdb,
rember that it has to search through all 200000 records whereas with the
cursor your going through them from start to finish, no searching.
Peter
"Happiness is nothing more than good health and a bad memory."
Albert Schweitzer
"Nestor" wrote:

> I'm trying to optimize an ETL batch procedure which takes an extremely lon
g
> time to run. I was using cursors for most part of the batch script on
> different Store Procedures. Now I'm trying to remove the cursors and repla
ce
> them with alternative methods such as using temp tables.
> I'm really puzzled as to why the cursor procedure runs much faster than
> using temp tables... can anyone point out what is wrong to me?
> Below is a snippet of 2 test SPs I've compared against.
> TestTable holds 200000 records with 3 columns
> 1. Using Cursors (This procedure took 4 - 8 secs to run)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare test_cursor cursor for select header, body, footer from testTable
> order by header
> open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
> set @.test = 0
> while @.@.fetch_status = 0
> begin
> set @.test = @.test + 1
> fetch next from test_cursor into @.header, @.body, @.footer
> end
> select @.test
> close test_cursor
> deallocate test_cursor
>
> 2. Using Temp Tables (This procedure took > 1 min)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare @.count int, @.loopcount int
> select identity(int,1,1) id, header, body, footer into #temp from testtab
le
> select @.count = @.@.rowcount
> set @.loopcount = 1
> while @.loopcount <= @.count
> begin
> select @.header = header, @.body = body, @.footer = footer from
> #temp where id = @.loopcount
> set @.loopcount = @.loopcount + 1
> end
> drop table #temp
>
>
>|||Both 1 and 2 are cursors! Looping through a temp table is just a cursor
in disguise. Unfortunately your sample code doesn't do anything useful
so it's impossible to suggest an alternative set-based solution without
using cursors. Your goal should be to avoid processing one row at a
time by ANY method.
If you need more help please post DDL, sample data INSERTs and show
your required end result.
http://www.aspfaq.com/etiquette.asp?id=5006
David Portas
SQL Server MVP
--|||Hello Nestor,
Going through your code, I dont see any reason why the #temp based approach
should be any better
than the Cursor based approach (if it is supposed to be).
The #temp based approach, Loops anyway with a while Loop which is simillar
to Cursor based method.
Normally, when we say we have to get rid of the cursor, we mean to replace
the Cursor based approach
with a SET BASED APPROACH and your #temp based approach still does record by
record. And as Peter says
for each iteration a select is executed against the table.
Hope this helps.
Gopi
"Nestor" <n3570r@.yahoo.com> wrote in message
news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
> I'm trying to optimize an ETL batch procedure which takes an extremely
> long time to run. I was using cursors for most part of the batch script on
> different Store Procedures. Now I'm trying to remove the cursors and
> replace them with alternative methods such as using temp tables.
> I'm really puzzled as to why the cursor procedure runs much faster than
> using temp tables... can anyone point out what is wrong to me?
> Below is a snippet of 2 test SPs I've compared against.
> TestTable holds 200000 records with 3 columns
> 1. Using Cursors (This procedure took 4 - 8 secs to run)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare test_cursor cursor for select header, body, footer from testTable
> order by header
> open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
> set @.test = 0
> while @.@.fetch_status = 0
> begin
> set @.test = @.test + 1
> fetch next from test_cursor into @.header, @.body, @.footer
> end
> select @.test
> close test_cursor
> deallocate test_cursor
>
> 2. Using Temp Tables (This procedure took > 1 min)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare @.count int, @.loopcount int
> select identity(int,1,1) id, header, body, footer into #temp from
> testtable
> select @.count = @.@.rowcount
> set @.loopcount = 1
> while @.loopcount <= @.count
> begin
> select @.header = header, @.body = body, @.footer = footer from
> #temp where id = @.loopcount
> set @.loopcount = @.loopcount + 1
> end
> drop table #temp
>
>|||Although I haven't tested it, I'd expect the pseudo-cursor would perform
much better if you add a unique clustered index on the #temp id column.
However, as David pointed out, changing a cursor to another iterative
technique isn't the ideal solution and might not be any faster than the
cursor.
We can't make any specific recommendations because the example you posted
does no productive work. If your actual process does something like execute
another proc for each row returned, the set-based solution would be to use
in-line set-based processing instead. This technique is often orders of
magnitude faster than an iterative approach.
Hope this helps.
Dan Guzman
SQL Server MVP
"Nestor" <n3570r@.yahoo.com> wrote in message
news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
> I'm trying to optimize an ETL batch procedure which takes an extremely
> long time to run. I was using cursors for most part of the batch script on
> different Store Procedures. Now I'm trying to remove the cursors and
> replace them with alternative methods such as using temp tables.
> I'm really puzzled as to why the cursor procedure runs much faster than
> using temp tables... can anyone point out what is wrong to me?
> Below is a snippet of 2 test SPs I've compared against.
> TestTable holds 200000 records with 3 columns
> 1. Using Cursors (This procedure took 4 - 8 secs to run)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare test_cursor cursor for select header, body, footer from testTable
> order by header
> open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
> set @.test = 0
> while @.@.fetch_status = 0
> begin
> set @.test = @.test + 1
> fetch next from test_cursor into @.header, @.body, @.footer
> end
> select @.test
> close test_cursor
> deallocate test_cursor
>
> 2. Using Temp Tables (This procedure took > 1 min)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare @.count int, @.loopcount int
> select identity(int,1,1) id, header, body, footer into #temp from
> testtable
> select @.count = @.@.rowcount
> set @.loopcount = 1
> while @.loopcount <= @.count
> begin
> select @.header = header, @.body = body, @.footer = footer from
> #temp where id = @.loopcount
> set @.loopcount = @.loopcount + 1
> end
> drop table #temp
>
>|||Dan,
Could you please share with me what "in-line set-based " means.
I was looking at a Stored Proc which has a nested cursors (One cursor inside
another cursor)
and the While loop of the second a SP is called with the two parameters
(each of them coming
from each of the Cursor code.
Here is the PSeudo-Code :
Declare Cursor Cursor1 Select * from table1
Open Cursor1
Fetch into @.var1
While @.@.FETCH_STATUS <> 1
Declare Cursor Cursor2 Select * from table2
Open Cursor2
Fetch into @.var2
While @.@.FETCH_STATUS <> 1
Exec Spname @.var1, @.var2
........
.....
DEALLOCATE Cursor1
DEALLOCATE Cursor2
I have noticed that the two cursors can be combined into one using
appropriate Joins.
However, I was under the impression I will still have to use a Cursor for
executing the SP
Could you please point me in the right direction if there is a way out.
Thanks,
Gopi
<< If your actual process does something like execute another proc for each
row returned, the set-based
solution would be to use in-line set-based processing instead>>
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:OXOup8KJFHA.4092@.tk2msftngp13.phx.gbl...
> Although I haven't tested it, I'd expect the pseudo-cursor would perform
> much better if you add a unique clustered index on the #temp id column.
> However, as David pointed out, changing a cursor to another iterative
> technique isn't the ideal solution and might not be any faster than the
> cursor.
> We can't make any specific recommendations because the example you posted
> does no productive work. If your actual process does something like
> execute another proc for each row returned, the set-based solution would
> be to use in-line set-based processing instead. This technique is often
> orders of magnitude faster than an iterative approach.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Nestor" <n3570r@.yahoo.com> wrote in message
> news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
>|||> However, I was under the impression I will still have to use a Cursor
for
> executing the SP
Yes, using an SP in this way forces you to use the iterative row-by-row
approach. The set-based solution is to perform whatever logic is
contained in SPname using a query against your tables so that the
entire operation is performed against the data set rather than one row
at a time. For data manipulation operations that usually has advantages
over the cursor in terms of performance, scalability and
maintainability.
David Portas
SQL Server MVP
--|||> Here is the PSeudo-Code :
Your pseudo-code effectively calls the proc for each row returned by the
Cartesian product of table1 and table2. Assuming that the executed proc
simply inserts into a third table, a working code example:
CREATE TABLE table1
(
Col1 int NOT NULL PRIMARY KEY
)
INSERT INTO table1 VALUES(1)
INSERT INTO table1 VALUES(2)
INSERT INTO table1 VALUES(3)
CREATE TABLE table2
(
Col1 int NOT NULL PRIMARY KEY
)
INSERT INTO table2 VALUES(4)
INSERT INTO table2 VALUES(5)
INSERT INTO table2 VALUES(6)
CREATE TABLE table3
(
Col1 int NOT NULL,
Col2 int NOT NULL,
PRIMARY KEY (Col1, Col2)
)
GO
CREATE PROC Spname
@.var1 int,
@.var2 int
AS
SET NOCOUNT ON
INSERT INTO Table3(Col1, Col2)
VALUES(@.var1, @.var2)
GO
DECLARE @.var1 int,
@.var2 int
DECLARE Cursor1 CURSOR
LOCAL FAST_FORWARD READ_ONLY FOR
SELECT * FROM table1
OPEN Cursor1
FETCH NEXT FROM Cursor1 INTO @.var1
WHILE @.@.FETCH_STATUS <> -1
BEGIN
DECLARE Cursor2 CURSOR LOCAL FOR SELECT * FROM table2
OPEN Cursor2
FETCH NEXT FROM Cursor2 INTO @.var2
WHILE @.@.FETCH_STATUS <> -1
BEGIN
EXEC Spname @.var1, @.var2
FETCH NEXT FROM Cursor2 INTO @.var2
END
CLOSE Cursor2
DEALLOCATE Cursor2
FETCH NEXT FROM Cursor1 INTO @.var1
END
CLOSE Cursor1
DEALLOCATE Cursor1
GO

> I have noticed that the two cursors can be combined into one using
> appropriate Joins.
Exactly. That's the first step.

> However, I was under the impression I will still have to use a Cursor for
> executing the SP
Yes you will. This is why I suggested including the equivalent proc code
inline *instead of* executing the proc for each row. The entire cursor
script and proc code in my functional example could be replaced with a
single insert statement:
INSERT INTO Table3(Col1, Col2)
SELECT t1.Col1, t2.Col1
FROM table1 t1
CROSS JOIN table2 t2
Of course, a lot depends on exactly what the called proc does. Also, your
actual code probably isn't a CROSS JOIN. A simple INNER JOIN example:
INSERT INTO Table3(Col1, Col2)
SELECT t1.Col1, t2.Col1
FROM table1 t1
JOIN table2 t2 ON
t1.Col1 = t2.SomeColumn
If you need more help, you need to provide details on what the proc actually
does. Actual code will help.
Hope this helps.
Dan Guzman
SQL Server MVP
"rgn" <gopinathr@.healthasyst.com> wrote in message
news:%23Sx92CLJFHA.1948@.TK2MSFTNGP14.phx.gbl...
> Dan,
> Could you please share with me what "in-line set-based " means.
> I was looking at a Stored Proc which has a nested cursors (One cursor
> inside another cursor)
> and the While loop of the second a SP is called with the two parameters
> (each of them coming
> from each of the Cursor code.
> Here is the PSeudo-Code :
> Declare Cursor Cursor1 Select * from table1
> Open Cursor1
> Fetch into @.var1
> While @.@.FETCH_STATUS <> 1
>
> Declare Cursor Cursor2 Select * from table2
> Open Cursor2
> Fetch into @.var2
> While @.@.FETCH_STATUS <> 1
> Exec Spname @.var1, @.var2
> ........
> .....
> DEALLOCATE Cursor1
> DEALLOCATE Cursor2
> I have noticed that the two cursors can be combined into one using
> appropriate Joins.
> However, I was under the impression I will still have to use a Cursor for
> executing the SP
> Could you please point me in the right direction if there is a way out.
> Thanks,
> Gopi
> << If your actual process does something like execute another proc for
> each row returned, the set-based
> solution would be to use in-line set-based processing instead>>
>
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:OXOup8KJFHA.4092@.tk2msftngp13.phx.gbl...
>|||Thanks for pointing that out. I've got the code base from an article in
http://www.extremeexperts.com/SQL/A...TSQLResult.aspx
Am I doing it wrong with instructions from that article? In any case, what
exactly is SET BASED APPROACH? I do realise and know that SQL Server is
meant to be relational and not sequential, but I need to understand more on
how to apply the alternative first before trying to overhaul the scripts
running in the production servers.
Can anyone give me an equivalent of the cursor procedure I posted using SET
BASE APPROACH so that I can better relate?
Thanks.
Regards,
Nestor
"rgn" <gopinathr@.healthasyst.com> wrote in message
news:e$w%23rZJJFHA.2604@.TK2MSFTNGP15.phx.gbl...
> Hello Nestor,
> Going through your code, I dont see any reason why the #temp based
> approach should be any better
> than the Cursor based approach (if it is supposed to be).
> The #temp based approach, Loops anyway with a while Loop which is simillar
> to Cursor based method.
> Normally, when we say we have to get rid of the cursor, we mean to replace
> the Cursor based approach
> with a SET BASED APPROACH and your #temp based approach still does record
> by record. And as Peter says
> for each iteration a select is executed against the table.
> Hope this helps.
> Gopi
>
> "Nestor" <n3570r@.yahoo.com> wrote in message
> news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
>|||The article demonstrates some techniques for iteration but doesn't give
any recommendations as to when to use them. 99.99% of the time
row-by-row processing isn't necessary so those techniques shouldn't be
required. The Set Based approach means that you use SQL DML statements
(basically SELECT, UPDATE, INSERT, DELETE) that operate on a SET of
rows at a time rather than ONE row at a time. That should always be the
standard, preferred way to write code.

> Can anyone give me an equivalent of the cursor procedure I posted
using SET
> BASE APPROACH so that I can better relate?
But your cursor didn't do anything except assign a few variables. How
can we show you how to replace it if we don't know what it's supposed
to do? What we need to know is what you are actually trying to achieve
then we can suggest the right solution. It may even be that there isn't
a set-based solution to your problem - that's rarely the case but we
can't tell you until we understand the problem.
The following article explains the best way to post your problem here:
http://www.aspfaq.com/etiquette.asp?id=5006
David Portas
SQL Server MVP
--

No comments:

Post a Comment