Friday, March 23, 2012

Optimize function that uses cursors

Hello,
I have a question regarding ways to optimize a function that currently uses
cursors to get some data from an SQL Database table.
I'm looking for others' opinions as we have discussed this internally in our
team but have reached to no viable conclusion yet. We are mostly interested
in ways to use SELECT INTO, UNION, or other constructs that we may have
missed instead of the cursors (but to be able to run them within a function:
SELECT INTO statement isn't allowed within a user defined function).
The function can be made recursive as there are no much recursions (we
expect no more than 10 levels of parent groups), but we have implemented it
using a WHILE look instead of recursion also for performance reasons.
Thank you in advance for your time. Any comments/suggestions will be highly
appreciated.
Basically we have a table called GroupItems with schema and data like below
(each item may have none, one, or more parent groups; the groups are also
considered items):
IDItem, IDGroup
--
3, 1
4, 2
5, 3
5, 4
6, 2
We also have defined a recursive user function that retreives all the parent
groups and ancestor groups (i.e. the parents of the parents and so on) for a
given ID (including that ID itself). The function returns a table data type
and uses a cursor to select next level of parents and their ancestors (using
a recursive call in the select of the cursor), and in the cursor look it use
s
insert into statements to add the data to the table.
For the sample data above, and using @.idItem = 5 as the parameter, the
function would return:
ID
--
1
2
3
4
5
As a picture is like a thousand words, the code for the function is like thi
s:
CREATE function dbo.GetParentItems(@.idItem int)
returns @.t table([ID] int primary key)
begin
-- insert current ID
insert into @.t ([ID]) values (@.idItem)
declare @.cnt int
select @.cnt = count(*) from @.t
declare @.more bit
set @.more = 1
-- the @.more variable will tell us when to stop
while @.more = 1
begin
-- get the direct parents of the current
items from the current @.t variable, except those parents which are already
added to the result; we store the new IDs in a temporary @.v variable
declare @.v table ([ID] int primary key)
declare @.id int
delete from @.v
declare c cursor for
select [GroupItems].[IDGroup]
from [GroupItems]
where
[GroupItems].[IDItem] in (select [ID] from @.t) and
[GroupItems].[IDGroup] not in (select [ID] from @.t)
open c
fetch next from c into @.id
while @.@.fetch_status = 0
begin
insert into @.v ([ID]) values (@.id)
fetch next from c into @.id
end
close c
deallocate c
-- now add the new IDs from @.v to the result
@.t
declare cv cursor for
select [ID] from @.v
open cv
fetch next from cv into @.id
while @.@.fetch_status = 0
begin
insert into @.t ([ID]) values (@.id)
fetch next from cv into @.id
end
close cv
deallocate cv
declare @.cntCur int
select @.cntCur = count(*) from @.t
-- if we have added no new IDs, the counts
will be the same and we stop looping
if @.cnt = @.cntCur
set @.more = 0
set @.cnt = @.cntCur
end
return
end
Thank you very much again.
Sorin DolhaSorin
Look at this example written by Itzik Ben-Gan
IF object_id('dbo.Employees') IS NOT NULL
DROP TABLE Employees
GO
IF object_id('dbo.ufn_GetSubtree') IS NOT NULL
DROP FUNCTION dbo.ufn_GetSubtree
GO
CREATE TABLE Employees
(
empid int NOT NULL,
mgrid int NULL,
empname varchar(25) NOT NULL,
salary money NOT NULL,
CONSTRAINT PK_Employees_empid PRIMARY KEY(empid),
CONSTRAINT FK_Employees_mgrid_empid
FOREIGN KEY(mgrid)
REFERENCES Employees(empid)
)
CREATE INDEX idx_nci_mgrid ON Employees(mgrid)
INSERT INTO Employees VALUES(1 , NULL, 'Nancy' , $10000.00)
INSERT INTO Employees VALUES(2 , 1 , 'Andrew' , $5000.00)
INSERT INTO Employees VALUES(3 , 1 , 'Janet' , $5000.00)
INSERT INTO Employees VALUES(4 , 1 , 'Margaret', $5000.00)
INSERT INTO Employees VALUES(5 , 2 , 'Steven' , $2500.00)
INSERT INTO Employees VALUES(6 , 2 , 'Michael' , $2500.00)
INSERT INTO Employees VALUES(7 , 3 , 'Robert' , $2500.00)
INSERT INTO Employees VALUES(8 , 3 , 'Laura' , $2500.00)
INSERT INTO Employees VALUES(9 , 3 , 'Ann' , $2500.00)
INSERT INTO Employees VALUES(10, 4 , 'Ina' , $2500.00)
INSERT INTO Employees VALUES(11, 7 , 'David' , $2000.00)
INSERT INTO Employees VALUES(12, 7 , 'Ron' , $2000.00)
INSERT INTO Employees VALUES(13, 7 , 'Dan' , $2000.00)
INSERT INTO Employees VALUES(14, 11 , 'James' , $1500.00)
GO
CREATE FUNCTION dbo.ufn_GetSubtree
(
@.mgrid AS int
)
RETURNS @.tree table
(
empid int NOT NULL,
mgrid int NULL,
empname varchar(25) NOT NULL,
salary money NOT NULL,
lvl int NOT NULL,
path varchar(900) NOT NULL
)
AS
BEGIN
DECLARE @.lvl AS int, @.path AS varchar(900)
SELECT @.lvl = 0, @.path = '.'
INSERT INTO @.tree
SELECT empid, mgrid, empname, salary,
@.lvl, '.' + CAST(empid AS varchar(10)) + '.'
FROM Employees
WHERE empid = @.mgrid
WHILE @.@.ROWCOUNT > 0
BEGIN
SET @.lvl = @.lvl + 1
INSERT INTO @.tree
SELECT E.empid, E.mgrid, E.empname, E.salary,
@.lvl, T.path + CAST(E.empid AS varchar(10)) + '.'
FROM Employees AS E JOIN @.tree AS T
ON E.mgrid = T.empid AND T.lvl = @.lvl - 1
END
RETURN
END
GO
SELECT empid, mgrid, empname, salary
FROM ufn_GetSubtree(3)
GO
"Sorin Dolha" <SorinDolha@.discussions.microsoft.com> wrote in message
news:A70E5266-647F-4D49-A162-D608C0A5ABB3@.microsoft.com...
> Hello,
> I have a question regarding ways to optimize a function that currently
uses
> cursors to get some data from an SQL Database table.
> I'm looking for others' opinions as we have discussed this internally in
our
> team but have reached to no viable conclusion yet. We are mostly
interested
> in ways to use SELECT INTO, UNION, or other constructs that we may have
> missed instead of the cursors (but to be able to run them within a
function:
> SELECT INTO statement isn't allowed within a user defined function).
> The function can be made recursive as there are no much recursions (we
> expect no more than 10 levels of parent groups), but we have implemented
it
> using a WHILE look instead of recursion also for performance reasons.
> Thank you in advance for your time. Any comments/suggestions will be
highly
> appreciated.
> Basically we have a table called GroupItems with schema and data like
below
> (each item may have none, one, or more parent groups; the groups are also
> considered items):
> IDItem, IDGroup
> --
> 3, 1
> 4, 2
> 5, 3
> 5, 4
> 6, 2
> We also have defined a recursive user function that retreives all the
parent
> groups and ancestor groups (i.e. the parents of the parents and so on) for
a
> given ID (including that ID itself). The function returns a table data
type
> and uses a cursor to select next level of parents and their ancestors
(using
> a recursive call in the select of the cursor), and in the cursor look it
uses
> insert into statements to add the data to the table.
> For the sample data above, and using @.idItem = 5 as the parameter, the
> function would return:
> ID
> --
> 1
> 2
> 3
> 4
> 5
> As a picture is like a thousand words, the code for the function is like
this:
> CREATE function dbo.GetParentItems(@.idItem int)
> returns @.t table([ID] int primary key)
> begin
> -- insert current ID
> insert into @.t ([ID]) values (@.idItem)
> declare @.cnt int
> select @.cnt = count(*) from @.t
> declare @.more bit
> set @.more = 1
> -- the @.more variable will tell us when to stop
> while @.more = 1
> begin
> -- get the direct parents of the current
> items from the current @.t variable, except those parents which are already
> added to the result; we store the new IDs in a temporary @.v variable
> declare @.v table ([ID] int primary key)
> declare @.id int
> delete from @.v
> declare c cursor for
> select [GroupItems].[IDGroup]
> from [GroupItems]
> where
> [GroupItems].[IDItem] in (select [ID] from @.t) and
> [GroupItems].[IDGroup] not in (select [ID] from @.t)
> open c
> fetch next from c into @.id
> while @.@.fetch_status = 0
> begin
> insert into @.v ([ID]) values (@.id)
> fetch next from c into @.id
> end
> close c
> deallocate c
> -- now add the new IDs from @.v to the
result
> @.t
> declare cv cursor for
> select [ID] from @.v
> open cv
> fetch next from cv into @.id
> while @.@.fetch_status = 0
> begin
> insert into @.t ([ID]) values (@.id)
> fetch next from cv into @.id
> end
> close cv
> deallocate cv
> declare @.cntCur int
> select @.cntCur = count(*) from @.t
> -- if we have added no new IDs, the counts
> will be the same and we stop looping
> if @.cnt = @.cntCur
> set @.more = 0
> set @.cnt = @.cntCur
> end
> return
> end
> Thank you very much again.
> --
> Sorin Dolha|||Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Sample data is also a good idea, along with clear
specifications.
Look up nested sets model for trees. Get a copy of TREES & HIERARCHIES
IN SQL. There is no need for recursion or any procedural code from the
vague narrative you posted instead of specs.|||Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Sample data is also a good idea, along with clear
specifications.
Look up nested sets model for trees. Get a copy of TREES & HIERARCHIES
IN SQL. There is no need for recursion or any procedural code from the
vague narrative you posted instead of specs.

No comments:

Post a Comment