OpenXML Identity Insert: optimization challenge

Tuesday, March 20, 2012

optimization challenge

Well i wanted to prove to some guys that cursors are not really that important:shocked: .
:D So this code is suppose to remove duplicate tuples from a table without temporary tables or cursors:D. Except it needs some optimization(and alot of system down time, not sure about that:confused: ).
I would like it, if some one could find an instance of the table when the below code fails or some way to optimize the code or anything;) .

--trashtable for real data
create table abc
(col1 tinyint,
col2 tinyint,
col3 tinyint)

--trash values for trash table
insert into abc values (1,1,1)
insert into abc values (1,1,1)
insert into abc values (1,1,1)
insert into abc values (1,1,1)
insert into abc values (2,2,2)
insert into abc values (2,2,2)
insert into abc values (2,2,2)
insert into abc values (3,2,1)
insert into abc values (2,2,3)
insert into abc values (3,2,4)

--check that there are ten rows
select * from abc
--check that there are only five distinct rows
select distinct * from abc

--run code : next 15 line as a batch
declare @.lp tinyint
declare @.col1 tinyint,@.col2 tinyint,@.col3 tinyint
set @.lp=1
while @.lp>0
begin
if not exists (select top 1 * from abc group by col1,col2,col3 having count(col1)>1)
set @.lp=0
else
begin
select top 1 @.col1 = col1,@.col2 = col2,@.col3 = col3 from abc group by col1,col2,col3 having count(col1)>1
delete from abc where col1=@.col1 and col2=@.col2 and col3=@.col3
insert into abc values(@.col1,@.col2,@.col3)
end
end

--only distinct values left in trash table
select * from abc

--think code can be optimized
--just wanted to prove: can be done without cursors or temporary tablesI know this is a cheat and I'm not exactly rising to the challenge however there is a pretty good discussion about (and links to) removing dupes here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=6256

HTH|||Thanks:cool:

Had a look at the URL.
another cool way is to use SET ROWCOUNT with DELETE.|||While I realize that this uses a temp table, the usage is quite small and this is pretty efficient:
--trashtable for real data
create table abc
( col1 tinyint
, col2 tinyint
, col3 tinyint)

--trash values for trash table
insert into abc values (1,1,1)
insert into abc values (1,1,1)
insert into abc values (1,1,1)
insert into abc values (1,1,1)
insert into abc values (2,2,2)
insert into abc values (2,2,2)
insert into abc values (2,2,2)
insert into abc values (3,2,1)
insert into abc values (2,2,3)
insert into abc values (3,2,4)

--check that there are ten rows
select * from abc

--check that there are only five distinct rows
select distinct * from abc

create table ptp_dupes
( col1 tinyint
, col2 tinyint
, col3 tinyint)

INSERT INTO ptp_dupes (col1, col2, col3)
SELECT col1, col2, col3
FROM abc
GROUP BY col1, col2, col3
HAVING 1 < Count(*)

BEGIN TRANSACTION

DELETE FROM abc
WHERE EXISTS (SELECT *
FROM ptp_dupes
WHERE ptp_dupes.col1 = abc.col1
AND ptp_dupes.col2 = abc.col2
AND ptp_dupes.col3 = abc.col3)

INSERT INTO abc (col1, col2, col3)
SELECT col1, col2, col3
FROM ptp_dupes

COMMIT TRANSACTION

SELECT col1, col2, col3 FROM abc ORDER BY col1, col2, col3

SELECT DISTINCT col1, col2, col3 FROM abc ORDER BY col1, col2, col3Unfortunately, there isn't anything I can recommend as even close to efficient that doesn't use a temp table at all. The cursor solutions are inefficient, and the code that you've shown is interesting, but not very efficient.

-PatP

Tuesday, March 20, 2012

optimization challenge

No comments:

Post a Comment

OpenXML Identity Insert

Blog Archive

About Me