Wednesday, March 28, 2012

Optimizing cursor

I've tried a bunch of different ways in an effort to stay away from using a cursor, but I haven't been able to accomplish what I need to do without one. So, I coded this process using cursors and performance (as expected) is pretty mediocre. I was wondering if someone could take a quick look and suggest a different approach or maybe suggest ways to optimize the current code.

*Attached* is my code.

TIAYou're already doing LEFT OUTER JOIN on CONTACTS table, so why not avoid additional updates by including dateLastContact and emailformat fields into your SELECT?

Other suggestions may come later, but you can definitely break this thing down by combining the operations and maybe even considering using BCP to generate the actual script that you can run as a scheduled task.|||--I don't know the definition of your group and collection cursors so that may change this process
--It looks like you should be able to do this with set based updates and temp tables. Take a look and
--see if you see any major flaws

---- Save New Contacts in a temp table

select * (enumerate columns) into #new_contacts
FROM ContactImport new_data LEFT OUTER JOIN contacts previous_data ON new_data.emailAddress = previous_data.emailAddress
WHERE contact_id is null

---- Save contacts updates in a temp table

select * into #update_contacts
FROM ContactImport new_data LEFT OUTER JOIN contacts previous_data ON new_data.emailAddress = previous_data.emailAddress
WHERE contact_id is not null

--Update contacts with new contact info if necessary
if @.dup_handling = 'o'

update contacts
set x=x1,y=y1,z=z1 (enumerate columns)
from #update_contacts
where contacts.contact_id = #update_contacts.contact_id

-- Update group contacts for all new contacts, existing contacts missing groupcontact
insert groupContacts (group_id, contact_id)
select group_id, contact_id
from #update_contacts left outer join groupContacts on #update_contacts.contact_id = groupContacts.contact_id
and #update_contacts.group_id = groupContacts.group_id
where groupContacts.contact_id is null
UNION ALL
select group_id, contact_id from #new_contacts

-- Update collections contacts for all new contacts, and existing contacts missing collection contact
insert colCollections (col_id, contact_id)
select coll_id, contact_id
from #update_contacts left outer join colContacts on #update_contacts.contact_id = colContacts.contact_id
and #update_contacts.col_id = colContacts.col_id
where colContacts.contact_id is null
UNION ALL
select col_contacts, contact_id from #new_contacts

insert contacts (x,y,z) (enumerate columns)
select x,y,z from #new_contacts|||vaxman,

Thanks for your ideas; I'll try them shortly. Although, shouldn't I stay away from #temp tables. Maybe do this using a real "temp" table??

To answer your questions:

1. The group cursor is there because a contact can be assigned to multiple groups. In the case of existing contacts I also don't want to add them to the group if they're already in it.

2. The collection cursor works just like the group cursor. Basically, if I'm updating a contact I need to make sure that the contact is not being added multiple times to the same collection (coll_id-contact_id is a unique constraint), and I also have to ensure contacts get added to collections that don't exist (for a non-matching coll_id-contact_id).

Thanks again|||Hi naceBal,

temp tables are not necessarily bad if they solve a greater problem. In this case, you update contacts, groupcontacts and colcontacts from the result set of a query. You can't update multiple tables in a singe statement, so the alternative would be to update the permanent tables by running the main query 3 seperate times. Presumably that will take three times as many resources as running it only once and re-using the results.

I understand WHY the group and col cursors exist, but their exact deffinition was not included so there might be more to them than I assumed.

The set based solution should perform much better. Also, when you declare cursors and specify an order by , group, or distinct, SQL server creates a temporary working table with the resutls anyway, so -- you had temp tables already ;)

Good luck!

No comments:

Post a Comment