Ok, so I have some horribly convuluted SQL that I would love to optomize. I'm not happy leaving it in it's current state, that's for sure!
I'm currently working on our test bed servers, so obviously my stats are out because of the "crap-ness" (yes, that's the technical term) of the hardware, but still, it should NEVER need to take this long!!
Basically, the issue arises in the nasty join to the career table (one employee can have multiple career lines). Just to make things complicated, employees can have any number of career records on any given date, these can even be input for future career events. The following SQL picks out the latest-current career date for each employee based on the career_date being <= GetDate() and the date of entry for this date being the greatest.
E.g.
career_date | datetime_created
2009-01-01 | 2006-05-05 13:55:21.000
2007-01-01 | 2006-05-05 13:54:18.000
2007-01-01 | 2006-05-05 13:52:55.000
From the above we want to return
2007-01-01 | 2006-05-05 13:54:18.000
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT a.sAMAccountName As 'sAMAccountName'
, a.userPrincipalName As 'userPrincipalName'
, 'TRUE' As 'Modify'
, RTRIM(e.unique_identifier) As 'employeeID'
, RTRIM(e.employee_number) As 'employeeNumber'
, RTRIM(e.known_as)
+ CASE WHEN RTRIM(e.surname) IS NOT NULL THEN
' ' + RTRIM(e.surname) ELSE NULL END As 'displayName'
, RTRIM(e.known_as) As 'givenName'
, RTRIM(e.surname) As 'sn'
, RTRIM(c.job_title) As 'title'
, RTRIM(c.division) As 'company'
, RTRIM(c.department) As 'department'
, RTRIM(l.description) As 'physicalDeliveryOfficeName'
, RTRIM(REPLACE(am.dn,'\\','\')) As 'manager'
, t.full_mobile
+ CASE WHEN RTRIM(t.mobile_number) IS NOT NULL THEN
' (DD: ' + RTRIM(t.mobile_number) + ')' ELSE NULL END
As 'mobile'
, t.mobile_number As 'otherMobile'
, ad.address_ad_country As 'c'
, ad.address_ad_address1
+ CASE WHEN ad.address_ad_address2 IS NOT NULL THEN
', ' + ad.address_ad_address2 ELSE NULL END
+ CASE WHEN ad.address_ad_address3 IS NOT NULL THEN
', ' + ad.address_ad_address3 ELSE NULL END
+ CASE WHEN ad.address_ad_address4 IS NOT NULL THEN
', ' + ad.address_ad_address4 ELSE NULL END
+ CASE WHEN ad.address_ad_address5 IS NOT NULL THEN
', ' + ad.address_ad_address5 ELSE NULL END As 'streetAddress'
, ad.address_ad_pobox As 'postOfficeBox'
, ad.address_ad_city As 'l'
, ad.address_ad_County As 'st'
, ad.address_ad_postcode As 'postalCode'
, RTRIM(ad.address_ad_telephone) +
CASE WHEN RTRIM(a.othertelephone) IS NOT NULL
AND RTRIM(ad.address_ad_telephone) IS NOT NULL THEN
' (Ext: ' + RTRIM(a.othertelephone) + ')'
ELSE
CASE WHEN RTRIM(a.othertelephone) IS NOT NULL
AND RTRIM(ad.address_ad_telephone) IS NULL THEN
'Ext: ' + RTRIM(a.othertelephone)
ELSE NULL
END
END As 'telephoneNumber'
FROM employee e
LEFT
JOIN career c
ON c.parent_identifier = e.unique_identifier
AND c.career_date =(
SELECT max(c2.career_date)
FROM pwa_master.career c2
WHERE c2.parent_identifier = c.parent_identifier
AND c2.career_date <= GetDate()
)
AND c.datetime_created =(
SELECT max(c3.datetime_created)
FROM pwa_master.career c3
WHERE c3.parent_identifier = c.parent_identifier
AND c3.career_date = c.career_date
)
LEFT
OUTER
JOIN AD_Import am
ON am.employeeNumber = c.manager_number
INNER
JOIN AD_Import a
ON a.employeeID = e.unique_identifier
LEFT
JOIN AD_Telephone t
ON t.unique_identifier = e.unique_identifier
LEFT
JOIN AD_Address ad
ON ad.address_pwa_location = e.location
LEFT
JOIN xlocat l
ON l.code = c.location
WHERE (a.employeeNumber IS NOT NULL
OR a.employeeID IS NOT NULL)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(1706 row(s) affected)
Table 'AD_Import'. Scan count 4, logical reads 106, physical reads 0, read-ahead reads 0.
Table 'AD_Address'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
Table 'AD_Telephone'. Scan count 2, logical reads 10, physical reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 868, logical reads 956, physical reads 0, read-ahead reads 0.
Table 'xlocat'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0.
Table 'career'. Scan count 5088, logical reads 2564843, physical reads 0, read-ahead reads 0.
Table 'people'. Scan count 1697, logical reads 5253, physical reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 826, logical reads 914, physical reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15203 ms, elapsed time = 8114 ms.
Any advice on what I can do to optomize?
Oh judt to point out that "employee" is a view on the "Table 'people'."
EDIT: I know it's pointing out the obvious, but I'm pulling out the managers "DN" from AD_Import based on the manager_number and employeeNumber matching.Use a derived table rather thsan corrolate on two columns (that means double the scans...)
--.........
FROM employee e
LEFT OUTER
JOIN--"last" career record per person
(SELECT--Various columns from career
FROM career
INNER JOIN
(SELECT parent_identifier
,max(career_date)
,max(datetime_created)
FROM pwa_master.career
WHERE career_date <=GetDate())AS last_career_record
ON last_career_record.parent_identifier = career.parent_identifier)AS last_career_record
ON last_career_record.parent_identifier = e.unique_identifier
--.........
Formatting has not transferred as usual.|||Also it is spelled optimise. I plan to write wrappers for all my .NET objects to correct the spelling of words like colour.|||Also it is spelled optimise. I plan to write wrappers for all my .NET objects to correct the spelling of words like colour.
Apologies - I was just spelling it as I says it (which is no excuse ;))!
I'll go have a toy with your suggested code on the test bed now and let you know how I get on later.
Appreciate it Poots, thanks!|||Also -
WHERE (a.employeeNumber IS NOT NULL
OR a.employeeID IS NOT NULL)can be
WHERE a.employeeID IS NOT NULLI doubt the engine would get caught out but the inner join will sort out null employee numbers. You could try both types and see if the stats differ. I would guess they won't.|||That's one freaky JOIN...is it even a theta join?
Can you explain the business reason for it?|||That's one freaky JOIN...is it even a theta join?
Can you explain the business reason for it?Nope - cause it is not an inequality join.
Am I missing something? It is simply getting the "last" record per person from career.|||The current record per person from career.
So the top record whose date is <= GetDate().
But there is more criteria, because a person can have more than one record for that day, you have to pick up the top one of those based on the datetime_created.
Confusing as hell, eh?
I told my manager that I had the thing working but I was not happy to put it on the production box (I don't care if it only runs 3 times a week) because it was so, well, crap!
He responded with "How long does it take? ONLY 15 seconds!? Ha, leave it George and go work on the next bit!"
:mad:
This is the guy that told me that I have what it takes to become a DBA...|||As an old manager once told me "good enough is good enough". A culminative run time of 45 seconds per week is nothing and I think I would agree with him. Annoying if you are a perfectionist of course but I do not bear that particular burdon :)|||The current record per person from career.
So the top record whose date is <= GetDate().
But there is more criteria, because a person can have more than one record for that day, you have to pick up the top one of those based on the datetime_created.
Confusing as hell, eh?Actually that is very like the stuff I work with most of my day. We're rewriting legacy procedural code acting on highly temporal data in hierarchical databases. As such "get the 'last' this" and "return the 'first' that" is very second nature to me ATM.|||As an old manager once told me "good enough is good enough". A culminative run time of 45 seconds per week is nothing and I think I would agree with him. Annoying if you are a perfectionist of course but I do not bear that particular burdon :)
is it spelled "burdon" in the UK? over here we spell it burden. or maybe you meant "burbon"
:)|||Is "burbon" anything like Bourbon?|||Bourbon in the UK is a biscuit ;)
Alas, I do have a perfectionist streak (it was a "weakness" I named in my original interview - it went down well!)... I'm not happy leaving it with such a "long" running time, but he is right - it works, produces the right results and only has to run 3 times a week overnight...
Thanks for the help again Poots.|||Using SQL Server 2005? Then make use of ROW_NUMBER() function. It is more efficient than "derived table with max" thingy.|||I'm using 6.5 ;)
Not even a TOP for me!|||I'm using 6.5 ;)
Not even a TOP for me!You so need to sort that out. I know people on other boards that are mocked for using 7.0.
I didn't know the OVER() clause was better than MAX() but I do know Peter so I will believe it.
And no it is spelt burden here too :rolleyes:|||On large datasets windowed functions are faster and more efficient.
On smaller datasets they are sometimes equal in speed and sometimes the MAX thingy is faster for small datasets!|||SELECT a.sAMAccountName As 'sAMAccountName',
a.userPrincipalName As 'userPrincipalName',
'TRUE' As 'Modify',
RTRIM(e.unique_identifier) As 'employeeID',
RTRIM(e.employee_number) As 'employeeNumber',
RTRIM(e.known_as)
+ CASE
WHEN RTRIM(e.surname) IS NULL THEN ''
ELSE ' ' + RTRIM(e.surname)
END As 'displayName',
RTRIM(e.known_as) As 'givenName',
RTRIM(e.surname) As 'sn',
RTRIM(c.job_title) As 'title',
RTRIM(c.division) As 'company',
RTRIM(c.department) As 'department',
RTRIM(l.description) As 'physicalDeliveryOfficeName',
RTRIM(REPLACE(am.dn, '\\', '\')) As 'manager',
t.full_mobile
+ CASE
WHEN RTRIM(t.mobile_number) IS NULL THEN ''
ELSE ' (DD: ' + RTRIM(t.mobile_number) + ')'
END As 'mobile',
t.mobile_number As 'otherMobile',
ad.address_ad_country As 'c',
ad.address_ad_address1
+ CASE
WHEN ad.address_ad_address2 IS NULL THEN ''
ELSE ', ' + ad.address_ad_address2
END
+ CASE
WHEN ad.address_ad_address3 IS NULL THEN ''
ELSE ', ' + ad.address_ad_address3
END
+ CASE
WHEN ad.address_ad_address4 IS NULL THEN ''
ELSE ', ' + ad.address_ad_address4
END
+ CASE
WHEN ad.address_ad_address5 IS NULL THEN ''
ELSE ', ' + ad.address_ad_address5
END As 'streetAddress',
ad.address_ad_pobox As 'postOfficeBox',
ad.address_ad_city As 'l',
ad.address_ad_County As 'st',
ad.address_ad_postcode As 'postalCode',
RTRIM(ad.address_ad_telephone)
+ CASE
WHEN RTRIM(a.othertelephone) IS NOT NULL AND RTRIM(ad.address_ad_telephone) IS NOT NULL THEN ' (Ext: ' + RTRIM(a.othertelephone) + ')'
WHEN RTRIM(a.othertelephone)IS NOT NULL AND RTRIM(ad.address_ad_telephone) IS NULL THEN 'Ext: ' + RTRIM(a.othertelephone)
ELSE ''
END As 'telephoneNumber'
FROM employee e
INNER JOIN AD_Import a ON a.employeeID = e.unique_identifier
LEFT JOIN AD_Telephone t ON t.unique_identifier = e.unique_identifier
LEFT JOIN AD_Address ad ON ad.address_pwa_location = e.location
LEFT JOIN career c ON c.parent_identifier = e.unique_identifier
AND CONVERT(CHAR(19), career_date, 120) + CONVERT(CHAR(19), datetime_created, 120) = (
SELECT c2.parent_identifier,
MAX(CONVERT(CHAR(19), c2.career_date, 120) + CONVERT(CHAR(19), c2.datetime_created, 120))
FROM pwa_master.career AS c2
WHERE c2.career_date <= GetDate()
AND c2.parent_identifier = c.parent_identifier
)
LEFT JOIN AD_Import am ON am.employeeNumber = c.manager_number
LEFT JOIN xlocat l ON l.code = c.location
WHERE a.employeeNumber IS NOT NULL|||Thanks for the attempt Peso, but unfortunately when I run it (after a wee bit of tweaking) I'm missing over 400 records.
I've decided to stick with my original, I figure that when it is put on the production server it will take less than half the time... The test servers are, how to hang an air freshener on this; crap
Mmmm, pine-fresh ;)
Thanks for your help everyone
No comments:
Post a Comment