Friday, March 30, 2012

Optimizing Query to Run

I'm trying to get a query to run which looks at completed orders that have not had another paid order in 180 days. The database I'm running it against is very large so I can't get it to complete. Where's what I've got:

select Date =cast(cl1.cl_rundate as datetime(102)),count(cl1.cl_recno) as 'Completed Initials', cl1.cl_status as Status from dbo.vw_Completedorders cl1 where cl1.lob_lineofbusiness = 'aaa'
and cl1.cl_rundate > '20060801' and not exists (
select cl2.cl_company from dbo.vw_Paidorders cl2 where
cl2.lob_lineofbusiness = 'aa'and cl2.cl_company = cl1.cl_order and cl2.cl_rundate > '20060101' and datediff(day,cl2.cl_rundate,cl1.cl_rundate) < 180)
group by cl1.cl_status, cl1.cl_rundate

Aragon:

I started to do a mock-up of this; however, I quickly decided that without additional information this was not a good idea. Since this database is "very large" and you "can't get it complete" I think it is a good idea to try to get more information. What I see as critical to this query is:

Avoiding table scans -- especially on the view that is included in the "not exists" condition|||Unfortunately in both Views, there are no indexes available. I do have a table I could use, Orders. In the Orders table I have 5 indexes available. The only one pertainable to my search is the recNo field. Unfortunately RunDate is not an index or it may actually work.

Both views references 2 tables, orders and lineofbussines. LineOfBussiness has 3 columns and no indexes.

The main table to reference, orders, has approximately 29k rows per day for each day since late 2004. CompletedOrders has about 11k per day for the same period and PaidOrders has about 10k per day.

Does this give you enough background?

|||

Try running the Database Tuning Advisor in SQL Server 2005 or Index Tuning Wizard in SQL Server 2000. You need to use a workload or script that contains this query and other common queries/DML statements. DTA/ITW can then recommend additional indexes that will help. It is hard to tell by just looking at your query where the problem might be. Both the objects are views so we don't know what tables are being referenced or how the views are defined. In addition to this, the indexes on the base tables referenced by the views also matter.

Btw, you are using invalid cast specification for datetime "cast(cl1.cl_rundate as datetime(102))". The code will fail if you upgrade your server to SQL Server 2005. The correct way is to do "cast(cl1.cl_rundate as datetime)". And it is also not clear why you need the cast assuming that the column is smalldatetime/datetime.

|||

Aragon:

This is probably enough for a mock-up, but I have no more time today. The big question is that if you can prove that you need the indexes on the tables or the views will you be allowed to add them?

Dave

|||Thanks for help. Unfortunately, what I really need for indexing is the date field. We tried to index on date before and the database had issues. The rebuild of indexes took too long to load. We have to wait for mainframe jobs to run prior to rebuilding so indexing ran into the work day and caused sych issues with other db's.

the date field is ANSI, so that's why I convert it to a datetime.
|||

Got it. Another piece of optimization that will help is to partition the data; sorry for neglecting to mention this in the previous pass. I will see if I can mock this up this morning.

Dave

|||I got it to run using recno to limit the results but it appears to be giving faulty data. I'm having counts of 500 on days when there were only 100 total.
|||

Aragon:

I finally got some time today to look at this again. I was able to load a bunch of mock-up data and look at the results. Below is the schema that I used for my mock-up along with some questions. I targeted my mock-up for about 10,000,000 rows which is 29K records per day times 365 days.

Dave


-- -
-- Mock-up schema?
--
--
-- QUESTIONS:
--
--
-- 1. I am not sure about the "CL_RUNDATE" field. I have modeled this field as
-- CHAR (10) storing date as '2006.05.21'. Exactly what kind of data element
-- is the "CL_RUNDATE" field?
-- 2. I am not sure how the views join the Orders table and the LineOfBusiness table.
-- How are these two table joined.
-- 3. Your query filters based on the field "LOB_LINEOFBUSINESS"; does that field of
-- the view come from the ORDERS table or the LINEOFBUSINESS table?
--
-- 4. In your overview you state that you "look at completed orders that have not had
-- another paid order in 180 days." However, in your query you alias view
-- VW_paidOrders with cl2 and look for orders > '20060101' while aliasing view
-- VW_completedOrders with cl1 and looking for orders > '20060801'. This seems
-- backwards.
-- 5. When you take the date difference DATEDIFF (day, cl2.cl_runDate, cl1.cl_runDate)
-- and this also seems backwards.
--
-- 6. When you join the two views you join the PaidOrders "cl_company" column to the
-- completedOrders "cl_order column. Are these two columns simply two different
-- instances of a representation of a company?
--
-- 7. Another question that I saw has to do with this issue:
--
-- Company A Order 1 Paid 1/15/2006
-- Company A Order 2 Completed 2/2/2006
-- Company A Order 3 Completed 9/9/2006
--
-- Assuming that 4/2 is 180 days ago we have Order 2 that precedes 4/2 and
-- has no subsequent PAID order. Should this cause to the 2/2 record to be
-- included in the report or should that fact that there was a 9/9 COMPLETED
-- order after the 2/2 order suppress the 2/2 order from being included?
--
-- --
-- Comments:
-- --
--
-- At this point I have put together a "working mockup"; however, I still need these
-- issues above discussed. I tried out a mock-up of 10,000,000 rows. This requires
-- 10 GB of disk space to store and about 10 GB of transient log space to run the
-- process to generate the data.
--
-- The index that I created is intended to access target data without the use of any
-- bookmark lookups. When I benchmarked the first set of tests this index reduced
-- logical read from 200000 to 296 -- a very large I/O reduction; this is a good
-- starting point but still doesn't yet prove anything. I am for the moment somewhat
-- optimistic.
--
-- None of this addresses partitioning of data.
--
-- -
--
-- Tables:
--
-- MockOrder
-- LineOfBusiness
--
-- Views:
--
-- vw_CompletedOrders
-- vw_PaidOrders
-- -
create table dbo.mockOrder
( cl_recno integer identity
constraint pk_mockOrder primary key,

lob_code char (4) not null,
cl_status char (2) not null,
cl_rundate char(10) not null,
cl_company char (30) not null,
filler char (730) not null
)
go

create index lobCompany_ndx
on mockOrder (cl_company, lob_code, cl_status, cl_rundate)
go


create table dbo.LineOfBusiness
( lob_code char (4) not null
constraint pk_lineOfBusiness primary key,

lob_lineOfBusiness char (4) not null
)
go


create view dbo.vw_Completedorders
as
select cl_recno,
a.lob_code,
b.lob_lineOfBusiness,
a.cl_status,
cl_rundate,
cl_company as cl_order
from dbo.mockOrder a
inner join dbo.lineOfBusiness b
on a.cl_status = 'C'
and a.lob_code = b.lob_code
go


create view dbo.vw_paidOrders
as
select cl_recno,
a.lob_code,
b.lob_lineOfBusiness,
a.cl_status,
cl_rundate,
cl_company
from dbo.mockOrder a
inner join dbo.lineOfBusiness b
on a.cl_status = 'P'
and a.lob_code = b.lob_code

|||

Sorry, I missed an index:

create index lobStatus_ndx
on mockOrder (lob_code, cl_status, cl_rundate, cl_company)

|||Wow, that's a lot of work you put into it. Thanks. Answers below.

1. I am not sure about the "CL_RUNDATE" field. I have modeled this field as
-- CHAR (10) storing date as '2006.05.21'. Exactly what kind of data element is the "CL_RUNDATE" field?

A: Ansi date format. Ex, 20060930

-- 2. I am not sure how the views join the Orders table and the LineOfBusiness table.
-- How are these two table joined.

A: There is a sort code field in both tables. LineOfBusiness is a grouping of sort codes.

-- 3. Your query filters based on the field "LOB_LINEOFBUSINESS"; does that field of
-- the view come from the ORDERS table or the LINEOFBUSINESS table?
--

A: It originates in the LOB table, but is added to the records in both Views.

-- 4. In your overview you state that you "look at completed orders that have not had
-- another paid order in 180 days." However, in your query you alias view
-- VW_paidOrders with cl2 and look for orders > '20060101' while aliasing view
-- VW_completedOrders with cl1 and looking for orders > '20060801'. This seems
-- backwards.

A: I think I mispoke. I'm looking for NEW orders completed when there hasn't been an OLD order the last 180 days. I did forget, in the above example to include a cl2.cl_rundate >= cl1.cl_rundate clause.

-- 5. When you take the date difference DATEDIFF (day, cl2.cl_runDate, cl1.cl_runDate)
-- and this also seems backwards.
--

A: See above. But it shouldn't matter, should it?

-- 6. When you join the two views you join the PaidOrders "cl_company" column to the
-- completedOrders "cl_order column. Are these two columns simply two different
-- instances of a representation of a company?
--

A: I mistyped. Both are company

-- 7. Another question that I saw has to do with this issue:
--
-- Company A Order 1 Paid 1/15/2006
-- Company A Order 2 Completed 2/2/2006
-- Company A Order 3 Completed 9/9/2006
--
-- Assuming that 4/2 is 180 days ago we have Order 2 that precedes 4/2 and
-- has no subsequent PAID order. Should this cause to the 2/2 record to be
-- included in the report or should that fact that there was a 9/9 COMPLETED
-- order after the 2/2 order suppress the 2/2 order from being included?

A: In the above example I would hope to select orders 1 and 3.
|||

Aragon:

Thanks for the answers. Here are the test cases that I set up. Please look these over and let me which of these test cases is false:

set nocount on
truncate table mockOrder

select convert (varchar (10), dateadd (day, -180, getdate()), 112)
-- Results: 20060406

-- --
-- CASE 1:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 01' 0011 'AAA' 'C' 20060401
--
-- This row is returned because:
-- (1) LOB = 'AAA'
-- (2) Status = 'C'
-- (3) Run Date (20060401) < 20060406
-- (4) There is no subsequent PAID record
-- --
insert into mockOrder values ( 11, 'AAA', 'C', '20060401', 'Company 01', ' ')

-- --
-- CASE 2:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 02' 0021 'AAA' 'C' 20060501
--
-- No rows returned because rundate > 20060406
-- --
insert into mockOrder values ( 21, 'AAA', 'C', '20060501', 'Company 02', ' ')


-- --
-- CASE 3:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 03' 0031 'BBB' 'C' 20060401
--
-- No rows returned because LOB is 'BBB'
-- --
insert into mockOrder values ( 31, 'BBB', 'C', '20060401', 'Company 03', ' ')


-- --
-- CASE 4:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 04' 0041 'AAA' 'C' 20060402
-- 'COMPANY 04' 0042 'AA' 'P' 20060430
--
-- No rows returned; although there is a qualifying 'C' record, there is
-- a subsequent 'P' record that disqualifies the qualifying 'C' record
-- --
insert into mockOrder values ( 41, 'AAA', 'C', '20060402', 'Company 04', ' ')
insert into mockOrder values ( 42, 'AA', 'P', '20060430', 'Company 04', ' ')


-- --
-- CASE 5:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 05' 0051 'AAA' 'C' 20060102
-- 'COMPANY 05' 0052 'AA' 'P' 20060131
-- 'COMPANY 05' 0053 'AAA' 'C' 20060401
--
-- The row for Rec No 53 is returned because:
-- (1) LOB = 'AAA'
-- (2) Status = 'C'
-- (3) Run Date (20060401) < 20060406
-- (4) There is no subsequent PAID record
-- --
insert into mockOrder values ( 51, 'AAA', 'C', '20060102', 'Company 05', ' ')
insert into mockOrder values ( 52, 'AA', 'P', '20060131', 'Company 05', ' ')
insert into mockOrder values ( 53, 'AAA', 'C', '20060401', 'Company 05', ' ')


-- --
-- CASE 6:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 06' 0061 'AAA' 'C' 20060102
-- 'COMPANY 06' 0062 'AA' 'P' 20060131
-- 'COMPANY 06' 0063 'AAA' 'C' 20060501
--
-- No Rows returned; (1) Rec No 61 would otherwise qualify, but Rec NO 62
-- is a subsequent 'P' record and therefore Rec No 61 does not qualify.
-- Rec No 63 does not qualify because Run Date (20060501) > 20060406
-- --
insert into mockOrder values ( 61, 'AAA', 'C', '20060102', 'Company 06', ' ')
insert into mockOrder values ( 62, 'AA', 'P', '20060131', 'Company 06', ' ')
insert into mockOrder values ( 63, 'AAA', 'C', '20060501', 'Company 06', ' ')


-- --
-- CASE 7:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 07' 0071 'AAA' 'C' 20060102
-- 'COMPANY 07' 0072 'AAA' 'C' 20060531
--
-- Rec No 71 is returned because it has no matching 'P' record
-- --
insert into mockOrder values ( 71, 'AAA', 'C', '20060102', 'Company 07', ' ')
insert into mockOrder values ( 73, 'AAA', 'C', '20060531', 'Company 07', ' ')


-- --
-- CASE 8:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 08' 0081 'AAA' 'C' 20060101
-- 'COMPANY 08' 0062 'AA' 'P' 20060901
--
-- Row included; the qualifying 'C' record does not have a matching 'P'
-- record for 180 days.
-- --
insert into mockOrder values ( 81, 'AAA', 'C', '20060101', 'Company 08', ' ')
insert into mockOrder values ( 82, 'AA', 'P', '20060901', 'Company 08', ' ')


-- --
-- CASE 9:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 09' 0091 'AAA' 'C' 20060101
-- 'COMPANY 09' 0091 'AAA' 'C' 20060515
-- 'COMPANY 08' 0093 'AA' 'P' 20060901
--
-- Row included; Rec No 91 qualifies and does not get a 'P' record until
-- more than 180 days later
-- --
insert into mockOrder values ( 91, 'AAA', 'C', '20060102', 'Company 09', ' ')
insert into mockOrder values ( 92, 'AAA', 'C', '20060515', 'Company 09', ' ')
insert into mockOrder values ( 93, 'AA', 'P', '20060901', 'Company 09', ' ')


-- --
-- CASE 10:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 10' 0101 'AAA' 'C' 20060101
-- 'COMPANY 10' 0102 'AAA' 'C' 20060105
-- 'COMPANY 10' 0103 'AAA' 'C' 20060515
-- 'COMPANY 10' 0104 'AA' 'P' 20060901
--
-- 2 Row included; Rec No 91 qualifies and does not get a 'P' record until
-- more than 180 days later
-- --
insert into mockOrder values (101, 'AAA', 'C', '20060102', 'Company 10', ' ')
insert into mockOrder values (102, 'AAA', 'C', '20060105', 'Company 10', ' ')
insert into mockOrder values (103, 'AAA', 'C', '20060515', 'Company 10', ' ')
insert into mockOrder values (104, 'AA', 'P', '20060901', 'Company 10', ' ')


-- --

select cl_company,
cl_recno,
lob_sort_code,
cl_status,
cl_rundate
from mockOrder
order by cl_recno


--

-- cl_company cl_recno lob_sort_code cl_status cl_rundate
-- -- - -
-- Company 01 11 AAA C 20060401
-- Company 02 21 AAA C 20060501
-- Company 03 31 BBB C 20060401
--
-- Company 04 41 AAA C 20060402
-- Company 04 42 AA P 20060430
--
-- Company 05 51 AAA C 20060102
-- Company 05 52 AA P 20060131
-- Company 05 53 AAA C 20060401
--
-- Company 06 61 AAA C 20060102
-- Company 06 62 AA P 20060131
-- Company 06 63 AAA C 20060501
--
-- Company 07 71 AAA C 20060102
-- Company 07 73 AAA C 20060531
--
-- Company 08 81 AAA C 20060101
-- Company 08 82 AA P 20060901
--
-- Company 09 91 AAA C 20060102
-- Company 09 92 AAA C 20060515
-- Company 09 93 AA P 20060901
--
-- Company 10 101 AAA C 20060102
-- Company 10 102 AAA C 20060105
-- Company 10 103 AAA C 20060515
-- Company 10 104 AA P 20060901
--
--

After inserting the test data described above, I ran the following with the indicated results:


declare @.compDate char (8)
declare @.startDate char (8)
declare @.paidDate char (8)
set @.compDate = convert (varchar (8), dateadd (day, -180, getdate()), 112)
set @.startDate = '20060101'
set @.paidDate = convert (varchar (8), dateadd (day, 180, @.startdate), 112)

-- --
-- Display dates that are important to the selection process
-- --
select @.compDate as [@.compDate],
@.startDate as [@.startDate],
@.paidDate as [paidDate]

-- @.compDate @.startDate paidDate
-- - --
-- 20060406 20060101 20060630


-- --
-- Display the records that make it through the filtering process
-- --
select cl_company,
cl_recno,
lob_LineOfBusiness,
cl_rundate
from dbo.vw_completedOrders cmp
where lob_lineOfBusiness = 'AAA'
and cl_rundate >= @.startDate
and cl_rundate <= '20060406'
and not exists
( select pd.cl_company
from dbo.vw_paidOrders pd
where pd.cl_company = cmp.cl_company
and pd.cl_rundate > cmp.cl_rundate
and datediff (day, cmp.cl_rundate, pd.cl_rundate) <= 180
and pd.lob_lineOfBusiness = 'AA'
)

-- cl_company cl_recno lob_LineOfBusiness cl_rundate
-- -- -
-- Company 01 11 aaa 20060401
-- Company 05 53 aaa 20060401
-- Company 07 71 aaa 20060102
-- Company 08 81 aaa 20060101
-- Company 09 91 aaa 20060102
-- Company 10 101 aaa 20060102
-- Company 10 102 aaa 20060105


-- --
-- Display the summary
-- --
select Date = cast (cmp.cl_rundate as datetime (102)),
count(cmp.cl_recno) as 'Completed Initials',
cmp.cl_status as Status
from dbo.vw_completedOrders cmp
where lob_lineOfBusiness = 'AAA'
and cl_rundate >= @.startDate
and cl_rundate <= '20060406'
and not exists
( select pd.cl_company
from dbo.vw_paidOrders pd
where pd.cl_company = cmp.cl_company
and pd.cl_rundate > cmp.cl_rundate
and datediff (day, cmp.cl_rundate, pd.cl_rundate) <= 180
and pd.lob_lineOfBusiness = 'AA'
)
group by cmp.cl_status, cmp.cl_rundate

-- Date Completed Initials Status
--
-- 2006-01-01 00:00:00.000 1 C
-- 2006-01-02 00:00:00.000 3 C
-- 2006-01-05 00:00:00.000 1 C
-- 2006-04-01 00:00:00.000 2 C

Let me know if this looks close.

Dave

No comments:

Post a Comment