Showing posts with label run. Show all posts
Showing posts with label run. Show all posts

Friday, March 30, 2012

Optimizing Store Procedure and Avoiding Cursors....

I'm trying to optimize an ETL batch procedure which takes an extremely long
time to run. I was using cursors for most part of the batch script on
different Store Procedures. Now I'm trying to remove the cursors and replace
them with alternative methods such as using temp tables.
I'm really puzzled as to why the cursor procedure runs much faster than
using temp tables... can anyone point out what is wrong to me?
Below is a snippet of 2 test SPs I've compared against.
TestTable holds 200000 records with 3 columns
1. Using Cursors (This procedure took 4 - 8 secs to run)
declare @.test as numeric
declare @.header as nvarchar(50)
declare @.body as nvarchar(50)
declare @.footer as nvarchar(50)
declare test_cursor cursor for select header, body, footer from testTable
order by header
open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
set @.test = 0
while @.@.fetch_status = 0
begin
set @.test = @.test + 1
fetch next from test_cursor into @.header, @.body, @.footer
end
select @.test
close test_cursor
deallocate test_cursor
2. Using Temp Tables (This procedure took > 1 min)
declare @.test as numeric
declare @.header as nvarchar(50)
declare @.body as nvarchar(50)
declare @.footer as nvarchar(50)
declare @.count int, @.loopcount int
select identity(int,1,1) id, header, body, footer into #temp from testtable
select @.count = @.@.rowcount
set @.loopcount = 1
while @.loopcount <= @.count
begin
select @.header = header, @.body = body, @.footer = footer from
#temp where id = @.loopcount
set @.loopcount = @.loopcount + 1
end
drop table #tempLooking at the code I think it maybe the the where clause in the tempdb,
rember that it has to search through all 200000 records whereas with the
cursor your going through them from start to finish, no searching.
Peter
"Happiness is nothing more than good health and a bad memory."
Albert Schweitzer
"Nestor" wrote:

> I'm trying to optimize an ETL batch procedure which takes an extremely lon
g
> time to run. I was using cursors for most part of the batch script on
> different Store Procedures. Now I'm trying to remove the cursors and repla
ce
> them with alternative methods such as using temp tables.
> I'm really puzzled as to why the cursor procedure runs much faster than
> using temp tables... can anyone point out what is wrong to me?
> Below is a snippet of 2 test SPs I've compared against.
> TestTable holds 200000 records with 3 columns
> 1. Using Cursors (This procedure took 4 - 8 secs to run)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare test_cursor cursor for select header, body, footer from testTable
> order by header
> open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
> set @.test = 0
> while @.@.fetch_status = 0
> begin
> set @.test = @.test + 1
> fetch next from test_cursor into @.header, @.body, @.footer
> end
> select @.test
> close test_cursor
> deallocate test_cursor
>
> 2. Using Temp Tables (This procedure took > 1 min)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare @.count int, @.loopcount int
> select identity(int,1,1) id, header, body, footer into #temp from testtab
le
> select @.count = @.@.rowcount
> set @.loopcount = 1
> while @.loopcount <= @.count
> begin
> select @.header = header, @.body = body, @.footer = footer from
> #temp where id = @.loopcount
> set @.loopcount = @.loopcount + 1
> end
> drop table #temp
>
>
>|||Both 1 and 2 are cursors! Looping through a temp table is just a cursor
in disguise. Unfortunately your sample code doesn't do anything useful
so it's impossible to suggest an alternative set-based solution without
using cursors. Your goal should be to avoid processing one row at a
time by ANY method.
If you need more help please post DDL, sample data INSERTs and show
your required end result.
http://www.aspfaq.com/etiquette.asp?id=5006
David Portas
SQL Server MVP
--|||Hello Nestor,
Going through your code, I dont see any reason why the #temp based approach
should be any better
than the Cursor based approach (if it is supposed to be).
The #temp based approach, Loops anyway with a while Loop which is simillar
to Cursor based method.
Normally, when we say we have to get rid of the cursor, we mean to replace
the Cursor based approach
with a SET BASED APPROACH and your #temp based approach still does record by
record. And as Peter says
for each iteration a select is executed against the table.
Hope this helps.
Gopi
"Nestor" <n3570r@.yahoo.com> wrote in message
news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
> I'm trying to optimize an ETL batch procedure which takes an extremely
> long time to run. I was using cursors for most part of the batch script on
> different Store Procedures. Now I'm trying to remove the cursors and
> replace them with alternative methods such as using temp tables.
> I'm really puzzled as to why the cursor procedure runs much faster than
> using temp tables... can anyone point out what is wrong to me?
> Below is a snippet of 2 test SPs I've compared against.
> TestTable holds 200000 records with 3 columns
> 1. Using Cursors (This procedure took 4 - 8 secs to run)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare test_cursor cursor for select header, body, footer from testTable
> order by header
> open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
> set @.test = 0
> while @.@.fetch_status = 0
> begin
> set @.test = @.test + 1
> fetch next from test_cursor into @.header, @.body, @.footer
> end
> select @.test
> close test_cursor
> deallocate test_cursor
>
> 2. Using Temp Tables (This procedure took > 1 min)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare @.count int, @.loopcount int
> select identity(int,1,1) id, header, body, footer into #temp from
> testtable
> select @.count = @.@.rowcount
> set @.loopcount = 1
> while @.loopcount <= @.count
> begin
> select @.header = header, @.body = body, @.footer = footer from
> #temp where id = @.loopcount
> set @.loopcount = @.loopcount + 1
> end
> drop table #temp
>
>|||Although I haven't tested it, I'd expect the pseudo-cursor would perform
much better if you add a unique clustered index on the #temp id column.
However, as David pointed out, changing a cursor to another iterative
technique isn't the ideal solution and might not be any faster than the
cursor.
We can't make any specific recommendations because the example you posted
does no productive work. If your actual process does something like execute
another proc for each row returned, the set-based solution would be to use
in-line set-based processing instead. This technique is often orders of
magnitude faster than an iterative approach.
Hope this helps.
Dan Guzman
SQL Server MVP
"Nestor" <n3570r@.yahoo.com> wrote in message
news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
> I'm trying to optimize an ETL batch procedure which takes an extremely
> long time to run. I was using cursors for most part of the batch script on
> different Store Procedures. Now I'm trying to remove the cursors and
> replace them with alternative methods such as using temp tables.
> I'm really puzzled as to why the cursor procedure runs much faster than
> using temp tables... can anyone point out what is wrong to me?
> Below is a snippet of 2 test SPs I've compared against.
> TestTable holds 200000 records with 3 columns
> 1. Using Cursors (This procedure took 4 - 8 secs to run)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare test_cursor cursor for select header, body, footer from testTable
> order by header
> open test_cursor fetch next from test_cursor into @.header, @.body, @.footer
> set @.test = 0
> while @.@.fetch_status = 0
> begin
> set @.test = @.test + 1
> fetch next from test_cursor into @.header, @.body, @.footer
> end
> select @.test
> close test_cursor
> deallocate test_cursor
>
> 2. Using Temp Tables (This procedure took > 1 min)
> declare @.test as numeric
> declare @.header as nvarchar(50)
> declare @.body as nvarchar(50)
> declare @.footer as nvarchar(50)
> declare @.count int, @.loopcount int
> select identity(int,1,1) id, header, body, footer into #temp from
> testtable
> select @.count = @.@.rowcount
> set @.loopcount = 1
> while @.loopcount <= @.count
> begin
> select @.header = header, @.body = body, @.footer = footer from
> #temp where id = @.loopcount
> set @.loopcount = @.loopcount + 1
> end
> drop table #temp
>
>|||Dan,
Could you please share with me what "in-line set-based " means.
I was looking at a Stored Proc which has a nested cursors (One cursor inside
another cursor)
and the While loop of the second a SP is called with the two parameters
(each of them coming
from each of the Cursor code.
Here is the PSeudo-Code :
Declare Cursor Cursor1 Select * from table1
Open Cursor1
Fetch into @.var1
While @.@.FETCH_STATUS <> 1
Declare Cursor Cursor2 Select * from table2
Open Cursor2
Fetch into @.var2
While @.@.FETCH_STATUS <> 1
Exec Spname @.var1, @.var2
........
.....
DEALLOCATE Cursor1
DEALLOCATE Cursor2
I have noticed that the two cursors can be combined into one using
appropriate Joins.
However, I was under the impression I will still have to use a Cursor for
executing the SP
Could you please point me in the right direction if there is a way out.
Thanks,
Gopi
<< If your actual process does something like execute another proc for each
row returned, the set-based
solution would be to use in-line set-based processing instead>>
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:OXOup8KJFHA.4092@.tk2msftngp13.phx.gbl...
> Although I haven't tested it, I'd expect the pseudo-cursor would perform
> much better if you add a unique clustered index on the #temp id column.
> However, as David pointed out, changing a cursor to another iterative
> technique isn't the ideal solution and might not be any faster than the
> cursor.
> We can't make any specific recommendations because the example you posted
> does no productive work. If your actual process does something like
> execute another proc for each row returned, the set-based solution would
> be to use in-line set-based processing instead. This technique is often
> orders of magnitude faster than an iterative approach.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Nestor" <n3570r@.yahoo.com> wrote in message
> news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
>|||> However, I was under the impression I will still have to use a Cursor
for
> executing the SP
Yes, using an SP in this way forces you to use the iterative row-by-row
approach. The set-based solution is to perform whatever logic is
contained in SPname using a query against your tables so that the
entire operation is performed against the data set rather than one row
at a time. For data manipulation operations that usually has advantages
over the cursor in terms of performance, scalability and
maintainability.
David Portas
SQL Server MVP
--|||> Here is the PSeudo-Code :
Your pseudo-code effectively calls the proc for each row returned by the
Cartesian product of table1 and table2. Assuming that the executed proc
simply inserts into a third table, a working code example:
CREATE TABLE table1
(
Col1 int NOT NULL PRIMARY KEY
)
INSERT INTO table1 VALUES(1)
INSERT INTO table1 VALUES(2)
INSERT INTO table1 VALUES(3)
CREATE TABLE table2
(
Col1 int NOT NULL PRIMARY KEY
)
INSERT INTO table2 VALUES(4)
INSERT INTO table2 VALUES(5)
INSERT INTO table2 VALUES(6)
CREATE TABLE table3
(
Col1 int NOT NULL,
Col2 int NOT NULL,
PRIMARY KEY (Col1, Col2)
)
GO
CREATE PROC Spname
@.var1 int,
@.var2 int
AS
SET NOCOUNT ON
INSERT INTO Table3(Col1, Col2)
VALUES(@.var1, @.var2)
GO
DECLARE @.var1 int,
@.var2 int
DECLARE Cursor1 CURSOR
LOCAL FAST_FORWARD READ_ONLY FOR
SELECT * FROM table1
OPEN Cursor1
FETCH NEXT FROM Cursor1 INTO @.var1
WHILE @.@.FETCH_STATUS <> -1
BEGIN
DECLARE Cursor2 CURSOR LOCAL FOR SELECT * FROM table2
OPEN Cursor2
FETCH NEXT FROM Cursor2 INTO @.var2
WHILE @.@.FETCH_STATUS <> -1
BEGIN
EXEC Spname @.var1, @.var2
FETCH NEXT FROM Cursor2 INTO @.var2
END
CLOSE Cursor2
DEALLOCATE Cursor2
FETCH NEXT FROM Cursor1 INTO @.var1
END
CLOSE Cursor1
DEALLOCATE Cursor1
GO

> I have noticed that the two cursors can be combined into one using
> appropriate Joins.
Exactly. That's the first step.

> However, I was under the impression I will still have to use a Cursor for
> executing the SP
Yes you will. This is why I suggested including the equivalent proc code
inline *instead of* executing the proc for each row. The entire cursor
script and proc code in my functional example could be replaced with a
single insert statement:
INSERT INTO Table3(Col1, Col2)
SELECT t1.Col1, t2.Col1
FROM table1 t1
CROSS JOIN table2 t2
Of course, a lot depends on exactly what the called proc does. Also, your
actual code probably isn't a CROSS JOIN. A simple INNER JOIN example:
INSERT INTO Table3(Col1, Col2)
SELECT t1.Col1, t2.Col1
FROM table1 t1
JOIN table2 t2 ON
t1.Col1 = t2.SomeColumn
If you need more help, you need to provide details on what the proc actually
does. Actual code will help.
Hope this helps.
Dan Guzman
SQL Server MVP
"rgn" <gopinathr@.healthasyst.com> wrote in message
news:%23Sx92CLJFHA.1948@.TK2MSFTNGP14.phx.gbl...
> Dan,
> Could you please share with me what "in-line set-based " means.
> I was looking at a Stored Proc which has a nested cursors (One cursor
> inside another cursor)
> and the While loop of the second a SP is called with the two parameters
> (each of them coming
> from each of the Cursor code.
> Here is the PSeudo-Code :
> Declare Cursor Cursor1 Select * from table1
> Open Cursor1
> Fetch into @.var1
> While @.@.FETCH_STATUS <> 1
>
> Declare Cursor Cursor2 Select * from table2
> Open Cursor2
> Fetch into @.var2
> While @.@.FETCH_STATUS <> 1
> Exec Spname @.var1, @.var2
> ........
> .....
> DEALLOCATE Cursor1
> DEALLOCATE Cursor2
> I have noticed that the two cursors can be combined into one using
> appropriate Joins.
> However, I was under the impression I will still have to use a Cursor for
> executing the SP
> Could you please point me in the right direction if there is a way out.
> Thanks,
> Gopi
> << If your actual process does something like execute another proc for
> each row returned, the set-based
> solution would be to use in-line set-based processing instead>>
>
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:OXOup8KJFHA.4092@.tk2msftngp13.phx.gbl...
>|||Thanks for pointing that out. I've got the code base from an article in
http://www.extremeexperts.com/SQL/A...TSQLResult.aspx
Am I doing it wrong with instructions from that article? In any case, what
exactly is SET BASED APPROACH? I do realise and know that SQL Server is
meant to be relational and not sequential, but I need to understand more on
how to apply the alternative first before trying to overhaul the scripts
running in the production servers.
Can anyone give me an equivalent of the cursor procedure I posted using SET
BASE APPROACH so that I can better relate?
Thanks.
Regards,
Nestor
"rgn" <gopinathr@.healthasyst.com> wrote in message
news:e$w%23rZJJFHA.2604@.TK2MSFTNGP15.phx.gbl...
> Hello Nestor,
> Going through your code, I dont see any reason why the #temp based
> approach should be any better
> than the Cursor based approach (if it is supposed to be).
> The #temp based approach, Loops anyway with a while Loop which is simillar
> to Cursor based method.
> Normally, when we say we have to get rid of the cursor, we mean to replace
> the Cursor based approach
> with a SET BASED APPROACH and your #temp based approach still does record
> by record. And as Peter says
> for each iteration a select is executed against the table.
> Hope this helps.
> Gopi
>
> "Nestor" <n3570r@.yahoo.com> wrote in message
> news:ORU5F$IJFHA.1172@.TK2MSFTNGP12.phx.gbl...
>|||The article demonstrates some techniques for iteration but doesn't give
any recommendations as to when to use them. 99.99% of the time
row-by-row processing isn't necessary so those techniques shouldn't be
required. The Set Based approach means that you use SQL DML statements
(basically SELECT, UPDATE, INSERT, DELETE) that operate on a SET of
rows at a time rather than ONE row at a time. That should always be the
standard, preferred way to write code.

> Can anyone give me an equivalent of the cursor procedure I posted
using SET
> BASE APPROACH so that I can better relate?
But your cursor didn't do anything except assign a few variables. How
can we show you how to replace it if we don't know what it's supposed
to do? What we need to know is what you are actually trying to achieve
then we can suggest the right solution. It may even be that there isn't
a set-based solution to your problem - that's rarely the case but we
can't tell you until we understand the problem.
The following article explains the best way to post your problem here:
http://www.aspfaq.com/etiquette.asp?id=5006
David Portas
SQL Server MVP
--

Optimizing SQL Query

Hello,
I have to run a script which will delete records from 10 tables
based on a certain condition. Instead of having to run the condition 10
times I thought running it once would have better performance and did
something like this
CREATE PROCEDURE testScript AS
set nocount on
declare @.script_table table (row_id int identity(1,1), sid int)
declare @.max int, @.now int, @.result varchar(100)
insert into @.script_table
select sid from Alpha where lname like 'DOWN' and (fname like 'Tom' or
fname like 'Seinfeld')
select @.max = max(row_id) from @.script_table
select @.now=2, @.result=convert(varchar(6), sid) from @.script_table
where row_id=1
while @.max >=@.now
Begin
select @.result = @.result + ',' + convert(varchar(6), sid) from
@.script_table where row_id=@.now
set @.now=@.now + 1;
End
select * from Beta where convert(varchar(5), sid) in ( @.result )
if @.@.error <> 0
BEGIN
ROLLBACK Transaction
RAISERROR ( 'E', 2, 127)
END
...
...
...
but when I run this I dont get any values. Instead when I run this
query I get the output rows
select * from Beta where convert(varchar(5), sid) in (select sid from
Alpha where lname like 'DOWN' and (fname like 'Tom' or fname like
'Seinfeld'))
since @.result has the output from Alpha with a comma delimiter I was
under the impression that this should give me the result but instead I
dont get any rows. Is this because @.result is a varchar? Am I doing
something wrong here? Any help would be great..
Thanks
Khris> Is this because @.result is a varchar?
Yes. You can, however, parse a delimited string and have values returned in
a table, that you can use in your query.
My favourite solution is this one (by Dejan Sarka):
http://solidqualitylearning.com/blo.../10/22/200.aspx
ML
http://milambda.blogspot.com/|||Thanks ML, I will try that out|||Thanks ML I will try that out
Khris

Optimizing Query to Run

I'm trying to get a query to run which looks at completed orders that have not had another paid order in 180 days. The database I'm running it against is very large so I can't get it to complete. Where's what I've got:

select Date =cast(cl1.cl_rundate as datetime(102)),count(cl1.cl_recno) as 'Completed Initials', cl1.cl_status as Status from dbo.vw_Completedorders cl1 where cl1.lob_lineofbusiness = 'aaa'
and cl1.cl_rundate > '20060801' and not exists (
select cl2.cl_company from dbo.vw_Paidorders cl2 where
cl2.lob_lineofbusiness = 'aa'and cl2.cl_company = cl1.cl_order and cl2.cl_rundate > '20060101' and datediff(day,cl2.cl_rundate,cl1.cl_rundate) < 180)
group by cl1.cl_status, cl1.cl_rundate

Aragon:

I started to do a mock-up of this; however, I quickly decided that without additional information this was not a good idea. Since this database is "very large" and you "can't get it complete" I think it is a good idea to try to get more information. What I see as critical to this query is:

Avoiding table scans -- especially on the view that is included in the "not exists" condition|||Unfortunately in both Views, there are no indexes available. I do have a table I could use, Orders. In the Orders table I have 5 indexes available. The only one pertainable to my search is the recNo field. Unfortunately RunDate is not an index or it may actually work.

Both views references 2 tables, orders and lineofbussines. LineOfBussiness has 3 columns and no indexes.

The main table to reference, orders, has approximately 29k rows per day for each day since late 2004. CompletedOrders has about 11k per day for the same period and PaidOrders has about 10k per day.

Does this give you enough background?

|||

Try running the Database Tuning Advisor in SQL Server 2005 or Index Tuning Wizard in SQL Server 2000. You need to use a workload or script that contains this query and other common queries/DML statements. DTA/ITW can then recommend additional indexes that will help. It is hard to tell by just looking at your query where the problem might be. Both the objects are views so we don't know what tables are being referenced or how the views are defined. In addition to this, the indexes on the base tables referenced by the views also matter.

Btw, you are using invalid cast specification for datetime "cast(cl1.cl_rundate as datetime(102))". The code will fail if you upgrade your server to SQL Server 2005. The correct way is to do "cast(cl1.cl_rundate as datetime)". And it is also not clear why you need the cast assuming that the column is smalldatetime/datetime.

|||

Aragon:

This is probably enough for a mock-up, but I have no more time today. The big question is that if you can prove that you need the indexes on the tables or the views will you be allowed to add them?

Dave

|||Thanks for help. Unfortunately, what I really need for indexing is the date field. We tried to index on date before and the database had issues. The rebuild of indexes took too long to load. We have to wait for mainframe jobs to run prior to rebuilding so indexing ran into the work day and caused sych issues with other db's.

the date field is ANSI, so that's why I convert it to a datetime.
|||

Got it. Another piece of optimization that will help is to partition the data; sorry for neglecting to mention this in the previous pass. I will see if I can mock this up this morning.

Dave

|||I got it to run using recno to limit the results but it appears to be giving faulty data. I'm having counts of 500 on days when there were only 100 total.
|||

Aragon:

I finally got some time today to look at this again. I was able to load a bunch of mock-up data and look at the results. Below is the schema that I used for my mock-up along with some questions. I targeted my mock-up for about 10,000,000 rows which is 29K records per day times 365 days.

Dave


-- -
-- Mock-up schema?
--
--
-- QUESTIONS:
--
--
-- 1. I am not sure about the "CL_RUNDATE" field. I have modeled this field as
-- CHAR (10) storing date as '2006.05.21'. Exactly what kind of data element
-- is the "CL_RUNDATE" field?
-- 2. I am not sure how the views join the Orders table and the LineOfBusiness table.
-- How are these two table joined.
-- 3. Your query filters based on the field "LOB_LINEOFBUSINESS"; does that field of
-- the view come from the ORDERS table or the LINEOFBUSINESS table?
--
-- 4. In your overview you state that you "look at completed orders that have not had
-- another paid order in 180 days." However, in your query you alias view
-- VW_paidOrders with cl2 and look for orders > '20060101' while aliasing view
-- VW_completedOrders with cl1 and looking for orders > '20060801'. This seems
-- backwards.
-- 5. When you take the date difference DATEDIFF (day, cl2.cl_runDate, cl1.cl_runDate)
-- and this also seems backwards.
--
-- 6. When you join the two views you join the PaidOrders "cl_company" column to the
-- completedOrders "cl_order column. Are these two columns simply two different
-- instances of a representation of a company?
--
-- 7. Another question that I saw has to do with this issue:
--
-- Company A Order 1 Paid 1/15/2006
-- Company A Order 2 Completed 2/2/2006
-- Company A Order 3 Completed 9/9/2006
--
-- Assuming that 4/2 is 180 days ago we have Order 2 that precedes 4/2 and
-- has no subsequent PAID order. Should this cause to the 2/2 record to be
-- included in the report or should that fact that there was a 9/9 COMPLETED
-- order after the 2/2 order suppress the 2/2 order from being included?
--
-- --
-- Comments:
-- --
--
-- At this point I have put together a "working mockup"; however, I still need these
-- issues above discussed. I tried out a mock-up of 10,000,000 rows. This requires
-- 10 GB of disk space to store and about 10 GB of transient log space to run the
-- process to generate the data.
--
-- The index that I created is intended to access target data without the use of any
-- bookmark lookups. When I benchmarked the first set of tests this index reduced
-- logical read from 200000 to 296 -- a very large I/O reduction; this is a good
-- starting point but still doesn't yet prove anything. I am for the moment somewhat
-- optimistic.
--
-- None of this addresses partitioning of data.
--
-- -
--
-- Tables:
--
-- MockOrder
-- LineOfBusiness
--
-- Views:
--
-- vw_CompletedOrders
-- vw_PaidOrders
-- -
create table dbo.mockOrder
( cl_recno integer identity
constraint pk_mockOrder primary key,

lob_code char (4) not null,
cl_status char (2) not null,
cl_rundate char(10) not null,
cl_company char (30) not null,
filler char (730) not null
)
go

create index lobCompany_ndx
on mockOrder (cl_company, lob_code, cl_status, cl_rundate)
go


create table dbo.LineOfBusiness
( lob_code char (4) not null
constraint pk_lineOfBusiness primary key,

lob_lineOfBusiness char (4) not null
)
go


create view dbo.vw_Completedorders
as
select cl_recno,
a.lob_code,
b.lob_lineOfBusiness,
a.cl_status,
cl_rundate,
cl_company as cl_order
from dbo.mockOrder a
inner join dbo.lineOfBusiness b
on a.cl_status = 'C'
and a.lob_code = b.lob_code
go


create view dbo.vw_paidOrders
as
select cl_recno,
a.lob_code,
b.lob_lineOfBusiness,
a.cl_status,
cl_rundate,
cl_company
from dbo.mockOrder a
inner join dbo.lineOfBusiness b
on a.cl_status = 'P'
and a.lob_code = b.lob_code

|||

Sorry, I missed an index:

create index lobStatus_ndx
on mockOrder (lob_code, cl_status, cl_rundate, cl_company)

|||Wow, that's a lot of work you put into it. Thanks. Answers below.

1. I am not sure about the "CL_RUNDATE" field. I have modeled this field as
-- CHAR (10) storing date as '2006.05.21'. Exactly what kind of data element is the "CL_RUNDATE" field?

A: Ansi date format. Ex, 20060930

-- 2. I am not sure how the views join the Orders table and the LineOfBusiness table.
-- How are these two table joined.

A: There is a sort code field in both tables. LineOfBusiness is a grouping of sort codes.

-- 3. Your query filters based on the field "LOB_LINEOFBUSINESS"; does that field of
-- the view come from the ORDERS table or the LINEOFBUSINESS table?
--

A: It originates in the LOB table, but is added to the records in both Views.

-- 4. In your overview you state that you "look at completed orders that have not had
-- another paid order in 180 days." However, in your query you alias view
-- VW_paidOrders with cl2 and look for orders > '20060101' while aliasing view
-- VW_completedOrders with cl1 and looking for orders > '20060801'. This seems
-- backwards.

A: I think I mispoke. I'm looking for NEW orders completed when there hasn't been an OLD order the last 180 days. I did forget, in the above example to include a cl2.cl_rundate >= cl1.cl_rundate clause.

-- 5. When you take the date difference DATEDIFF (day, cl2.cl_runDate, cl1.cl_runDate)
-- and this also seems backwards.
--

A: See above. But it shouldn't matter, should it?

-- 6. When you join the two views you join the PaidOrders "cl_company" column to the
-- completedOrders "cl_order column. Are these two columns simply two different
-- instances of a representation of a company?
--

A: I mistyped. Both are company

-- 7. Another question that I saw has to do with this issue:
--
-- Company A Order 1 Paid 1/15/2006
-- Company A Order 2 Completed 2/2/2006
-- Company A Order 3 Completed 9/9/2006
--
-- Assuming that 4/2 is 180 days ago we have Order 2 that precedes 4/2 and
-- has no subsequent PAID order. Should this cause to the 2/2 record to be
-- included in the report or should that fact that there was a 9/9 COMPLETED
-- order after the 2/2 order suppress the 2/2 order from being included?

A: In the above example I would hope to select orders 1 and 3.
|||

Aragon:

Thanks for the answers. Here are the test cases that I set up. Please look these over and let me which of these test cases is false:

set nocount on
truncate table mockOrder

select convert (varchar (10), dateadd (day, -180, getdate()), 112)
-- Results: 20060406

-- --
-- CASE 1:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 01' 0011 'AAA' 'C' 20060401
--
-- This row is returned because:
-- (1) LOB = 'AAA'
-- (2) Status = 'C'
-- (3) Run Date (20060401) < 20060406
-- (4) There is no subsequent PAID record
-- --
insert into mockOrder values ( 11, 'AAA', 'C', '20060401', 'Company 01', ' ')

-- --
-- CASE 2:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 02' 0021 'AAA' 'C' 20060501
--
-- No rows returned because rundate > 20060406
-- --
insert into mockOrder values ( 21, 'AAA', 'C', '20060501', 'Company 02', ' ')


-- --
-- CASE 3:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 03' 0031 'BBB' 'C' 20060401
--
-- No rows returned because LOB is 'BBB'
-- --
insert into mockOrder values ( 31, 'BBB', 'C', '20060401', 'Company 03', ' ')


-- --
-- CASE 4:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 04' 0041 'AAA' 'C' 20060402
-- 'COMPANY 04' 0042 'AA' 'P' 20060430
--
-- No rows returned; although there is a qualifying 'C' record, there is
-- a subsequent 'P' record that disqualifies the qualifying 'C' record
-- --
insert into mockOrder values ( 41, 'AAA', 'C', '20060402', 'Company 04', ' ')
insert into mockOrder values ( 42, 'AA', 'P', '20060430', 'Company 04', ' ')


-- --
-- CASE 5:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 05' 0051 'AAA' 'C' 20060102
-- 'COMPANY 05' 0052 'AA' 'P' 20060131
-- 'COMPANY 05' 0053 'AAA' 'C' 20060401
--
-- The row for Rec No 53 is returned because:
-- (1) LOB = 'AAA'
-- (2) Status = 'C'
-- (3) Run Date (20060401) < 20060406
-- (4) There is no subsequent PAID record
-- --
insert into mockOrder values ( 51, 'AAA', 'C', '20060102', 'Company 05', ' ')
insert into mockOrder values ( 52, 'AA', 'P', '20060131', 'Company 05', ' ')
insert into mockOrder values ( 53, 'AAA', 'C', '20060401', 'Company 05', ' ')


-- --
-- CASE 6:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 06' 0061 'AAA' 'C' 20060102
-- 'COMPANY 06' 0062 'AA' 'P' 20060131
-- 'COMPANY 06' 0063 'AAA' 'C' 20060501
--
-- No Rows returned; (1) Rec No 61 would otherwise qualify, but Rec NO 62
-- is a subsequent 'P' record and therefore Rec No 61 does not qualify.
-- Rec No 63 does not qualify because Run Date (20060501) > 20060406
-- --
insert into mockOrder values ( 61, 'AAA', 'C', '20060102', 'Company 06', ' ')
insert into mockOrder values ( 62, 'AA', 'P', '20060131', 'Company 06', ' ')
insert into mockOrder values ( 63, 'AAA', 'C', '20060501', 'Company 06', ' ')


-- --
-- CASE 7:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 07' 0071 'AAA' 'C' 20060102
-- 'COMPANY 07' 0072 'AAA' 'C' 20060531
--
-- Rec No 71 is returned because it has no matching 'P' record
-- --
insert into mockOrder values ( 71, 'AAA', 'C', '20060102', 'Company 07', ' ')
insert into mockOrder values ( 73, 'AAA', 'C', '20060531', 'Company 07', ' ')


-- --
-- CASE 8:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 08' 0081 'AAA' 'C' 20060101
-- 'COMPANY 08' 0062 'AA' 'P' 20060901
--
-- Row included; the qualifying 'C' record does not have a matching 'P'
-- record for 180 days.
-- --
insert into mockOrder values ( 81, 'AAA', 'C', '20060101', 'Company 08', ' ')
insert into mockOrder values ( 82, 'AA', 'P', '20060901', 'Company 08', ' ')


-- --
-- CASE 9:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 09' 0091 'AAA' 'C' 20060101
-- 'COMPANY 09' 0091 'AAA' 'C' 20060515
-- 'COMPANY 08' 0093 'AA' 'P' 20060901
--
-- Row included; Rec No 91 qualifies and does not get a 'P' record until
-- more than 180 days later
-- --
insert into mockOrder values ( 91, 'AAA', 'C', '20060102', 'Company 09', ' ')
insert into mockOrder values ( 92, 'AAA', 'C', '20060515', 'Company 09', ' ')
insert into mockOrder values ( 93, 'AA', 'P', '20060901', 'Company 09', ' ')


-- --
-- CASE 10:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 10' 0101 'AAA' 'C' 20060101
-- 'COMPANY 10' 0102 'AAA' 'C' 20060105
-- 'COMPANY 10' 0103 'AAA' 'C' 20060515
-- 'COMPANY 10' 0104 'AA' 'P' 20060901
--
-- 2 Row included; Rec No 91 qualifies and does not get a 'P' record until
-- more than 180 days later
-- --
insert into mockOrder values (101, 'AAA', 'C', '20060102', 'Company 10', ' ')
insert into mockOrder values (102, 'AAA', 'C', '20060105', 'Company 10', ' ')
insert into mockOrder values (103, 'AAA', 'C', '20060515', 'Company 10', ' ')
insert into mockOrder values (104, 'AA', 'P', '20060901', 'Company 10', ' ')


-- --

select cl_company,
cl_recno,
lob_sort_code,
cl_status,
cl_rundate
from mockOrder
order by cl_recno


--

-- cl_company cl_recno lob_sort_code cl_status cl_rundate
-- -- - -
-- Company 01 11 AAA C 20060401
-- Company 02 21 AAA C 20060501
-- Company 03 31 BBB C 20060401
--
-- Company 04 41 AAA C 20060402
-- Company 04 42 AA P 20060430
--
-- Company 05 51 AAA C 20060102
-- Company 05 52 AA P 20060131
-- Company 05 53 AAA C 20060401
--
-- Company 06 61 AAA C 20060102
-- Company 06 62 AA P 20060131
-- Company 06 63 AAA C 20060501
--
-- Company 07 71 AAA C 20060102
-- Company 07 73 AAA C 20060531
--
-- Company 08 81 AAA C 20060101
-- Company 08 82 AA P 20060901
--
-- Company 09 91 AAA C 20060102
-- Company 09 92 AAA C 20060515
-- Company 09 93 AA P 20060901
--
-- Company 10 101 AAA C 20060102
-- Company 10 102 AAA C 20060105
-- Company 10 103 AAA C 20060515
-- Company 10 104 AA P 20060901
--
--

After inserting the test data described above, I ran the following with the indicated results:


declare @.compDate char (8)
declare @.startDate char (8)
declare @.paidDate char (8)
set @.compDate = convert (varchar (8), dateadd (day, -180, getdate()), 112)
set @.startDate = '20060101'
set @.paidDate = convert (varchar (8), dateadd (day, 180, @.startdate), 112)

-- --
-- Display dates that are important to the selection process
-- --
select @.compDate as [@.compDate],
@.startDate as [@.startDate],
@.paidDate as [paidDate]

-- @.compDate @.startDate paidDate
-- - --
-- 20060406 20060101 20060630


-- --
-- Display the records that make it through the filtering process
-- --
select cl_company,
cl_recno,
lob_LineOfBusiness,
cl_rundate
from dbo.vw_completedOrders cmp
where lob_lineOfBusiness = 'AAA'
and cl_rundate >= @.startDate
and cl_rundate <= '20060406'
and not exists
( select pd.cl_company
from dbo.vw_paidOrders pd
where pd.cl_company = cmp.cl_company
and pd.cl_rundate > cmp.cl_rundate
and datediff (day, cmp.cl_rundate, pd.cl_rundate) <= 180
and pd.lob_lineOfBusiness = 'AA'
)

-- cl_company cl_recno lob_LineOfBusiness cl_rundate
-- -- -
-- Company 01 11 aaa 20060401
-- Company 05 53 aaa 20060401
-- Company 07 71 aaa 20060102
-- Company 08 81 aaa 20060101
-- Company 09 91 aaa 20060102
-- Company 10 101 aaa 20060102
-- Company 10 102 aaa 20060105


-- --
-- Display the summary
-- --
select Date = cast (cmp.cl_rundate as datetime (102)),
count(cmp.cl_recno) as 'Completed Initials',
cmp.cl_status as Status
from dbo.vw_completedOrders cmp
where lob_lineOfBusiness = 'AAA'
and cl_rundate >= @.startDate
and cl_rundate <= '20060406'
and not exists
( select pd.cl_company
from dbo.vw_paidOrders pd
where pd.cl_company = cmp.cl_company
and pd.cl_rundate > cmp.cl_rundate
and datediff (day, cmp.cl_rundate, pd.cl_rundate) <= 180
and pd.lob_lineOfBusiness = 'AA'
)
group by cmp.cl_status, cmp.cl_rundate

-- Date Completed Initials Status
--
-- 2006-01-01 00:00:00.000 1 C
-- 2006-01-02 00:00:00.000 3 C
-- 2006-01-05 00:00:00.000 1 C
-- 2006-04-01 00:00:00.000 2 C

Let me know if this looks close.

Dave

Optimizing Query to Run

I'm trying to get a query to run which looks at completed orders that have not had another paid order in 180 days. The database I'm running it against is very large so I can't get it to complete. Where's what I've got:

select Date =cast(cl1.cl_rundate as datetime(102)),count(cl1.cl_recno) as 'Completed Initials', cl1.cl_status as Status from dbo.vw_Completedorders cl1 where cl1.lob_lineofbusiness = 'aaa'
and cl1.cl_rundate > '20060801' and not exists (
select cl2.cl_company from dbo.vw_Paidorders cl2 where
cl2.lob_lineofbusiness = 'aa'and cl2.cl_company = cl1.cl_order and cl2.cl_rundate > '20060101' and datediff(day,cl2.cl_rundate,cl1.cl_rundate) < 180)
group by cl1.cl_status, cl1.cl_rundate

Aragon:

I started to do a mock-up of this; however, I quickly decided that without additional information this was not a good idea. Since this database is "very large" and you "can't get it complete" I think it is a good idea to try to get more information. What I see as critical to this query is:

Avoiding table scans -- especially on the view that is included in the "not exists" condition|||Unfortunately in both Views, there are no indexes available. I do have a table I could use, Orders. In the Orders table I have 5 indexes available. The only one pertainable to my search is the recNo field. Unfortunately RunDate is not an index or it may actually work.

Both views references 2 tables, orders and lineofbussines. LineOfBussiness has 3 columns and no indexes.

The main table to reference, orders, has approximately 29k rows per day for each day since late 2004. CompletedOrders has about 11k per day for the same period and PaidOrders has about 10k per day.

Does this give you enough background?|||

Try running the Database Tuning Advisor in SQL Server 2005 or Index Tuning Wizard in SQL Server 2000. You need to use a workload or script that contains this query and other common queries/DML statements. DTA/ITW can then recommend additional indexes that will help. It is hard to tell by just looking at your query where the problem might be. Both the objects are views so we don't know what tables are being referenced or how the views are defined. In addition to this, the indexes on the base tables referenced by the views also matter.

Btw, you are using invalid cast specification for datetime "cast(cl1.cl_rundate as datetime(102))". The code will fail if you upgrade your server to SQL Server 2005. The correct way is to do "cast(cl1.cl_rundate as datetime)". And it is also not clear why you need the cast assuming that the column is smalldatetime/datetime.

|||

Aragon:

This is probably enough for a mock-up, but I have no more time today. The big question is that if you can prove that you need the indexes on the tables or the views will you be allowed to add them?

Dave

|||Thanks for help. Unfortunately, what I really need for indexing is the date field. We tried to index on date before and the database had issues. The rebuild of indexes took too long to load. We have to wait for mainframe jobs to run prior to rebuilding so indexing ran into the work day and caused sych issues with other db's.

the date field is ANSI, so that's why I convert it to a datetime.|||

Got it. Another piece of optimization that will help is to partition the data; sorry for neglecting to mention this in the previous pass. I will see if I can mock this up this morning.

Dave

|||I got it to run using recno to limit the results but it appears to be giving faulty data. I'm having counts of 500 on days when there were only 100 total.|||

Aragon:

I finally got some time today to look at this again. I was able to load a bunch of mock-up data and look at the results. Below is the schema that I used for my mock-up along with some questions. I targeted my mock-up for about 10,000,000 rows which is 29K records per day times 365 days.

Dave


-- -
-- Mock-up schema?
--
--
-- QUESTIONS:
--
--
-- 1. I am not sure about the "CL_RUNDATE" field. I have modeled this field as
-- CHAR (10) storing date as '2006.05.21'. Exactly what kind of data element
-- is the "CL_RUNDATE" field?
-- 2. I am not sure how the views join the Orders table and the LineOfBusiness table.
-- How are these two table joined.
-- 3. Your query filters based on the field "LOB_LINEOFBUSINESS"; does that field of
-- the view come from the ORDERS table or the LINEOFBUSINESS table?
--
-- 4. In your overview you state that you "look at completed orders that have not had
-- another paid order in 180 days." However, in your query you alias view
-- VW_paidOrders with cl2 and look for orders > '20060101' while aliasing view
-- VW_completedOrders with cl1 and looking for orders > '20060801'. This seems
-- backwards.
-- 5. When you take the date difference DATEDIFF (day, cl2.cl_runDate, cl1.cl_runDate)
-- and this also seems backwards.
--
-- 6. When you join the two views you join the PaidOrders "cl_company" column to the
-- completedOrders "cl_order column. Are these two columns simply two different
-- instances of a representation of a company?
--
-- 7. Another question that I saw has to do with this issue:
--
-- Company A Order 1 Paid 1/15/2006
-- Company A Order 2 Completed 2/2/2006
-- Company A Order 3 Completed 9/9/2006
--
-- Assuming that 4/2 is 180 days ago we have Order 2 that precedes 4/2 and
-- has no subsequent PAID order. Should this cause to the 2/2 record to be
-- included in the report or should that fact that there was a 9/9 COMPLETED
-- order after the 2/2 order suppress the 2/2 order from being included?
--
-- --
-- Comments:
-- --
--
-- At this point I have put together a "working mockup"; however, I still need these
-- issues above discussed. I tried out a mock-up of 10,000,000 rows. This requires
-- 10 GB of disk space to store and about 10 GB of transient log space to run the
-- process to generate the data.
--
-- The index that I created is intended to access target data without the use of any
-- bookmark lookups. When I benchmarked the first set of tests this index reduced
-- logical read from 200000 to 296 -- a very large I/O reduction; this is a good
-- starting point but still doesn't yet prove anything. I am for the moment somewhat
-- optimistic.
--
-- None of this addresses partitioning of data.
--
-- -
--
-- Tables:
--
-- MockOrder
-- LineOfBusiness
--
-- Views:
--
-- vw_CompletedOrders
-- vw_PaidOrders
-- -
create table dbo.mockOrder
( cl_recno integer identity
constraint pk_mockOrder primary key,

lob_code char (4) not null,
cl_status char (2) not null,
cl_rundate char(10) not null,
cl_company char (30) not null,
filler char (730) not null
)
go

create index lobCompany_ndx
on mockOrder (cl_company, lob_code, cl_status, cl_rundate)
go


create table dbo.LineOfBusiness
( lob_code char (4) not null
constraint pk_lineOfBusiness primary key,

lob_lineOfBusiness char (4) not null
)
go


create view dbo.vw_Completedorders
as
select cl_recno,
a.lob_code,
b.lob_lineOfBusiness,
a.cl_status,
cl_rundate,
cl_company as cl_order
from dbo.mockOrder a
inner join dbo.lineOfBusiness b
on a.cl_status = 'C'
and a.lob_code = b.lob_code
go


create view dbo.vw_paidOrders
as
select cl_recno,
a.lob_code,
b.lob_lineOfBusiness,
a.cl_status,
cl_rundate,
cl_company
from dbo.mockOrder a
inner join dbo.lineOfBusiness b
on a.cl_status = 'P'
and a.lob_code = b.lob_code

|||

Sorry, I missed an index:

create index lobStatus_ndx
on mockOrder (lob_code, cl_status, cl_rundate, cl_company)

|||Wow, that's a lot of work you put into it. Thanks. Answers below.

1. I am not sure about the "CL_RUNDATE" field. I have modeled this field as
-- CHAR (10) storing date as '2006.05.21'. Exactly what kind of data element is the "CL_RUNDATE" field?

A: Ansi date format. Ex, 20060930

-- 2. I am not sure how the views join the Orders table and the LineOfBusiness table.
-- How are these two table joined.

A: There is a sort code field in both tables. LineOfBusiness is a grouping of sort codes.

-- 3. Your query filters based on the field "LOB_LINEOFBUSINESS"; does that field of
-- the view come from the ORDERS table or the LINEOFBUSINESS table?
--

A: It originates in the LOB table, but is added to the records in both Views.

-- 4. In your overview you state that you "look at completed orders that have not had
-- another paid order in 180 days." However, in your query you alias view
-- VW_paidOrders with cl2 and look for orders > '20060101' while aliasing view
-- VW_completedOrders with cl1 and looking for orders > '20060801'. This seems
-- backwards.

A: I think I mispoke. I'm looking for NEW orders completed when there hasn't been an OLD order the last 180 days. I did forget, in the above example to include a cl2.cl_rundate >= cl1.cl_rundate clause.

-- 5. When you take the date difference DATEDIFF (day, cl2.cl_runDate, cl1.cl_runDate)
-- and this also seems backwards.
--

A: See above. But it shouldn't matter, should it?

-- 6. When you join the two views you join the PaidOrders "cl_company" column to the
-- completedOrders "cl_order column. Are these two columns simply two different
-- instances of a representation of a company?
--

A: I mistyped. Both are company

-- 7. Another question that I saw has to do with this issue:
--
-- Company A Order 1 Paid 1/15/2006
-- Company A Order 2 Completed 2/2/2006
-- Company A Order 3 Completed 9/9/2006
--
-- Assuming that 4/2 is 180 days ago we have Order 2 that precedes 4/2 and
-- has no subsequent PAID order. Should this cause to the 2/2 record to be
-- included in the report or should that fact that there was a 9/9 COMPLETED
-- order after the 2/2 order suppress the 2/2 order from being included?

A: In the above example I would hope to select orders 1 and 3.
|||

Aragon:

Thanks for the answers. Here are the test cases that I set up. Please look these over and let me which of these test cases is false:

set nocount on
truncate table mockOrder

select convert (varchar (10), dateadd (day, -180, getdate()), 112)
-- Results: 20060406

-- --
-- CASE 1:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 01' 0011 'AAA' 'C' 20060401
--
-- This row is returned because:
-- (1) LOB = 'AAA'
-- (2) Status = 'C'
-- (3) Run Date (20060401) < 20060406
-- (4) There is no subsequent PAID record
-- --
insert into mockOrder values ( 11, 'AAA', 'C', '20060401', 'Company 01', ' ')

-- --
-- CASE 2:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 02' 0021 'AAA' 'C' 20060501
--
-- No rows returned because rundate > 20060406
-- --
insert into mockOrder values ( 21, 'AAA', 'C', '20060501', 'Company 02', ' ')


-- --
-- CASE 3:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 03' 0031 'BBB' 'C' 20060401
--
-- No rows returned because LOB is 'BBB'
-- --
insert into mockOrder values ( 31, 'BBB', 'C', '20060401', 'Company 03', ' ')


-- --
-- CASE 4:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 04' 0041 'AAA' 'C' 20060402
-- 'COMPANY 04' 0042 'AA' 'P' 20060430
--
-- No rows returned; although there is a qualifying 'C' record, there is
-- a subsequent 'P' record that disqualifies the qualifying 'C' record
-- --
insert into mockOrder values ( 41, 'AAA', 'C', '20060402', 'Company 04', ' ')
insert into mockOrder values ( 42, 'AA', 'P', '20060430', 'Company 04', ' ')


-- --
-- CASE 5:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 05' 0051 'AAA' 'C' 20060102
-- 'COMPANY 05' 0052 'AA' 'P' 20060131
-- 'COMPANY 05' 0053 'AAA' 'C' 20060401
--
-- The row for Rec No 53 is returned because:
-- (1) LOB = 'AAA'
-- (2) Status = 'C'
-- (3) Run Date (20060401) < 20060406
-- (4) There is no subsequent PAID record
-- --
insert into mockOrder values ( 51, 'AAA', 'C', '20060102', 'Company 05', ' ')
insert into mockOrder values ( 52, 'AA', 'P', '20060131', 'Company 05', ' ')
insert into mockOrder values ( 53, 'AAA', 'C', '20060401', 'Company 05', ' ')


-- --
-- CASE 6:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 06' 0061 'AAA' 'C' 20060102
-- 'COMPANY 06' 0062 'AA' 'P' 20060131
-- 'COMPANY 06' 0063 'AAA' 'C' 20060501
--
-- No Rows returned; (1) Rec No 61 would otherwise qualify, but Rec NO 62
-- is a subsequent 'P' record and therefore Rec No 61 does not qualify.
-- Rec No 63 does not qualify because Run Date (20060501) > 20060406
-- --
insert into mockOrder values ( 61, 'AAA', 'C', '20060102', 'Company 06', ' ')
insert into mockOrder values ( 62, 'AA', 'P', '20060131', 'Company 06', ' ')
insert into mockOrder values ( 63, 'AAA', 'C', '20060501', 'Company 06', ' ')


-- --
-- CASE 7:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 07' 0071 'AAA' 'C' 20060102
-- 'COMPANY 07' 0072 'AAA' 'C' 20060531
--
-- Rec No 71 is returned because it has no matching 'P' record
-- --
insert into mockOrder values ( 71, 'AAA', 'C', '20060102', 'Company 07', ' ')
insert into mockOrder values ( 73, 'AAA', 'C', '20060531', 'Company 07', ' ')


-- --
-- CASE 8:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 08' 0081 'AAA' 'C' 20060101
-- 'COMPANY 08' 0062 'AA' 'P' 20060901
--
-- Row included; the qualifying 'C' record does not have a matching 'P'
-- record for 180 days.
-- --
insert into mockOrder values ( 81, 'AAA', 'C', '20060101', 'Company 08', ' ')
insert into mockOrder values ( 82, 'AA', 'P', '20060901', 'Company 08', ' ')


-- --
-- CASE 9:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 09' 0091 'AAA' 'C' 20060101
-- 'COMPANY 09' 0091 'AAA' 'C' 20060515
-- 'COMPANY 08' 0093 'AA' 'P' 20060901
--
-- Row included; Rec No 91 qualifies and does not get a 'P' record until
-- more than 180 days later
-- --
insert into mockOrder values ( 91, 'AAA', 'C', '20060102', 'Company 09', ' ')
insert into mockOrder values ( 92, 'AAA', 'C', '20060515', 'Company 09', ' ')
insert into mockOrder values ( 93, 'AA', 'P', '20060901', 'Company 09', ' ')


-- --
-- CASE 10:
--
-- -- -- - --
-- Company Rec No LOB Stat Run Date
-- -- -- - --
--
-- 'COMPANY 10' 0101 'AAA' 'C' 20060101
-- 'COMPANY 10' 0102 'AAA' 'C' 20060105
-- 'COMPANY 10' 0103 'AAA' 'C' 20060515
-- 'COMPANY 10' 0104 'AA' 'P' 20060901
--
-- 2 Row included; Rec No 91 qualifies and does not get a 'P' record until
-- more than 180 days later
-- --
insert into mockOrder values (101, 'AAA', 'C', '20060102', 'Company 10', ' ')
insert into mockOrder values (102, 'AAA', 'C', '20060105', 'Company 10', ' ')
insert into mockOrder values (103, 'AAA', 'C', '20060515', 'Company 10', ' ')
insert into mockOrder values (104, 'AA', 'P', '20060901', 'Company 10', ' ')


-- --

select cl_company,
cl_recno,
lob_sort_code,
cl_status,
cl_rundate
from mockOrder
order by cl_recno


--

-- cl_company cl_recno lob_sort_code cl_status cl_rundate
-- -- - -
-- Company 01 11 AAA C 20060401
-- Company 02 21 AAA C 20060501
-- Company 03 31 BBB C 20060401
--
-- Company 04 41 AAA C 20060402
-- Company 04 42 AA P 20060430
--
-- Company 05 51 AAA C 20060102
-- Company 05 52 AA P 20060131
-- Company 05 53 AAA C 20060401
--
-- Company 06 61 AAA C 20060102
-- Company 06 62 AA P 20060131
-- Company 06 63 AAA C 20060501
--
-- Company 07 71 AAA C 20060102
-- Company 07 73 AAA C 20060531
--
-- Company 08 81 AAA C 20060101
-- Company 08 82 AA P 20060901
--
-- Company 09 91 AAA C 20060102
-- Company 09 92 AAA C 20060515
-- Company 09 93 AA P 20060901
--
-- Company 10 101 AAA C 20060102
-- Company 10 102 AAA C 20060105
-- Company 10 103 AAA C 20060515
-- Company 10 104 AA P 20060901
--
--

After inserting the test data described above, I ran the following with the indicated results:


declare @.compDate char (8)
declare @.startDate char (8)
declare @.paidDate char (8)
set @.compDate = convert (varchar (8), dateadd (day, -180, getdate()), 112)
set @.startDate = '20060101'
set @.paidDate = convert (varchar (8), dateadd (day, 180, @.startdate), 112)

-- --
-- Display dates that are important to the selection process
-- --
select @.compDate as [@.compDate],
@.startDate as [@.startDate],
@.paidDate as [paidDate]

-- @.compDate @.startDate paidDate
-- - --
-- 20060406 20060101 20060630


-- --
-- Display the records that make it through the filtering process
-- --
select cl_company,
cl_recno,
lob_LineOfBusiness,
cl_rundate
from dbo.vw_completedOrders cmp
where lob_lineOfBusiness = 'AAA'
and cl_rundate >= @.startDate
and cl_rundate <= '20060406'
and not exists
( select pd.cl_company
from dbo.vw_paidOrders pd
where pd.cl_company = cmp.cl_company
and pd.cl_rundate > cmp.cl_rundate
and datediff (day, cmp.cl_rundate, pd.cl_rundate) <= 180
and pd.lob_lineOfBusiness = 'AA'
)

-- cl_company cl_recno lob_LineOfBusiness cl_rundate
-- -- -
-- Company 01 11 aaa 20060401
-- Company 05 53 aaa 20060401
-- Company 07 71 aaa 20060102
-- Company 08 81 aaa 20060101
-- Company 09 91 aaa 20060102
-- Company 10 101 aaa 20060102
-- Company 10 102 aaa 20060105


-- --
-- Display the summary
-- --
select Date = cast (cmp.cl_rundate as datetime (102)),
count(cmp.cl_recno) as 'Completed Initials',
cmp.cl_status as Status
from dbo.vw_completedOrders cmp
where lob_lineOfBusiness = 'AAA'
and cl_rundate >= @.startDate
and cl_rundate <= '20060406'
and not exists
( select pd.cl_company
from dbo.vw_paidOrders pd
where pd.cl_company = cmp.cl_company
and pd.cl_rundate > cmp.cl_rundate
and datediff (day, cmp.cl_rundate, pd.cl_rundate) <= 180
and pd.lob_lineOfBusiness = 'AA'
)
group by cmp.cl_status, cmp.cl_rundate

-- Date Completed Initials Status
--
-- 2006-01-01 00:00:00.000 1 C
-- 2006-01-02 00:00:00.000 3 C
-- 2006-01-05 00:00:00.000 1 C
-- 2006-04-01 00:00:00.000 2 C

Let me know if this looks close.

Dave

Optimizing query execution...

We have SQL Server 2000 and int is an Oracle linked server. I'm trying to run the following query...

SELECT DISTINCT a.auf_nr AS OrderNo,
e.ku_name AS Customer,
d.bestell_dat AS OrdDate,
d.liefer_dat AS DelvDate,
CAST(SUM(b.anz) AS FLOAT) Qty,
CAST(SUM((CAST(c.breite AS FLOAT) / 1000 * CAST(c.hoehe AS FLOAT) / 1000) * b.anz) AS FLOAT) SQM,
CAST(SUM(a.liefer_offen) - (SUM(a.anz) - SUM(b.anz)) AS FLOAT) AvailDelv,
CAST(SUM(a.liefer_anz) AS FLOAT) Delvd,
CAST(SUM(c.sum_brutto*a.anz) AS FLOAT) Value

FROM liorder..LIORDER.AUF_STAT a,
liorder..LIORDER.AUF_LIP_STATUS b,
liorder..LIORDER.AUF_POS c,
liorder..LIORDER.AUF_KOPF d,
liorder..LIORDER.KUST_ADR e

WHERE a.auf_nr = b.auf_nr and
b.auf_nr = c.auf_nr and
c.auf_nr = d.auf_nr and
d.kunr = e.ku_nr and
a.auf_pos = b.auf_pos and
b.auf_pos = c.auf_pos and
b.lip_status = 7 and
c.ver_art !='V' and
a.history = 0 and
a.rg_stat != 2 and
e.ku_name IS not null and
e.ku_vk_ek = 0 and
d.bestell_dat BETWEEN '01/01/2005' and '12/17/2005'

GROUP BY a.auf_nr,
d.liefer_dat,
b.lip_status,
d.bestell_dat,
e.ku_name,
d.kopf_tour,
d.kopf_firma

HAVING CAST(SUM(a.liefer_offen)-(SUM(a.anz)-SUM(b.anz)) AS FLOAT) > 0

..and it takes around 2 minutes to show the results even if the date range is of the same date. I even tried to use an indexed column but I still get the same slow execution time. I even tried to create a UDF so that the WHERE clause would be resolved remotely on the Oracle DB but still the same. Is there anyway I can do it in much more efficient and faster way?I'd use OPENQUERY (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_oa-oz_5xix.asp).

-PatP

Wednesday, March 28, 2012

Optimizing Job fails on tables with computed fields

Hi,
we have some tables in SQL Server 2000 with computed fields. When we try to
run an optimization job, it return the error
"Rebuilding indexes for table 'FI_Period'
[Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 1934: [Microsoft][ODBC SQL
Server Driver][SQL Server]DBCC failed because the following SET options have
incorrect settings: 'QUOTED_IDENTIFIER'"
After removing the field the job runs without errors. I cannot see any link
between a computed field and the Quoted Identifier setting.
Any input is appreciated.
TIA, Norbert MeissSee MSKB 301292 <http://support.microsoft.com/kb/q301292/>.
Hope this helps.
Dan Guzman
SQL Server MVP
"Norbert Meiss" <NorbertMeiss@.discussions.microsoft.com> wrote in message
news:436B7DE5-21CC-413B-A0D6-854FD3721894@.microsoft.com...
> Hi,
> we have some tables in SQL Server 2000 with computed fields. When we try
> to
> run an optimization job, it return the error
> "Rebuilding indexes for table 'FI_Period'
> [Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 1934: [Microsoft][ODBC
> SQL
> Server Driver][SQL Server]DBCC failed because the following SET options
> have
> incorrect settings: 'QUOTED_IDENTIFIER'"
> After removing the field the job runs without errors. I cannot see any
> link
> between a computed field and the Quoted Identifier setting.
> Any input is appreciated.
> TIA, Norbert Meiss

Optimizing datediff

Hey all,

I run a monthly delete that's recurring; delete data > 2 months old. For December, I thought up two ways to do this:

Query 1:
delete from daily_statements where daily_statements < '2006-10-01'

Query 2:
delete from daily_statements where datediff(month, statement_date, getdate()) <= 3

I've plugged in Query 2 into the job step because it can be run at any time without having to change the value, but it took 22 minutes to delete 2.5 million rows versus 4 minutes to delete 5.3 million rows for the other one.

Is query 2 the best it can be? Thanks!

If the statement_date column is indexed, the 2nd query search argument is not efficient since the use of the DATEDIFF function on the column will eliminate the use of the index. The index contains the actual date values so the query optimizer needs search arguments that doesn't change the column in any way. You can do something like below instead:

delete from daily_statements

where statement_date < dateadd(month, -2, convert(varchar, dateadd(day, 1-day(current_timestamp), current_timestamp), 112))

The expression using CURRENT_TIMESTAMP gives the start of the current month and then goes past 2 months. You can then use it to generate the value needed for the statement_date column comparison. In general, expression involving the column directly is more efficient than expressions that call functions or user-defined functions on columns.

|||Use a variable to compare the dates:

DECLARE @.enddate datetime

SET @.enddate = DATEADD(month, -2, GETDATE())
SET @.enddate = CAST(CAST(MONTH(@.enddate) AS VARCHAR(2)) + '/01/' + CAST(YEAR(@.enddate) AS VARCHAR(4)) AS DATETIME)

delete from daily_statements where daily_statements < @.enddate

|||Please don't use the local variable approach. It does not allow the optimizer to do parameter sniffing for example. And since the batch is optimized as a whole an estimate will be used for the variable and it will result in sub-optimal plan. Using the CURRENT_TIMESTAMP expression inline provides the best performance since the query optimizer has built-in rules for such expressions. Also, using a string format for datetime/smalldatetime that is dependent on regional settings or language of the session can result in run-time errors.|||It should treat the local var like a "static" like if I say startdate < '10/1/2006'.

You are correct about the date format. I used the US date, because that was easy.

Monday, March 26, 2012

optimizer shows high record count

We recently have found a query that started to run long (30 seconds as
opposed to <1). No significant change in data, statistics updated,
etc. SQLServer 2000 sp3
When looking at the rowcount on one of the execution plan tasks it
shows >9 million rows, the table only has 14K rows in it.
We've done dbcc on the table/indexes with no results/corruption. In
our development environment, this works as expected, that is the row
count matches the estimated row count.
Any thoughts?
Thanks in advance.
Jason
Have you updated the statistics or reindexed that table lately?
Andrew J. Kelly SQL MVP
"fc_celtic_fan" <jasonb@.brandes.com> wrote in message
news:1107996642.783750.242070@.g14g2000cwa.googlegr oups.com...
> We recently have found a query that started to run long (30 seconds as
> opposed to <1). No significant change in data, statistics updated,
> etc. SQLServer 2000 sp3
> When looking at the rowcount on one of the execution plan tasks it
> shows >9 million rows, the table only has 14K rows in it.
> We've done dbcc on the table/indexes with no results/corruption. In
> our development environment, this works as expected, that is the row
> count matches the estimated row count.
> Any thoughts?
> Thanks in advance.
> Jason
>
|||Yes we have been in the boat as well and we used to run DBCC
updateusage('dbname','tablename')
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:O1xTABxDFHA.2676@.TK2MSFTNGP12.phx.gbl...
> Have you updated the statistics or reindexed that table lately?
> --
> Andrew J. Kelly SQL MVP
>
> "fc_celtic_fan" <jasonb@.brandes.com> wrote in message
> news:1107996642.783750.242070@.g14g2000cwa.googlegr oups.com...
>
|||What does sysindexes say? Also, did you do UPDATEUSAGE with ROW_COUNT?
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"Hassan" <fatima_ja@.hotmail.com> wrote in message news:Owi8xM0DFHA.3596@.TK2MSFTNGP10.phx.gbl...
> Yes we have been in the boat as well and we used to run DBCC
> updateusage('dbname','tablename')
> "Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
> news:O1xTABxDFHA.2676@.TK2MSFTNGP12.phx.gbl...
>

optimizer shows high record count

We recently have found a query that started to run long (30 seconds as
opposed to <1). No significant change in data, statistics updated,
etc. SQLServer 2000 sp3
When looking at the rowcount on one of the execution plan tasks it
shows >9 million rows, the table only has 14K rows in it.
We've done dbcc on the table/indexes with no results/corruption. In
our development environment, this works as expected, that is the row
count matches the estimated row count.
Any thoughts?
Thanks in advance.
JasonHave you updated the statistics or reindexed that table lately?
Andrew J. Kelly SQL MVP
"fc_celtic_fan" <jasonb@.brandes.com> wrote in message
news:1107996642.783750.242070@.g14g2000cwa.googlegroups.com...
> We recently have found a query that started to run long (30 seconds as
> opposed to <1). No significant change in data, statistics updated,
> etc. SQLServer 2000 sp3
> When looking at the rowcount on one of the execution plan tasks it
> shows >9 million rows, the table only has 14K rows in it.
> We've done dbcc on the table/indexes with no results/corruption. In
> our development environment, this works as expected, that is the row
> count matches the estimated row count.
> Any thoughts?
> Thanks in advance.
> Jason
>|||Yes we have been in the boat as well and we used to run DBCC
updateusage('dbname','tablename')
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:O1xTABxDFHA.2676@.TK2MSFTNGP12.phx.gbl...
> Have you updated the statistics or reindexed that table lately?
> --
> Andrew J. Kelly SQL MVP
>
> "fc_celtic_fan" <jasonb@.brandes.com> wrote in message
> news:1107996642.783750.242070@.g14g2000cwa.googlegroups.com...
>|||What does sysindexes say? Also, did you do UPDATEUSAGE with ROW_COUNT?
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"Hassan" <fatima_ja@.hotmail.com> wrote in message news:Owi8xM0DFHA.3596@.TK2MSFTNGP10.phx.gbl
..
> Yes we have been in the boat as well and we used to run DBCC
> updateusage('dbname','tablename')
> "Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
> news:O1xTABxDFHA.2676@.TK2MSFTNGP12.phx.gbl...
>sql

optimizer shows high record count

We recently have found a query that started to run long (30 seconds as
opposed to <1). No significant change in data, statistics updated,
etc. SQLServer 2000 sp3
When looking at the rowcount on one of the execution plan tasks it
shows >9 million rows, the table only has 14K rows in it.
We've done dbcc on the table/indexes with no results/corruption. In
our development environment, this works as expected, that is the row
count matches the estimated row count.
Any thoughts?
Thanks in advance.
JasonHave you updated the statistics or reindexed that table lately?
--
Andrew J. Kelly SQL MVP
"fc_celtic_fan" <jasonb@.brandes.com> wrote in message
news:1107996642.783750.242070@.g14g2000cwa.googlegroups.com...
> We recently have found a query that started to run long (30 seconds as
> opposed to <1). No significant change in data, statistics updated,
> etc. SQLServer 2000 sp3
> When looking at the rowcount on one of the execution plan tasks it
> shows >9 million rows, the table only has 14K rows in it.
> We've done dbcc on the table/indexes with no results/corruption. In
> our development environment, this works as expected, that is the row
> count matches the estimated row count.
> Any thoughts?
> Thanks in advance.
> Jason
>|||Yes we have been in the boat as well and we used to run DBCC
updateusage('dbname','tablename')
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:O1xTABxDFHA.2676@.TK2MSFTNGP12.phx.gbl...
> Have you updated the statistics or reindexed that table lately?
> --
> Andrew J. Kelly SQL MVP
>
> "fc_celtic_fan" <jasonb@.brandes.com> wrote in message
> news:1107996642.783750.242070@.g14g2000cwa.googlegroups.com...
> >
> > We recently have found a query that started to run long (30 seconds as
> > opposed to <1). No significant change in data, statistics updated,
> > etc. SQLServer 2000 sp3
> >
> > When looking at the rowcount on one of the execution plan tasks it
> > shows >9 million rows, the table only has 14K rows in it.
> >
> > We've done dbcc on the table/indexes with no results/corruption. In
> > our development environment, this works as expected, that is the row
> > count matches the estimated row count.
> >
> > Any thoughts?
> >
> > Thanks in advance.
> >
> > Jason
> >
>|||What does sysindexes say? Also, did you do UPDATEUSAGE with ROW_COUNT?
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"Hassan" <fatima_ja@.hotmail.com> wrote in message news:Owi8xM0DFHA.3596@.TK2MSFTNGP10.phx.gbl...
> Yes we have been in the boat as well and we used to run DBCC
> updateusage('dbname','tablename')
> "Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
> news:O1xTABxDFHA.2676@.TK2MSFTNGP12.phx.gbl...
>> Have you updated the statistics or reindexed that table lately?
>> --
>> Andrew J. Kelly SQL MVP
>>
>> "fc_celtic_fan" <jasonb@.brandes.com> wrote in message
>> news:1107996642.783750.242070@.g14g2000cwa.googlegroups.com...
>> >
>> > We recently have found a query that started to run long (30 seconds as
>> > opposed to <1). No significant change in data, statistics updated,
>> > etc. SQLServer 2000 sp3
>> >
>> > When looking at the rowcount on one of the execution plan tasks it
>> > shows >9 million rows, the table only has 14K rows in it.
>> >
>> > We've done dbcc on the table/indexes with no results/corruption. In
>> > our development environment, this works as expected, that is the row
>> > count matches the estimated row count.
>> >
>> > Any thoughts?
>> >
>> > Thanks in advance.
>> >
>> > Jason
>> >
>>
>

Friday, March 23, 2012

Optimizations Plan Slow

I am looking for some general best practices regarding how often to run an
optimations plan/job, and what options I should choose. Specifically, I am
have very inconsistent results on a nightly job I have set to run
optimizations on my database, ranging anywhere from 5 minutes to 100+
minutes. The daily changes to the database itself do not vary *that*
drastically (it's basically the same activity each day), so I'm trying to
figure out 1) why it takes so long in the first place, and 2) why the
duration varies so much from day to day.
The job is set to run nightly, around 4AM. I have "Reorganize data and
index pages" checked, w/ the "Change free space per page percentage to"
option set to 10%. I also have "Remove unused space from database files"
checked, w/ the "Shrink database when it grows beyond" option set to 50MB.
Currently, the size of the entire database is between 1GB and 2GB, though
this used to be over 4GB until I dropped/readded an index on a table that
has its records archived frequently -- this alone freed up almost 3GB.
The reason I need to minimize the time it takes for this job, is because
while the job is running, my web site is pretty much hosed. At one time,
4AM was off-peak hours, so this was not as much an issue. But now, the site
gets quite a bit of activity from another part of the globe, so 4AM is no
longer "off-peak" -- there really isn't an off-peak now. So this job is
causing problems for the non-regional visitors during this time.
I'm not sure if this is enough info for you to provide valuable feedback,
but if there's anything else I could provide to help you help me, let me
know.
Thanks in advance.
JeradPartial answer:
You want to avoid fragmenting SQL Server database files on disk if at all
possible. Database files that expand, especially in relatively small
increments, will become fragmented. The varying degree of fragmentation MAY
contribute to your varying run times.
I would suggest expanding your database to a reasonable size, (preferably to
it's expected stable size, or at least allowing 12 month's growth). Then
remove the SHRINK database options.
For a database this size do you really need to re-establish the index
fill-factor on a daily basis? That is, do users' experience performance
degradation within hours? If not, then you may be advised to schedule this
reindexing work less frequently, perhaps at weekends which are often
quieter. At the end of the working day run DBCC SHOWCONTIG - if the Scan
Density will fall below, say, 40% byb the nd of week, then index work may be
needed. Even then, consider DBCC INDEXDEFRAG which is an online operation-
less intrusive and can be interrupted. (See BOL for further info.)
Good luck.
Mike P.
"Jerad Rose" <no@.spam.com> wrote in message
news:OwZe2xGuFHA.4080@.TK2MSFTNGP12.phx.gbl...
> I am looking for some general best practices regarding how often to run an
> optimations plan/job, and what options I should choose. Specifically, I
am
> have very inconsistent results on a nightly job I have set to run
> optimizations on my database, ranging anywhere from 5 minutes to 100+
> minutes. The daily changes to the database itself do not vary *that*
> drastically (it's basically the same activity each day), so I'm trying to
> figure out 1) why it takes so long in the first place, and 2) why the
> duration varies so much from day to day.
> The job is set to run nightly, around 4AM. I have "Reorganize data and
> index pages" checked, w/ the "Change free space per page percentage to"
> option set to 10%. I also have "Remove unused space from database files"
> checked, w/ the "Shrink database when it grows beyond" option set to 50MB.
> Currently, the size of the entire database is between 1GB and 2GB, though
> this used to be over 4GB until I dropped/readded an index on a table that
> has its records archived frequently -- this alone freed up almost 3GB.
> The reason I need to minimize the time it takes for this job, is because
> while the job is running, my web site is pretty much hosed. At one time,
> 4AM was off-peak hours, so this was not as much an issue. But now, the
site
> gets quite a bit of activity from another part of the globe, so 4AM is no
> longer "off-peak" -- there really isn't an off-peak now. So this job is
> causing problems for the non-regional visitors during this time.
> I'm not sure if this is enough info for you to provide valuable feedback,
> but if there's anything else I could provide to help you help me, let me
> know.
> Thanks in advance.
> Jerad
>|||Hi Mike.
Thanks for the response. What you said made a lot of sense, and got me to
doing a little more research on index fragments. I found a really good
article that helped me understand why index fragments occur, and what can be
done to prevent/minimze them:
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx
I have since disabled this job, and have set up a view (report) that I now
monitor to check for excessive index fragments, and then use DBCC
INDEXDEFRAG to reduce the fragments on those tables. The article discusses
an article on BOL that provides a script you can use to automatically defrag
any indexes that has fragments exceeding a given threshold. But for now, I
want to monitor my tables to see which ones are frequently being fragmented,
and maybe redesign the table/indexes to reduce the fragments, instead of
constantly defragging these indexes -- if possible.
Thanks again for your help.
Jerad
"Mike P" <mikep@.aicorporation.com__NO_SPAM> wrote in message
news:uJzcNiIuFHA.2848@.TK2MSFTNGP10.phx.gbl...
> Partial answer:
> You want to avoid fragmenting SQL Server database files on disk if at all
> possible. Database files that expand, especially in relatively small
> increments, will become fragmented. The varying degree of fragmentation
> MAY
> contribute to your varying run times.
> I would suggest expanding your database to a reasonable size, (preferably
> to
> it's expected stable size, or at least allowing 12 month's growth). Then
> remove the SHRINK database options.
> For a database this size do you really need to re-establish the index
> fill-factor on a daily basis? That is, do users' experience performance
> degradation within hours? If not, then you may be advised to schedule
> this
> reindexing work less frequently, perhaps at weekends which are often
> quieter. At the end of the working day run DBCC SHOWCONTIG - if the Scan
> Density will fall below, say, 40% byb the nd of week, then index work may
> be
> needed. Even then, consider DBCC INDEXDEFRAG which is an online
> operation-
> less intrusive and can be interrupted. (See BOL for further info.)
> Good luck.
> Mike P.
> "Jerad Rose" <no@.spam.com> wrote in message
> news:OwZe2xGuFHA.4080@.TK2MSFTNGP12.phx.gbl...
>> I am looking for some general best practices regarding how often to run
>> an
>> optimations plan/job, and what options I should choose. Specifically, I
> am
>> have very inconsistent results on a nightly job I have set to run
>> optimizations on my database, ranging anywhere from 5 minutes to 100+
>> minutes. The daily changes to the database itself do not vary *that*
>> drastically (it's basically the same activity each day), so I'm trying to
>> figure out 1) why it takes so long in the first place, and 2) why the
>> duration varies so much from day to day.
>> The job is set to run nightly, around 4AM. I have "Reorganize data and
>> index pages" checked, w/ the "Change free space per page percentage to"
>> option set to 10%. I also have "Remove unused space from database files"
>> checked, w/ the "Shrink database when it grows beyond" option set to
>> 50MB.
>> Currently, the size of the entire database is between 1GB and 2GB, though
>> this used to be over 4GB until I dropped/readded an index on a table that
>> has its records archived frequently -- this alone freed up almost 3GB.
>> The reason I need to minimize the time it takes for this job, is because
>> while the job is running, my web site is pretty much hosed. At one time,
>> 4AM was off-peak hours, so this was not as much an issue. But now, the
> site
>> gets quite a bit of activity from another part of the globe, so 4AM is no
>> longer "off-peak" -- there really isn't an off-peak now. So this job is
>> causing problems for the non-regional visitors during this time.
>> I'm not sure if this is enough info for you to provide valuable feedback,
>> but if there's anything else I could provide to help you help me, let me
>> know.
>> Thanks in advance.
>> Jerad
>>
>sql

Optimizations Plan Slow

I am looking for some general best practices regarding how often to run an
optimations plan/job, and what options I should choose. Specifically, I am
have very inconsistent results on a nightly job I have set to run
optimizations on my database, ranging anywhere from 5 minutes to 100+
minutes. The daily changes to the database itself do not vary *that*
drastically (it's basically the same activity each day), so I'm trying to
figure out 1) why it takes so long in the first place, and 2) why the
duration varies so much from day to day.
The job is set to run nightly, around 4AM. I have "Reorganize data and
index pages" checked, w/ the "Change free space per page percentage to"
option set to 10%. I also have "Remove unused space from database files"
checked, w/ the "Shrink database when it grows beyond" option set to 50MB.
Currently, the size of the entire database is between 1GB and 2GB, though
this used to be over 4GB until I dropped/readded an index on a table that
has its records archived frequently -- this alone freed up almost 3GB.
The reason I need to minimize the time it takes for this job, is because
while the job is running, my web site is pretty much hosed. At one time,
4AM was off-peak hours, so this was not as much an issue. But now, the site
gets quite a bit of activity from another part of the globe, so 4AM is no
longer "off-peak" -- there really isn't an off-peak now. So this job is
causing problems for the non-regional visitors during this time.
I'm not sure if this is enough info for you to provide valuable feedback,
but if there's anything else I could provide to help you help me, let me
know.
Thanks in advance.
Jerad
Partial answer:
You want to avoid fragmenting SQL Server database files on disk if at all
possible. Database files that expand, especially in relatively small
increments, will become fragmented. The varying degree of fragmentation MAY
contribute to your varying run times.
I would suggest expanding your database to a reasonable size, (preferably to
it's expected stable size, or at least allowing 12 month's growth). Then
remove the SHRINK database options.
For a database this size do you really need to re-establish the index
fill-factor on a daily basis? That is, do users' experience performance
degradation within hours? If not, then you may be advised to schedule this
reindexing work less frequently, perhaps at weekends which are often
quieter. At the end of the working day run DBCC SHOWCONTIG - if the Scan
Density will fall below, say, 40% byb the nd of week, then index work may be
needed. Even then, consider DBCC INDEXDEFRAG which is an online operation-
less intrusive and can be interrupted. (See BOL for further info.)
Good luck.
Mike P.
"Jerad Rose" <no@.spam.com> wrote in message
news:OwZe2xGuFHA.4080@.TK2MSFTNGP12.phx.gbl...
> I am looking for some general best practices regarding how often to run an
> optimations plan/job, and what options I should choose. Specifically, I
am
> have very inconsistent results on a nightly job I have set to run
> optimizations on my database, ranging anywhere from 5 minutes to 100+
> minutes. The daily changes to the database itself do not vary *that*
> drastically (it's basically the same activity each day), so I'm trying to
> figure out 1) why it takes so long in the first place, and 2) why the
> duration varies so much from day to day.
> The job is set to run nightly, around 4AM. I have "Reorganize data and
> index pages" checked, w/ the "Change free space per page percentage to"
> option set to 10%. I also have "Remove unused space from database files"
> checked, w/ the "Shrink database when it grows beyond" option set to 50MB.
> Currently, the size of the entire database is between 1GB and 2GB, though
> this used to be over 4GB until I dropped/readded an index on a table that
> has its records archived frequently -- this alone freed up almost 3GB.
> The reason I need to minimize the time it takes for this job, is because
> while the job is running, my web site is pretty much hosed. At one time,
> 4AM was off-peak hours, so this was not as much an issue. But now, the
site
> gets quite a bit of activity from another part of the globe, so 4AM is no
> longer "off-peak" -- there really isn't an off-peak now. So this job is
> causing problems for the non-regional visitors during this time.
> I'm not sure if this is enough info for you to provide valuable feedback,
> but if there's anything else I could provide to help you help me, let me
> know.
> Thanks in advance.
> Jerad
>
|||Hi Mike.
Thanks for the response. What you said made a lot of sense, and got me to
doing a little more research on index fragments. I found a really good
article that helped me understand why index fragments occur, and what can be
done to prevent/minimze them:
http://www.microsoft.com/technet/pro.../ss2kidbp.mspx
I have since disabled this job, and have set up a view (report) that I now
monitor to check for excessive index fragments, and then use DBCC
INDEXDEFRAG to reduce the fragments on those tables. The article discusses
an article on BOL that provides a script you can use to automatically defrag
any indexes that has fragments exceeding a given threshold. But for now, I
want to monitor my tables to see which ones are frequently being fragmented,
and maybe redesign the table/indexes to reduce the fragments, instead of
constantly defragging these indexes -- if possible.
Thanks again for your help.
Jerad
"Mike P" <mikep@.aicorporation.com__NO_SPAM> wrote in message
news:uJzcNiIuFHA.2848@.TK2MSFTNGP10.phx.gbl...
> Partial answer:
> You want to avoid fragmenting SQL Server database files on disk if at all
> possible. Database files that expand, especially in relatively small
> increments, will become fragmented. The varying degree of fragmentation
> MAY
> contribute to your varying run times.
> I would suggest expanding your database to a reasonable size, (preferably
> to
> it's expected stable size, or at least allowing 12 month's growth). Then
> remove the SHRINK database options.
> For a database this size do you really need to re-establish the index
> fill-factor on a daily basis? That is, do users' experience performance
> degradation within hours? If not, then you may be advised to schedule
> this
> reindexing work less frequently, perhaps at weekends which are often
> quieter. At the end of the working day run DBCC SHOWCONTIG - if the Scan
> Density will fall below, say, 40% byb the nd of week, then index work may
> be
> needed. Even then, consider DBCC INDEXDEFRAG which is an online
> operation-
> less intrusive and can be interrupted. (See BOL for further info.)
> Good luck.
> Mike P.
> "Jerad Rose" <no@.spam.com> wrote in message
> news:OwZe2xGuFHA.4080@.TK2MSFTNGP12.phx.gbl...
> am
> site
>

Optimizations Plan Slow

I am looking for some general best practices regarding how often to run an
optimations plan/job, and what options I should choose. Specifically, I am
have very inconsistent results on a nightly job I have set to run
optimizations on my database, ranging anywhere from 5 minutes to 100+
minutes. The daily changes to the database itself do not vary *that*
drastically (it's basically the same activity each day), so I'm trying to
figure out 1) why it takes so long in the first place, and 2) why the
duration varies so much from day to day.
The job is set to run nightly, around 4AM. I have "Reorganize data and
index pages" checked, w/ the "Change free space per page percentage to"
option set to 10%. I also have "Remove unused space from database files"
checked, w/ the "Shrink database when it grows beyond" option set to 50MB.
Currently, the size of the entire database is between 1GB and 2GB, though
this used to be over 4GB until I dropped/readded an index on a table that
has its records archived frequently -- this alone freed up almost 3GB.
The reason I need to minimize the time it takes for this job, is because
while the job is running, my web site is pretty much hosed. At one time,
4AM was off-peak hours, so this was not as much an issue. But now, the site
gets quite a bit of activity from another part of the globe, so 4AM is no
longer "off-peak" -- there really isn't an off-peak now. So this job is
causing problems for the non-regional visitors during this time.
I'm not sure if this is enough info for you to provide valuable feedback,
but if there's anything else I could provide to help you help me, let me
know.
Thanks in advance.
JeradPartial answer:
You want to avoid fragmenting SQL Server database files on disk if at all
possible. Database files that expand, especially in relatively small
increments, will become fragmented. The varying degree of fragmentation MAY
contribute to your varying run times.
I would suggest expanding your database to a reasonable size, (preferably to
it's expected stable size, or at least allowing 12 month's growth). Then
remove the SHRINK database options.
For a database this size do you really need to re-establish the index
fill-factor on a daily basis? That is, do users' experience performance
degradation within hours? If not, then you may be advised to schedule this
reindexing work less frequently, perhaps at weekends which are often
quieter. At the end of the working day run DBCC SHOWCONTIG - if the Scan
Density will fall below, say, 40% byb the nd of week, then index work may be
needed. Even then, consider DBCC INDEXDEFRAG which is an online operation-
less intrusive and can be interrupted. (See BOL for further info.)
Good luck.
Mike P.
"Jerad Rose" <no@.spam.com> wrote in message
news:OwZe2xGuFHA.4080@.TK2MSFTNGP12.phx.gbl...
> I am looking for some general best practices regarding how often to run an
> optimations plan/job, and what options I should choose. Specifically, I
am
> have very inconsistent results on a nightly job I have set to run
> optimizations on my database, ranging anywhere from 5 minutes to 100+
> minutes. The daily changes to the database itself do not vary *that*
> drastically (it's basically the same activity each day), so I'm trying to
> figure out 1) why it takes so long in the first place, and 2) why the
> duration varies so much from day to day.
> The job is set to run nightly, around 4AM. I have "Reorganize data and
> index pages" checked, w/ the "Change free space per page percentage to"
> option set to 10%. I also have "Remove unused space from database files"
> checked, w/ the "Shrink database when it grows beyond" option set to 50MB.
> Currently, the size of the entire database is between 1GB and 2GB, though
> this used to be over 4GB until I dropped/readded an index on a table that
> has its records archived frequently -- this alone freed up almost 3GB.
> The reason I need to minimize the time it takes for this job, is because
> while the job is running, my web site is pretty much hosed. At one time,
> 4AM was off-peak hours, so this was not as much an issue. But now, the
site
> gets quite a bit of activity from another part of the globe, so 4AM is no
> longer "off-peak" -- there really isn't an off-peak now. So this job is
> causing problems for the non-regional visitors during this time.
> I'm not sure if this is enough info for you to provide valuable feedback,
> but if there's anything else I could provide to help you help me, let me
> know.
> Thanks in advance.
> Jerad
>|||Hi Mike.
Thanks for the response. What you said made a lot of sense, and got me to
doing a little more research on index fragments. I found a really good
article that helped me understand why index fragments occur, and what can be
done to prevent/minimze them:
http://www.microsoft.com/technet/pr...n/ss2kidbp.mspx
I have since disabled this job, and have set up a view (report) that I now
monitor to check for excessive index fragments, and then use DBCC
INDEXDEFRAG to reduce the fragments on those tables. The article discusses
an article on BOL that provides a script you can use to automatically defrag
any indexes that has fragments exceeding a given threshold. But for now, I
want to monitor my tables to see which ones are frequently being fragmented,
and maybe redesign the table/indexes to reduce the fragments, instead of
constantly defragging these indexes -- if possible.
Thanks again for your help.
Jerad
"Mike P" <mikep@.aicorporation.com__NO_SPAM> wrote in message
news:uJzcNiIuFHA.2848@.TK2MSFTNGP10.phx.gbl...
> Partial answer:
> You want to avoid fragmenting SQL Server database files on disk if at all
> possible. Database files that expand, especially in relatively small
> increments, will become fragmented. The varying degree of fragmentation
> MAY
> contribute to your varying run times.
> I would suggest expanding your database to a reasonable size, (preferably
> to
> it's expected stable size, or at least allowing 12 month's growth). Then
> remove the SHRINK database options.
> For a database this size do you really need to re-establish the index
> fill-factor on a daily basis? That is, do users' experience performance
> degradation within hours? If not, then you may be advised to schedule
> this
> reindexing work less frequently, perhaps at weekends which are often
> quieter. At the end of the working day run DBCC SHOWCONTIG - if the Scan
> Density will fall below, say, 40% byb the nd of week, then index work may
> be
> needed. Even then, consider DBCC INDEXDEFRAG which is an online
> operation-
> less intrusive and can be interrupted. (See BOL for further info.)
> Good luck.
> Mike P.
> "Jerad Rose" <no@.spam.com> wrote in message
> news:OwZe2xGuFHA.4080@.TK2MSFTNGP12.phx.gbl...
> am
> site
>