OpenXML Identity Insert: Optimizing a big query

To start with, I'll give a simplified overview of my data.

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

ProblemTable (4mil rows)
ID | ConcatenatedHistory

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Using left outer joins (all from base, the rest optional) I've got a
view something like:

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

Quote:

Originally Posted by

>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

Quote:

Originally Posted by

>From View

Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

Quote:

Originally Posted by

>From View

Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code

Quote:

Originally Posted by

>From View inner join History on History.ID = View.ID

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.On Feb 8, 4:05 pm, "Merennulli" <mar...@.sdf.lonestar.orgwrote:

Quote:

Originally Posted by

To start with, I'll give a simplified overview of my data.
>
BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc
>
OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data
>
ProblemTable (4mil rows)
ID | ConcatenatedHistory
>
The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'
>
Using left outer joins (all from base, the rest optional) I've got a
view something like:
>
View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory
>
Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
>
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;
>
Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
>
Where Region = 58;
>
But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
>
Where Region = 58
and ConcatenatedHistory like '%test%' ;
>
What puzzles me most is that it's taking longer to return nothing.
>
I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.
>
The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.
>
Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code>From View inner join History on History.ID = View.ID
>
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';
>
I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.
>
~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.
>
If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

do you query the concatenated data, "normalized", in a way that you
could use FULL TEXT ?|||I expect that the normalized approach can perform much better than parsing
ConcatenatedHistory using LIKE as long as you have the proper indexes in
place and tune your queries. However it's difficult to make recommendations
without the actual DDL of your existing objects.

Quote:

Originally Posted by

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code

Quote:

Originally Posted by

>>From View inner join History on History.ID = View.ID

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

You might be able to reformulate this query as something like the example
below:

SELECT View.ID, View.Name, View.Region, View.etc, View.Data, View.Data2,
History.Date, History.Code
FROM View
INNER JOIN History ON
History.ID = View.ID
WHERE
View.Region = 58 AND
History.Date = '11/28/2006'
UNION
SELECT View.ID, View.Name, View.Region, View.etc, View.Data, View.Data2,
History.Date, History.Code
FROM View
INNER JOIN History ON
History.ID = View.ID
WHERE
View.Region = 58 AND
History.Code = '2007X';

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Merennulli" <maross@.sdf.lonestar.orgwrote in message
news:1170979522.693854.323490@.s48g2000cws.googlegr oups.com...

Quote:

Originally Posted by

Quote:

Originally Posted by

>>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;
>
Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

Quote:

Originally Posted by

>>From View

Where Region = 58;
>
But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

Quote:

Originally Posted by

>>From View

Where Region = 58
and ConcatenatedHistory like '%test%' ;
>
What puzzles me most is that it's taking longer to return nothing.
>
I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.
>
The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.
>
Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code

Quote:

Originally Posted by

>>From View inner join History on History.ID = View.ID

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';
>
I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.
>
~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.
>
If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.
>

|||On Feb 8, 7:27 pm, "Steve" <morrisz...@.hotmail.comwrote:

Quote:

Originally Posted by

On Feb 8, 4:05 pm, "Merennulli" <mar...@.sdf.lonestar.orgwrote:
>
>
>
>
>

Quote:

Originally Posted by

To start with, I'll give a simplified overview of my data.

Quote:

Originally Posted by

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

Quote:

Originally Posted by

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

Quote:

Originally Posted by

ProblemTable (4mil rows)
ID | ConcatenatedHistory

Quote:

Originally Posted by

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Quote:

Originally Posted by

Using left outer joins (all from base, the rest optional) I've got a
view something like:

Quote:

Originally Posted by

Quote:

Originally Posted by

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Quote:

Originally Posted by

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Quote:

Originally Posted by

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Quote:

Originally Posted by

Where Region = 58;

Quote:

Originally Posted by

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Quote:

Originally Posted by

Where Region = 58
and ConcatenatedHistory like '%test%' ;

Quote:

Originally Posted by

What puzzles me most is that it's taking longer to return nothing.

Quote:

Originally Posted by

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

Quote:

Originally Posted by

Quote:

Originally Posted by

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code>From View inner join History on History.ID = View.ID

Quote:

Originally Posted by

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

Quote:

Originally Posted by

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

Quote:

Originally Posted by

Quote:

Originally Posted by

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

>
do you query the concatenated data, "normalized", in a way that you
could use FULL TEXT ?- Hide quoted text -
>
- Show quoted text -

Steve,

The "normalized" data broke the code and date into a datetime field
and a 5 character char (eg. '2006X'). With that change, I was doing
exact matching instead of "like" comparisons. As far as my knowledge
extends, that should have more than compensated for the table size
difference (4mil vs 25mil and later vs 9mil). My first assumption was
that the indexes were wrong, but a non-clustered index on the ID and
Code fields should have been correct for this. I can try a clustered
on the pair, but I don't see where that would improve performance.|||Merennulli (maross@.sdf.lonestar.org) writes:

Quote:

Originally Posted by

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

Quote:

Originally Posted by

>>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

I completely agree with Dan that normalising ConcatenatedHistory into
its own table, would give you better performance. But not know the
tables or indexes its difficult to say why your attempt failed.

The one thing I can suggest to improve the speed of the current
query is that you add a COLLATE clause to force a binary collation:

ConcatenatedHistory LIKE '%2007%' COLLATE Latin1_General_BIN

this is particular important if you use a Windows collation or your
column is varchar.

Quote:

Originally Posted by

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I don't really know what this code is. Isn't that just a date or
rather a period? And is that really the WHERE clause? Or should it
be:

Where View.Region = 58
(and History.Date = '11/28/2006' or History.Code = '2007X')

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||On Feb 10, 10:55 am, Erland Sommarskog <esq...@.sommarskog.sewrote:

Quote:

Originally Posted by

Merennulli (mar...@.sdf.lonestar.org) writes:

Quote:

Originally Posted by

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

Quote:

Originally Posted by

>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

>
I completely agree with Dan that normalising ConcatenatedHistory into
its own table, would give you better performance. But not know the
tables or indexes its difficult to say why your attempt failed.

Sorry, it looks like my last message in response to Dan didn't go
through.
The problem was indeed my index. It failed because I had the ID first
and the date second.
Flipping the field order in the index brought my time down to about
5-8 seconds.

My thought had been that the query would use start with the other side
of the join - the view, narrow it down first and tie to the index
values, then find the date out of the remaining small section of the
index. Instead it seems it started with the opposite side of the join
from what I expected. Because of that it was within an average of 10
places of being enforced as completely random from the date field's
perspective.

Thanks for pointing me in the right direction.

Wednesday, March 28, 2012

Optimizing a big query

No comments:

Post a Comment

OpenXML Identity Insert

Blog Archive

About Me