Wednesday, March 28, 2012

Optimizing a JOIN

I have two tables.

One has approx 90,000 rows with a field .. let's call it BigInt (and it
is defined as a bigint data type).

I have a reference table, with approx 10,000,000 rows. In this
reference table, I have starting_bigint and ending_bigint fields. I
want to pull out all of the reference data from the reference table for
all 90,000 rows in the transaction table where the BigInt from the
transaction table is between the starting_bigint and ending_bigint in
the reference table.

I have the join working now, but it is not as optimized as I would
like. It appears no matter what I do, the query does a full table scan
on the 10,000,000 rows in the reference table.

Sample code:

SELECT ref.*, t.[bigint]          -- [bigint] bracketed: the column shares its name with the type
FROM transactiontable t           -- alias "t" rather than "tran", which is a reserved word in T-SQL
INNER JOIN referencetable ref
    ON t.[bigint] BETWEEN ref.starting_bigint AND ref.ending_bigint

Yes, all 3 of the fields are indexed. I even have a composite index on
the reference table with starting_bigint and ending_bigint as the key
columns.
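For reference, that setup amounts to roughly the following (the index
names are illustrative, not the actual DDL):

CREATE INDEX IX_tran_bigint  ON transactiontable ([bigint]);
CREATE INDEX IX_ref_starting ON referencetable (starting_bigint);
CREATE INDEX IX_ref_ending   ON referencetable (ending_bigint);
CREATE INDEX IX_ref_range    ON referencetable (starting_bigint, ending_bigint);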

Any help would be appreciated.

Robert H. Kershberg
IT Director
Tax Credit Company
RKershberg@taxcc.com or RKershberg@pobox.com or RKershberg@gmail.com

If the starting and ending bigint ranges are not overlapping, then I
would classify this as the "zipcode problem": for each value, find the
one range that contains it, the way a number falls into exactly one
zip-code band.

If this is the case, you could try the following approach:

SELECT R1.*, T1.[bigint]
FROM transactiontable T1
CROSS JOIN referencetable R1
WHERE R1.starting_bigint = (
    -- for each transaction row, seek the greatest range start
    -- at or below its value
    SELECT MAX(starting_bigint)
    FROM referencetable R2
    WHERE R2.starting_bigint <= T1.[bigint]
)
AND R1.ending_bigint >= T1.[bigint]   -- then verify the value is inside that range
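The reason this shape can help: the MAX() subquery can be answered by a
single seek on an index ordered by starting_bigint, so each of the
~90,000 transaction rows costs one seek instead of a scan of the
reference table. On SQL Server 2005 and later the same seek-per-row
plan can also be written with CROSS APPLY; this is just an equivalent
sketch, not something you need:

SELECT R1.*, T1.[bigint]
FROM transactiontable T1
CROSS APPLY (
    -- one backward seek per transaction row: the greatest
    -- range start at or below the value
    SELECT TOP (1) R2.*
    FROM referencetable R2
    WHERE R2.starting_bigint <= T1.[bigint]
    ORDER BY R2.starting_bigint DESC
) AS R1
WHERE R1.ending_bigint >= T1.[bigint]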

Hope this helps,
Gert-Jan

"rkershberg@.gmail.com" wrote:
> I have two tables.
> One has approx 90,000 rows with a field .. let's call in BigInt (and it
> is defined as a bigint data type).
> I have a reference table, with approx 10,000,000 rows. In this
> reference table, I have starting_bigint and ending_bigint fields. I
> want to pull out all of the reference data from the reference table for
> all 90,000 rows in the transaction table where the BigInt from the
> transaction table is between the starting_bigint and ending_bigint in
> the reference table.
> I have the join working now, but it is not as optimized as I would
> like. It appears no matter what I do, the query does a full table scan
> on the 10,000,000 rows in the reference table.
> Sample code
> SELECT ref.*, tran.bigint
> from transactiontable tran
> INNER JOIN referencetable ref on tran.bigint between
> ref.starting_bigint and ending_bigint
> Yes, all 3 of the fields are indexed. I even have a composite index on
> the reference table with the starting_bigint and ending_bigint fields
> selected as the composite.
> Any help would be appreciated.
> Robert H. Kershberg
> IT Director
> Tax Credit Company
Try skipping the BETWEEN. Something like:
SELECT ref.*, t.[bigint]
FROM transactiontable t
INNER JOIN referencetable ref
    ON t.[bigint] >= ref.starting_bigint
   AND t.[bigint] <= ref.ending_bigint

Try creating a composite index on ref containing starting_bigint plus
ending_bigint. If that doesn't work, try zapping your existing indexes
on ref.

Thinking outside the box, try creating staging tables, or "ghost"
tables. The ghost tables are for a selected period that you are
currently working on. You copy pieces of your trans table off to a temp
table and do your lookups against that. You eat the one copy, and you
eat the creation of the index on the temp table, but you get the
benefit of not having to do complex searches against a 10,000,000-row
table for the whole transaction set. This is RARELY needed, but I've
done it on rare occasion; a rough sketch follows.
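Purely as a sketch .. the period filter and the names here are
placeholders, not your actual schema:

-- copy the slice you're working on to a temp table
SELECT t.[bigint]
INTO #period_trans
FROM transactiontable t
-- WHERE <your period filter here>

-- eat the one-time index build
CREATE CLUSTERED INDEX IX_pt ON #period_trans ([bigint])

-- then run the lookups against the small table
SELECT ref.*, t.[bigint]
FROM #period_trans t
INNER JOIN referencetable ref
    ON t.[bigint] BETWEEN ref.starting_bigint AND ref.ending_bigint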

Thank you for your time. I had tried a cross join, but not quite the
same way. It'll be interesting to see if there is a significant
difference. I appreciate your intelligent input.

Rob

Thanks again .. this logic improved performance 3- to 4-fold. (I had
already cut a 2 min 30 sec query down to 15 seconds; your logic
implemented on top of mine cut that to 3 seconds, on an 88,000-row
transaction table running against a 9.6 million-row reference table ..
not zip codes, but the right idea <g>.)

Thank YOU very much!!!

Rob

3 seconds is a long time.

How do we make it faster??

I am not a huge fan of clustered indexes. However, for a reference
table where you will ALWAYS be looking up data by a specific column,
and rarely inserting into the middle, I'd sure consider one.

For grins, try creating a clustered, composite index on starting_bigint
+ ending_bigint.

For grins after that, consider a clustered index on trans.bigint.
Warning: clustered indexes on trans tables are usually not a great idea
IMO, unless the clustered index is based on a timestamp or an identity
column.
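Roughly, with illustrative names (and remember a table can have only
one clustered index, so these would replace whatever is clustered
today):

CREATE CLUSTERED INDEX CIX_ref_range
    ON referencetable (starting_bigint, ending_bigint)

CREATE CLUSTERED INDEX CIX_tran_bigint
    ON transactiontable ([bigint])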

-doug
