Friday, March 30, 2012

Optimizing LIKE OR Selects

I'm having problems optimizing a sql select statement that uses a LIKE statement coupled with an OR clause. For simplicity sake, I'll demonstrate this with a scaled down example:

table Company, fields CompanyID, CompanyName

table Address, fields AddressID, AddressName

table CompanyAddressAssoc, fields AssocID, CompanyID, AddressID

CompanyAddressAssoc is the many-to-many associative table for Company and Address. A search query is required that, given a search string ( i.e. 'TEST' ), return all Company -> Address records where either the CompanyName or AddressName starts with the parameter:

Select c.CompanyID, c.CompanyName, a.AddressName

FROM Company c

LEFT OUTER JOIN CompanyAddressAssoc caa ON caa.CompanyID = c.CompanyID

LEFT OUTER JOIN Address a ON a.AddressID = caa.AddressID

WHERE ((c.CompanyName LIKE 'TEST%') OR (a.AddressName LIKE 'TEST%))

There are proper indexes on all tables. The execution plan creates a hash table on one LIKE query, then meshes in the other LIKE query. This takes a very long time to do, given a dataset of 500,000+ records in Company and Address.

Is there any way to optimize this query, or is it a problem with the base table implementation?

Any advice would be appreciated.

? Hi Brian, Do you really need to use OUTER JOINs here? For an INNER JOIN, the optimizer can sometimes pick a more efficient execution plan. It's a long shot, but sometimes performance for a query with OR can be improved by re-writing it as a UNION of two queries: SELECT c.CompanyID, c.CompanyName, a.AddressName FROM Company AS c -- Change to INNER JOIN if possible LEFT JOIN CompanyAddressAssoc AS caa ON caa.CompanyID = c.CompanyID LEFT JOIN Address AS a ON a.AddressID = caa.AddressID WHERE c.CompanyName LIKE 'TEST%' UNION -- Or UNION ALL, see below SELECT c.CompanyID, c.CompanyName, a.AddressName FROM Company AS c -- Definitely no need for LEFT OUTER JOIN in this half of the query INNER JOIN CompanyAddressAssoc AS caa ON caa.CompanyID = c.CompanyID INNER JOIN Address AS a ON a.AddressID = caa.AddressID WHERE a.AddressName LIKE 'TEST%' If your data is such that you can be sure there will never be an overlap in the results of the two UNION'ed queries, then change UNION to UNION ALL to gain some more performance. If that's not possible, then you could also try how this one runs: SELECT c.CompanyID, c.CompanyName, a.AddressName FROM Company AS c -- Change to INNER JOIN if possible LEFT JOIN CompanyAddressAssoc AS caa ON caa.CompanyID = c.CompanyID LEFT JOIN Address AS a ON a.AddressID = caa.AddressID WHERE c.CompanyName LIKE 'TEST%' UNION ALL SELECT c.CompanyID, c.CompanyName, a.AddressName FROM Company AS c -- Definitely no need for LEFT OUTER JOIN in this half of the query INNER JOIN CompanyAddressAssoc AS caa ON caa.CompanyID = c.CompanyID INNER JOIN Address AS a ON a.AddressID = caa.AddressID WHERE a.AddressName LIKE 'TEST%' AND c.CompanyName NOT LIKE 'TEST%' -- Assumes CompanyName is never NULL The queries above are untested. See www.aspfaq.com/5006 if you prefer a tested reply. -- Hugo Kornelis, SQL Server MVP <Brian S. Ward@.discussions.microsoft..com> schreef in bericht news:822eb526-281c-409e-80df-9a03e79d3f05@.discussions.microsoft.com... I'm having problems optimizing a sql select statement that uses a LIKE statement coupled with an OR clause. For simplicity sake, I'll demonstrate this with a scaled down example: table Company, fields CompanyID, CompanyName table Address, fields AddressID, AddressName table CompanyAddressAssoc, fields AssocID, CompanyID, AddressID CompanyAddressAssoc is the many-to-many associative table for Company and Address. A search query is required that, given a search string ( i.e. 'TEST' ), return all Company -> Address records where either the CompanyName or AddressName starts with the parameter: Select c.CompanyID, c.CompanyName, a.AddressName FROM Company c LEFT OUTER JOIN CompanyAddressAssoc caa ON caa.CompanyID = c.CompanyID LEFT OUTER JOIN Address a ON a.AddressID = caa.AddressID WHERE ((c.CompanyName LIKE 'TEST%') OR (a.AddressName LIKE 'TEST%)) There are proper indexes on all tables. The execution plan creates a hash table on one LIKE query, then meshes in the other LIKE query. This takes a very long time to do, given a dataset of 500,000+ records in Company and Address. Is there any way to optimize this query, or is it a problem with the base table implementation? Any advice would be appreciated.|||

Hi Hugo,

Thanks for replying to my question. I tried a couple of the things that you mentioned.

Changing the OUTER JOINS to INNER JOINS had no noticeable effect on performance. Additionally, the execution plan seemed to become more complicated.

I tried using a UNION ALL clause between 2 SQL statements setup specifically to select AddressName and CompanyName, but performance was destroyed trying that. I used the actual version of the SQL rather than the test SQL I submitted, which contains about 7 joins.

The only solution that I can think of at this time is to change the schema of the base tables, moving the AddressName and CompanyName into the associative table, therefore allowing one search field to be indexed. It would be a bit more cryptic, but would solve the problem of the LIKE OR issue ( since there would be only one LIKE statement for both checks ).

Any other ideas would be appreciated.

|||

It sounds like having a denormalized schema like you suggest may speed things up. There is nothing wrong in duplicating the addressname and company name fields in the one table, lots of companies have denormalized databases for performance purposes. I used to work on a database for one of the biggest Oil companies in the world, and that was largely denormalized and had no relationships set up (they were enforced by triggers and in the stored procedures).

An alternative which may work (although its a long shot) is to rewrite the OR as a not and such that

A OR B = NOT(NOT A AND NOT B)

One of my former colleagues used to assure me that was faster, but I have never tested it. It works as all computers are built from NAND gates, and thus any boolean statement can be rewitten as a series of NANDs

|||

Hi,

Have you tried to run it as to queries?

Without the OR statement.

Try that and see if it gets better.

If so, then insert the result into a temptable and make the final select from there.

It's hard to speed upp OR selects.

Regards

|||

I've tried that too, running 2 queries then trying to merge them after, but that becomes pretty convoluted trying to decide which records from the 2 sets makes the Top 100. I've decided to go with the denormalization plan for now, populating a 'Name' field in the associative table and using that to search. One index, no Or statement, runs really fast.

Thanks to everyone for your input.

|||

Can you post the statistics profile output (and the xml showplan, if possible)?

We can tell where things are wrong based on that.

Thanks,

Conor

sql

No comments:

Post a Comment