Wednesday, March 28, 2012

Optimizing an IN clause

I have a fairly straightforward SELECT query that includes the following:
MembersTable.MemberID IN
(
SELECT ZipcodesTable.MemberID
FROM ZipcodesTable.Zipcode IN
(
'01234','03631','55902' ... '03036'
)
That is, it's looking for entries in the ZipcodeTable where the Zipcode
value is any one of a very large set of zipcodes, up to 500 Zipcodes. My
Zipcode field is a 5 character field.
Would the query run faster if I made the Zipcode field an int instead of 5
chars?
Are there other ways to speed up the zipcode-matching part of my query?
- Roger GarrettChanging to integer is not possible in this case, cause you got leading
zero in your zip code. Converting them to INT would trim them away. The
best way to speed this up would be top store the ZIP codes in a table
to join them. It would be even more easy to manage.
HTH, Jens Suessmeyer.|||Relations. The performance of this query would benefit from using a join
instead of using the IN operator.
How do the values get into the query?
ML
http://milambda.blogspot.com/|||Have you tried placing the zipcodes in a table and using EXISTS instead of
IN?
Andrew J. Kelly SQL MVP
"Roger Garrett" <RogerGarrett@.discussions.microsoft.com> wrote in message
news:5C55A707-6E72-4CDF-BFC0-7320AA8CDA12@.microsoft.com...
>I have a fairly straightforward SELECT query that includes the following:
> MembersTable.MemberID IN
> (
> SELECT ZipcodesTable.MemberID
> FROM ZipcodesTable.Zipcode IN
> (
> '01234','03631','55902' ... '03036'
> )
> That is, it's looking for entries in the ZipcodeTable where the Zipcode
> value is any one of a very large set of zipcodes, up to 500 Zipcodes. My
> Zipcode field is a 5 character field.
> Would the query run faster if I made the Zipcode field an int instead of 5
> chars?
> Are there other ways to speed up the zipcode-matching part of my query?
> - Roger Garrett
>|||>> Would the query run faster if I made the Zipcode field [sic] an INTEGER instea
d of CHAR(5)? <<
ZIP codes are CHAR(5) and not INTEGER. Columns are not fields.
For a large number of zip codes, you might find using a table instead
of a list is faster. It would have an index on its single column.
SELECT member_id
FROM Membership
WHERE zip_code
IN (SELECT zip_code FROM ZipLists);|||Hi Roger,
Make sure you have an index ZipCode, MemberID on the ZipcodesTable.
Also, use EXISTS instead of IN.
AND EXISTS (
SELECT *
FROM ZipcodesTable zt
WHERE zt.Zipcode IN ( ...... )
AND zt.MemberID = MembersTable.MemberID
)
Even better if you could put the IN clause into a table of its own...
AND EXISTS (
SELECT *
FROM #ZCodes zt
WHERE zt.MemberID = MembersTable.MemberID
)
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials
"Roger Garrett" <RogerGarrett@.discussions.microsoft.com> wrote in message
news:5C55A707-6E72-4CDF-BFC0-7320AA8CDA12@.microsoft.com...
>I have a fairly straightforward SELECT query that includes the following:
> MembersTable.MemberID IN
> (
> SELECT ZipcodesTable.MemberID
> FROM ZipcodesTable.Zipcode IN
> (
> '01234','03631','55902' ... '03036'
> )
> That is, it's looking for entries in the ZipcodeTable where the Zipcode
> value is any one of a very large set of zipcodes, up to 500 Zipcodes. My
> Zipcode field is a 5 character field.
> Would the query run faster if I made the Zipcode field an int instead of 5
> chars?
> Are there other ways to speed up the zipcode-matching part of my query?
> - Roger Garrett
>|||Jens,
I don't see that leading zeros would cause any problem. I'm suggesting that
I change the Zipcodes column from char 5 to int and to just store the
numerical integer values of the zipcodes rather than the 5-character string.
A zipcode of "03036" would become a 3036 in the column, and when I'm
searching for an "03036" it would match with the 3036 value. Of course, I
would specify numeric values (e.g. 03036) rathet than quoted strings
('03036').
I'm assuming that a 5-char column occupies at least 6 bytes (to make it on
an even byte boundary) and that an int occupies 4 bytes. That at a minium
saves some space in the database. And then, when I'm looking for a particula
r
value, it only has to compare 4 bytes (which is mostly likely a single
hardware instruction) instead of six bytes, so the queries should run a bit
faster.|||ML,
My application program constructs the query.
What's happenning here is this: I have a Zipcodes table. That table has two
columns, a MemberID column and a Zipcode column. Each member has one entry i
n
this table, signifying the zipcode of where he lives.
At certain times during the running of my application it needs to know the
MemebrIDs of all the members that live within a certain radius of a given
member. My application program figures out which zipcodes are within that
radius and constructs an array of strings signifying that set of zipcodes. I
t
then constructs a query, using that array of strings, in order to get from
the database the set of MemberIDs, from the Zipcodes table, of those members
who reside in any of those zipcodes. The query looks something like:
SELECT ZipcodesTable.MemberID
FROM ZipcodesTable.Zipcode IN
(
'01234','03631','55902' ... lots of zipcodes here ... '03036'
)
As far as I can tell that means that SQL Server has to compare each and
every zipocde in the Zipcodes table with (possibly all of) the zipcodes
within the IN clause of the query. In fact, for MOST of the rows in the
Zipcodes table it will have to do the comparison against ALL of the zipcodes
in the IN clause, since most of the members will not be within ANY of those
zipcodes.
Now, if SQL Server were smart it might order those zipcodes from the IN
clause and determine the smallest and largest zipcode values and thereby do
a
much quicker comparison at each row. But I don't know that I can rely on SQL
Server being that smart.
So I'm looking for a better way to express the query so that it runs as fast
as possible.|||Andrew,
I wasn't familiar with the EXISTS operator (I'm very new to all this) so I
just now read up on it. I don't see how EXISTS will help. How are you
suggesting that the zipcodes be put in a table? Do you mean the set of
zipcodes that I'm looking for for the specific current query? WHat would the
EXISTS query look like?
Please see my reply to Jens for a (hopefully) clearer description of what
I'm trying to accomplish.
"Andrew J. Kelly" wrote:

> Have you tried placing the zipcodes in a table and using EXISTS instead of
> IN?
> --
> Andrew J. Kelly SQL MVP
>
> "Roger Garrett" <RogerGarrett@.discussions.microsoft.com> wrote in message
> news:5C55A707-6E72-4CDF-BFC0-7320AA8CDA12@.microsoft.com...
>
>|||For the performace reason: Try it out, once you converted this into
have your data as a char. (which is like I said and meanwhile also
Steve pointed out, preferable, because you don=B4t have to deal with
later problems around this. We has ourselves in Germany a change from
4digit numbers to 5 digits with a trailing zero. I can tell you, that
was for many software vendors like the Y2k problem).
These is my opinion, my personal experience and advice for you.
HTH, Jens Suessmeyer.

No comments:

Post a Comment