Friday, March 30, 2012

Optimizing Query - Product Tips

Hello everyone!
I've got a problem with a real slow query, I would be very happy if somebody has any idea to improve the speed of it...
The idea is to get the top 2 products, a customer hasn't bought wich are in his interest...

query (simplificated)
---------------
SELECT TOP 2 prodID, Title, Price FROM bestSold7Days WHERE
prodID NOT IN (SELECT prodID FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID WHERE (orders.custID=394))
AND
(prodType = COALESCE((SELECT TOP 1 products.prodID FROM orders INNER JOIN orderProducts ON order.orderID = orderProducts.orderID INNER JOIN products ON orderProducts.prodID = products.prodID WHERE (orders.custID=394) GROUP BY products.prodType ORDER BY SUM(orderProducts.PCS) DESC), 2))
---------------
end query

(COALESCE is for replacing if the customer hasnt ordered anything, or hasnt ordered anything of this type)...

Thanks for any time spent!DDL and sample data would help alot...

But I'll give it a go...

DDL:

CREATE TABLE...

Sample Data

INSERT INTO myTable(Cols...
SELECT 'data','more data',1,...UNION ALL
SELECT 'data','more data',1,...UNION ALL
SELECT 'data','more data',1,...UNION ALL
SELECT 'data','more data',1,...UNION ALL

Should help us get an answer quicker...|||Or maybe not...I think I hurt myself...

Let me point out a coupld of things...

SELECT TOP 2
prodID
, Title
, Price
FROM bestSold7Days
WHERE prodID NOT IN (SELECT prodID
FROM orders
INNER JOIN orderProducts
ON orders.orderID = orderProducts.orderID
WHERE orders.custID=394)
AND (prodType = COALESCE((SELECT TOP 1
products.prodID
FROM orders
INNER JOIN orderProducts ON order.orderID = orderProducts.orderID
INNER JOIN products ON orderProducts.prodID = products.prodID
WHERE orders.custID=394
GROUP BY products.prodType
ORDER BY SUM(orderProducts.PCS) DESC), 2))

Does this even run?

Does ProdType = ProdId?
a GROUP BY qith no SCALAR in the SELECT?
OREDER BY SUM...what for?
Why the COALESCE? IF it's NULL (what ever it is) is won't be evaluated

I guess what I'm saying is..post the ddl sample data AND expected results, and tell us what the business req is...

it'll be a lot faster that way...|||I don't have access to the sql server right now, so I cant give you the result and the table structure 100% - maybe i mistyped something on the query itself, but i dont think so..
I've missed the ORDER by cntSold DESC at the end of the query
it worked as far as i've tried :-)
sorry - let me get some things:

table bestSold7Days (generated hourly, articles best sold in the last 7 days)
prodID - product ID
cntSold - sold pieces in 7 days
prodType - product type - e.g. 0 hardware, 1 software, 2 special product
Title - product Title

table orders
orderID - order ID identity
custID - customer ID

table orderProdcuts (products contained in order)
orderID - order ID
prodID - product ID
pcs - Pieces ordered

the whole query puts out following:
prodID, Title, Price
349, H53-39, 393.33
39392, P3838-3, 5959.21

the sense of the hole thing is to get the top 2 sold products in the last 7 days, wich are the same of interest (prodType) wich the customer prefers and which he didn't already order...|||In your query statement, there are three JOIN words. That makes execution of the query very slow. My suggestion is that you may use a stored procedure in which you can separate your query into several steps. That will improve the performance.|||Hello gyuan,
I've already tried that.
Did a sp wich got me the favorite prodType, but overall it didn't really improve the performance very much.

Currently I'm using it through the stored procedure (4 different tables, 4 different prodTypes) and then querying the top 2 products which he didn't already order.

The SP gives out: 3,4,3,50 - this i split in vb and use it in the queries following...

The whole thing takes from 8-30 seconds (depending on how much the customer ordered)|||What about this?

select top 2 b.prodid
from
BestSold7Days b
left join
(
select distinct(p.prodID)
from Orders o
inner join OrderProducts p
on p.orderid = o.orderid
where o.custID = 394
) x
on x.prodID = b.prodID
where x.prodID is null
order by b.cntSold desc|||A good stored procedure can definitely improve the execution speed of the query statements, but it depends on the content of the stored procedure. If you can post the details of your tables and requirements, that will give us a good help to solve it.|||table sets:
products
products2
products3
products4
(all same structure)
prodID - identity
prodType - product type - int

orders
orderID - order ID (identity)
custID - customer ID - int

orderProducts
orderID - order ID - int
prodID - int
pcs - int - pieces ordered

Here's the sp i currently use to do the query:
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS OFF
GO

CREATE FUNCTION [dbo].[getTopprodType] (@.custID int )
RETURNS varchar(50) AS
BEGIN
DECLARE @.Ret varchar(50)
SET @.Ret=ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products ON orderProducts.prodID = products.prodID WHERE (orders.custID = @.custID) GROUP BY products.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
SET @.Ret=@.Ret + ',' + ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products2].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products2 ON orderProducts.prodID = products2.prodID WHERE (orders.custID = @.custID) GROUP BY products2.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
SET @.Ret=@.Ret + ',' + ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products3].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products3 ON orderProducts.prodID = products3.prodID WHERE (orders.custID = @.custID) GROUP BY products3.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
SET @.Ret=@.Ret + ',' + ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products4].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products4 ON orderProducts.prodID = products4.prodID WHERE (orders.custID = @.custID) GROUP BY products4.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
RETURN @.Ret
END

GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO

There are 4 different product tables (unique id's) wich get together here.
The SP puts out the following
2,3,4,9
I put this into an array in vb to use it for the following query
SELECT TOP 2 bestSold7Days.Anz, products.prodID, products.Title, products.Price FROM bestSold7Days INNER JOIN products ON bestSold7Days.prodID = products.prodID WHERE ((products.Price IS NOT NULL) AND ((SELECT TOP 1 orderProducts.prodID FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID WHERE (orders.custID = " & Me.custID & ") GROUP BY orderProducts.prodID HAVING (orderProducts.prodID = bestSold7Days.prodID)) IS NULL)" & tempGen & " ORDER BY bestSold7Days.Anz DESC;

tempGen is replaced by the favourite prodType (e.g. AND prodType=4 if has any)|||found a faster solution for the second query!
if i do the lookup of the products wich the customer already ordered in a derived table (like a join, but the table is a query) it's much faster!
gut the whole process down to ~2-6 seconds.
Now i need to optimize the getTopProdType procedure, still taking some seconds...|||Could you post how many records in each of the tables? And what are indexes on those tables?|||Indexes on all searched fields
prodID
prodType
(in all tables)

products1 ~ 20000 records
products2 ~ 200000 records
products3 ~ 5000 records
products4 ~ 2000000 records
orders ~ 50000 records
orderProducts ~ 2000000 records

prodType is not clustered.
does anybody have a rule when to make an index clustered?|||1. Based on your query, an index on order.custID would be helpful.

2. It would speed up the query a lot if you store customers favorite prodType in a table instead of figuring it out every time you run the query. Daily update on this field might be good enough.|||sorry forgot this field - custID is also indexed.
Only field queried wich is not indexed is the pcs field in the orderProducts table.|||I think item 2 would be very helpful to speed up your query.
If a little slower update on orders related tables is acceptable, you may update the information in table xxx using triggers or whenever an order is updated.

SELECT TOP 2
prodID
, Title
, Price
FROM bestSold7Days
WHERE prodID NOT IN (SELECT prodID
FROM orders
INNER JOIN orderProducts
ON orders.orderID = orderProducts.orderID
WHERE orders.custID=394)
AND (prodType = COALESCE((SELECT prodType
FROM xxxx
WHERE custID=394), 2))|||what do you mean by item 2?
i've just seen that you can only do one clustered index... all main id's are clustered indexes...|||Originally posted by shianmiin
1. Based on your query, an index on order.custID would be helpful.

2. It would speed up the query a lot if you store customers favorite prodType in a table instead of figuring it out every time you run the query. Daily update on this field might be good enough.

I mean the item no. 2 above.|||that query would take 69 hours to finish (~50000 customers, ~5 seconds per prodType get).|||i've just got it working with a runtime of 60 seconds for all customers.
I don't know why it's that fast - but it works :)

INSERT INTO tProdTypeFavourites(custID, prodType1...)
(SELECT ... ) (SELECT ...)

the sub-queries are pretty big but the whole thing runs very fast!|||good job. :)|||thanks - same for you! thnx for every reply!

No comments:

Post a Comment