OpenXML Identity Insert: speed

Showing posts with label speed. Show all posts

Friday, March 30, 2012

Optimizing Query - Product Tips

Hello everyone!
I've got a problem with a real slow query, I would be very happy if somebody has any idea to improve the speed of it...
The idea is to get the top 2 products, a customer hasn't bought wich are in his interest...

query (simplificated)
---------------
SELECT TOP 2 prodID, Title, Price FROM bestSold7Days WHERE
prodID NOT IN (SELECT prodID FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID WHERE (orders.custID=394))
AND
(prodType = COALESCE((SELECT TOP 1 products.prodID FROM orders INNER JOIN orderProducts ON order.orderID = orderProducts.orderID INNER JOIN products ON orderProducts.prodID = products.prodID WHERE (orders.custID=394) GROUP BY products.prodType ORDER BY SUM(orderProducts.PCS) DESC), 2))
---------------
end query

(COALESCE is for replacing if the customer hasnt ordered anything, or hasnt ordered anything of this type)...

Thanks for any time spent!DDL and sample data would help alot...

But I'll give it a go...

DDL:

CREATE TABLE...

Sample Data

INSERT INTO myTable(Cols...
SELECT 'data','more data',1,...UNION ALL
SELECT 'data','more data',1,...UNION ALL
SELECT 'data','more data',1,...UNION ALL
SELECT 'data','more data',1,...UNION ALL

Should help us get an answer quicker...|||Or maybe not...I think I hurt myself...

Let me point out a coupld of things...

SELECT TOP 2
prodID
, Title
, Price
FROM bestSold7Days
WHERE prodID NOT IN (SELECT prodID
FROM orders
INNER JOIN orderProducts
ON orders.orderID = orderProducts.orderID
WHERE orders.custID=394)
AND (prodType = COALESCE((SELECT TOP 1
products.prodID
FROM orders
INNER JOIN orderProducts ON order.orderID = orderProducts.orderID
INNER JOIN products ON orderProducts.prodID = products.prodID
WHERE orders.custID=394
GROUP BY products.prodType
ORDER BY SUM(orderProducts.PCS) DESC), 2))

Does this even run?

Does ProdType = ProdId?
a GROUP BY qith no SCALAR in the SELECT?
OREDER BY SUM...what for?
Why the COALESCE? IF it's NULL (what ever it is) is won't be evaluated

I guess what I'm saying is..post the ddl sample data AND expected results, and tell us what the business req is...

it'll be a lot faster that way...|||I don't have access to the sql server right now, so I cant give you the result and the table structure 100% - maybe i mistyped something on the query itself, but i dont think so..
I've missed the ORDER by cntSold DESC at the end of the query
it worked as far as i've tried :-)
sorry - let me get some things:

table bestSold7Days (generated hourly, articles best sold in the last 7 days)
prodID - product ID
cntSold - sold pieces in 7 days
prodType - product type - e.g. 0 hardware, 1 software, 2 special product
Title - product Title

table orders
orderID - order ID identity
custID - customer ID

table orderProdcuts (products contained in order)
orderID - order ID
prodID - product ID
pcs - Pieces ordered

the whole query puts out following:
prodID, Title, Price
349, H53-39, 393.33
39392, P3838-3, 5959.21

the sense of the hole thing is to get the top 2 sold products in the last 7 days, wich are the same of interest (prodType) wich the customer prefers and which he didn't already order...|||In your query statement, there are three JOIN words. That makes execution of the query very slow. My suggestion is that you may use a stored procedure in which you can separate your query into several steps. That will improve the performance.|||Hello gyuan,
I've already tried that.
Did a sp wich got me the favorite prodType, but overall it didn't really improve the performance very much.

Currently I'm using it through the stored procedure (4 different tables, 4 different prodTypes) and then querying the top 2 products which he didn't already order.

The SP gives out: 3,4,3,50 - this i split in vb and use it in the queries following...

The whole thing takes from 8-30 seconds (depending on how much the customer ordered)|||What about this?

select top 2 b.prodid
from
BestSold7Days b
left join
(
select distinct(p.prodID)
from Orders o
inner join OrderProducts p
on p.orderid = o.orderid
where o.custID = 394
) x
on x.prodID = b.prodID
where x.prodID is null
order by b.cntSold desc|||A good stored procedure can definitely improve the execution speed of the query statements, but it depends on the content of the stored procedure. If you can post the details of your tables and requirements, that will give us a good help to solve it.|||table sets:
products
products2
products3
products4
(all same structure)
prodID - identity
prodType - product type - int

orders
orderID - order ID (identity)
custID - customer ID - int

orderProducts
orderID - order ID - int
prodID - int
pcs - int - pieces ordered

Here's the sp i currently use to do the query:
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS OFF
GO

CREATE FUNCTION [dbo].[getTopprodType] (@.custID int )
RETURNS varchar(50) AS
BEGIN
DECLARE @.Ret varchar(50)
SET @.Ret=ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products ON orderProducts.prodID = products.prodID WHERE (orders.custID = @.custID) GROUP BY products.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
SET @.Ret=@.Ret + ',' + ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products2].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products2 ON orderProducts.prodID = products2.prodID WHERE (orders.custID = @.custID) GROUP BY products2.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
SET @.Ret=@.Ret + ',' + ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products3].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products3 ON orderProducts.prodID = products3.prodID WHERE (orders.custID = @.custID) GROUP BY products3.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
SET @.Ret=@.Ret + ',' + ISNULL((SELECT TOP 1 CONVERT(varchar(50),[products4].prodType) FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID INNER JOIN products4 ON orderProducts.prodID = products4.prodID WHERE (orders.custID = @.custID) GROUP BY products4.prodType ORDER BY COUNT(orderProducts.PCS) DESC),'')
RETURN @.Ret
END

GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO

There are 4 different product tables (unique id's) wich get together here.
The SP puts out the following
2,3,4,9
I put this into an array in vb to use it for the following query
SELECT TOP 2 bestSold7Days.Anz, products.prodID, products.Title, products.Price FROM bestSold7Days INNER JOIN products ON bestSold7Days.prodID = products.prodID WHERE ((products.Price IS NOT NULL) AND ((SELECT TOP 1 orderProducts.prodID FROM orders INNER JOIN orderProducts ON orders.orderID = orderProducts.orderID WHERE (orders.custID = " & Me.custID & ") GROUP BY orderProducts.prodID HAVING (orderProducts.prodID = bestSold7Days.prodID)) IS NULL)" & tempGen & " ORDER BY bestSold7Days.Anz DESC;

tempGen is replaced by the favourite prodType (e.g. AND prodType=4 if has any)|||found a faster solution for the second query!
if i do the lookup of the products wich the customer already ordered in a derived table (like a join, but the table is a query) it's much faster!
gut the whole process down to ~2-6 seconds.
Now i need to optimize the getTopProdType procedure, still taking some seconds...|||Could you post how many records in each of the tables? And what are indexes on those tables?|||Indexes on all searched fields
prodID
prodType
(in all tables)

products1 ~ 20000 records
products2 ~ 200000 records
products3 ~ 5000 records
products4 ~ 2000000 records
orders ~ 50000 records
orderProducts ~ 2000000 records

prodType is not clustered.
does anybody have a rule when to make an index clustered?|||1. Based on your query, an index on order.custID would be helpful.

2. It would speed up the query a lot if you store customers favorite prodType in a table instead of figuring it out every time you run the query. Daily update on this field might be good enough.|||sorry forgot this field - custID is also indexed.
Only field queried wich is not indexed is the pcs field in the orderProducts table.|||I think item 2 would be very helpful to speed up your query.
If a little slower update on orders related tables is acceptable, you may update the information in table xxx using triggers or whenever an order is updated.

SELECT TOP 2
prodID
, Title
, Price
FROM bestSold7Days
WHERE prodID NOT IN (SELECT prodID
FROM orders
INNER JOIN orderProducts
ON orders.orderID = orderProducts.orderID
WHERE orders.custID=394)
AND (prodType = COALESCE((SELECT prodType
FROM xxxx
WHERE custID=394), 2))|||what do you mean by item 2?
i've just seen that you can only do one clustered index... all main id's are clustered indexes...|||Originally posted by shianmiin
1. Based on your query, an index on order.custID would be helpful.

2. It would speed up the query a lot if you store customers favorite prodType in a table instead of figuring it out every time you run the query. Daily update on this field might be good enough.

I mean the item no. 2 above.|||that query would take 69 hours to finish (~50000 customers, ~5 seconds per prodType get).|||i've just got it working with a runtime of 60 seconds for all customers.
I don't know why it's that fast - but it works :)

INSERT INTO tProdTypeFavourites(custID, prodType1...)
(SELECT ... ) (SELECT ...)

the sub-queries are pretty big but the whole thing runs very fast!|||good job. :)|||thanks - same for you! thnx for every reply!

Friday, March 23, 2012

Optimize speed of Hosted SQL Server

Hi,

I'm retrieving data in VBA using simple SELECT statements on my Hosted SQL Server. How do I most effectively speed up the process?

Is it just about data amount, traffic, and speed of Internet conncetion?

Or should I use Stored Procedures or should I index tables?

Is there a function to kind of "flush" the data to the VBA program as it is all READ-ONLY?

Any help is much appreciated,

Jakob

Can you give more information about how your database schema looks like, what type of queries you are doing, and how your network architecture looks like?

Based on that it might be easier to answer your questions.

Thanks,

Marcel van der Holst
[MSFT]|||

The queries are very simple, just simple "SELECT xx, xx, xx FROM xx WHERE xx=xx"

I'm connecting from home with my lapto to a hosted SQL Server, so no big network. I have tried testing my code to see how and when it is slow. It it quite fast to open the connection, but when I try to through the records using for example the rs.movenext function I can see that it takes a second or two each time for each record and each field in each record. It seems like I'm maintaining an open connection over the internet when I actually just want to the data read-only.

Does that help?

Jakob

|||Would it be possible to run the queries against a local SQL Server that runs on your laptop. That way, you can ignore the network latency, and see if the query is causing the problem.

If you use RecordSet, it will keep the connection open until you explicitely close it. Even when you close it, connection pooling will be used to keep the connection open a while, just in case you need it again.

How much data do you select, i.e. how big is the data in the select xx,xx statement? if that data is big, and you have to go over a big network, it might cause the slowdown..

Another thing to consider is to retrieve all the data you need in a big select statement (select * from Table), and then store this data in a local cache, and do the searching and filtering locally, instead of doing it over the internet.

Thanks,

Marcel van der Holst
[MSFT]|||

the database is about 30mb. The queries are quite simple - just "SELECT xx, xx FROM xx WHERE xx=yy".

I just converted my Access 2003 database, which was stored locally, to hosted SQL Server. An update which previously took about 10 seconds now takes about 2 minutes.

I tried to limit the number of rows which helped quite a lot. Then I also tried to change the format of a field from "nText" to "nVar" that also helped a lot. It seems like the amount of data transmitted is the single most important factor - even though I'm running on a 8mbit line.

Most of my data I just need in arrays - I usually don't update any tables. My thought was that it should be possible to send the query result from the SQL Server in some kind of "flat" text array format and not in a recordset format.

I'm using ADODB in VBA with code lines like:

Set rsData = New ADODB.Recordset

rsData.Open sSql, conGPAM, adOpenKeyset, adLockOptimistic

aData = rsData.GetRows(.1)

I have tried to change LockType and CursorType, but that doesn't really make a difference.

Any suggestions?
|||

ADO is horrible for direct access. Use sprocs. Use output parameters if guaranteed to return just one row or if you only need a single output (like a count).

Check the network latency to your host.

Appropriate indexing is paramount for optimal perfomance. This is somewhat science and somewhat art/experience.

Optimize speed of Hosted SQL Server

Hi,

I'm retrieving data in VBA using simple SELECT statements on my Hosted SQL Server. How do I most effectively speed up the process?

Is it just about data amount, traffic, and speed of Internet conncetion?

Or should I use Stored Procedures or should I index tables?

Is there a function to kind of "flush" the data to the VBA program as it is all READ-ONLY?

Any help is much appreciated,

Jakob

Can you give more information about how your database schema looks like, what type of queries you are doing, and how your network architecture looks like?

Based on that it might be easier to answer your questions.

Thanks,

Marcel van der Holst
[MSFT]|||

The queries are very simple, just simple "SELECT xx, xx, xx FROM xx WHERE xx=xx"

Does that help?

Jakob

|||Would it be possible to run the queries against a local SQL Server that runs on your laptop. That way, you can ignore the network latency, and see if the query is causing the problem.

How much data do you select, i.e. how big is the data in the select xx,xx statement? if that data is big, and you have to go over a big network, it might cause the slowdown..

Thanks,

Marcel van der Holst
[MSFT]|||

the database is about 30mb. The queries are quite simple - just "SELECT xx, xx FROM xx WHERE xx=yy".

I just converted my Access 2003 database, which was stored locally, to hosted SQL Server. An update which previously took about 10 seconds now takes about 2 minutes.

I'm using ADODB in VBA with code lines like:

Set rsData = New ADODB.Recordset

rsData.Open sSql, conGPAM, adOpenKeyset, adLockOptimistic

aData = rsData.GetRows(.1)

I have tried to change LockType and CursorType, but that doesn't really make a difference.

Any suggestions?
|||

ADO is horrible for direct access. Use sprocs. Use output parameters if guaranteed to return just one row or if you only need a single output (like a count).

Check the network latency to your host.

Appropriate indexing is paramount for optimal perfomance. This is somewhat science and somewhat art/experience.

Monday, March 19, 2012

Optimiser and UDF's

I came across something today that hopefully if my explanation is
wrong someone could correct me.
I was trying to speed up a stored proc that took an integer as a
parameter and then ran around a dozen SQL statements to populate a
table variable.
1st query was :-
select <some columns> from A where A.id = @.id
The others were along those lines but joined to several lookup tables
and other combinations. Basically though it was hitting table A with
the same search conditions each time (id = @.id).
Thinking this was a waste I wrote a UDF that took the id as a
parameter and returned a table with just the relevant records from
table A in it. The SQL in the stored proc was then changed as follows
:-
select <some columns> from myUDF(@.id)
The new stored proc now runs twice as fast, uses half the cpu time and
a third of the disk io.
Would I be right in saying that SQL2k has realised that the UDF had
been called a dozen times with the same parameters, ran it once and
cached the results?
Its the only reason I can see for such a big performance improvement.
cheers.No, UDF results are not cached in SQL 2k, but that might be a nice feature
for Yukon... Sybase, for instance, does cache udf results...If your efforts
lowered the number of joins that were being done, I suspect the improvement
lies there.
"Mike Watson" <mike@.prog99.com> wrote in message
news:q1h7lvoluabdh7ulaigatf1ndn05gaofo8@.4ax.com...
> I came across something today that hopefully if my explanation is
> wrong someone could correct me.
> I was trying to speed up a stored proc that took an integer as a
> parameter and then ran around a dozen SQL statements to populate a
> table variable.
> 1st query was :-
> select <some columns> from A where A.id = @.id
> The others were along those lines but joined to several lookup tables
> and other combinations. Basically though it was hitting table A with
> the same search conditions each time (id = @.id).
> Thinking this was a waste I wrote a UDF that took the id as a
> parameter and returned a table with just the relevant records from
> table A in it. The SQL in the stored proc was then changed as follows
> :-
> select <some columns> from myUDF(@.id)
> The new stored proc now runs twice as fast, uses half the cpu time and
> a third of the disk io.
> Would I be right in saying that SQL2k has realised that the UDF had
> been called a dozen times with the same parameters, ran it once and
> cached the results?
> Its the only reason I can see for such a big performance improvement.
> cheers.
>|||On Tue, 2 Sep 2003 07:45:12 -0400, "Wayne Snyder"
<wsnyder@.computeredservices.com> wrote:
>No, UDF results are not cached in SQL 2k, but that might be a nice feature
>for Yukon... Sybase, for instance, does cache udf results...If your efforts
>lowered the number of joins that were being done, I suspect the improvement
>lies there.
>
Thanks for the reply, I'm surprised that the overhead of calling the
UDF around a dozen times is so much cheaper than doing a striaght join
on the table.

Monday, February 20, 2012

OPENXML performance

Does anyone know that how to speed up the performance of using OPENXML? I
looked at the execution plan of my query and the remote scan takes up over
90% of the total costs. My XML passed in to the stored procedure only
contains 2 or 3 elements.
thanks,
Joseph
I forgot to mention that the remote scan happened when I ran the following
query:
INSERT INTO #TempTable
Select * From OPENXML(@.hDoc, '/Level1/Level2')
With
#TempTable
"Joseph" <joseph@.bluefield.com.hk> wrote in message
news:%236NpkWohEHA.3476@.tk2msftngp13.phx.gbl...
> Does anyone know that how to speed up the performance of using OPENXML? I
> looked at the execution plan of my query and the remote scan takes up over
> 90% of the total costs. My XML passed in to the stored procedure only
> contains 2 or 3 elements.
> thanks,
> Joseph
>
|||Hi Joseph,
From your descriptions, I am afraid I could not show you where or what to
be modified as it is very hard to troubleshooting issue in newsgroup.
However, I would like to show you some general idea on XML performance
tuning.
Based on my socpe, you should be aware of some of the following scalability
issues that are involved in using SQLXML:
Avoid OPENXML over Large XML Documents
Be aware that there are limitations to the amount of memory that is
available to the OPENXML construct over an XML document operation. This
operation builds a Document Object Model (DOM) in the SQL buffer space that
can be much larger than the original document size. Also, this operation is
limited to one eighth of the buffer space, and large XML documents may
consume this memory fairly quickly and cause an out-of-memory error on the
server. Do not create large systems based on this functionality without
conducting significant load testing. You might also want to use the XML
bulk load option if possible.
Avoid Large Numbers of Concurrent OPENXML Statements over XML Documents
You also have to consider the issue with OPENXML when you use OPENXML to
batch inserts. This is a fairly common operation because it is an effective
way to issue a group of inserts with one statement. Issuing a group of
inserts reduces the overhead of multiple insert statements and multiple
round trips. However, be aware that this approach may not be very scalable
because of the aforementioned memory limitations.
More detailed information could be found in the following documents
Optimizing SQLXML Performance
http://msdn.microsoft.com/library/de...us/dnsql2k/htm
l/sqlxml_optimperformance.asp
Chapter 14 - Improving SQL Server Performance
http://msdn.microsoft.com/library/de...us/dnpag/html/
scalenetchapt14.asp
Thank you for your patience and cooperation. If you have any questions or
concerns, don't hesitate to let me know. We are here to be of assistance!
Sincerely yours,
Mingqing Cheng
Microsoft Developer Community Support
Introduction to Yukon! - http://www.microsoft.com/sql/yukon
This posting is provided "as is" with no warranties and confers no rights.
Please reply to newsgroups only, many thanks!
|||To slightly clarify Mingqing' statements, see below.
Best regards
Michael
""Mingqing Cheng [MSFT]"" <v-mingqc@.online.microsoft.com> wrote in message
news:3Bl7fvphEHA.2632@.cpmsftngxa10.phx.gbl...
> Hi Joseph,
> From your descriptions, I am afraid I could not show you where or what to
> be modified as it is very hard to troubleshooting issue in newsgroup.
> However, I would like to show you some general idea on XML performance
> tuning.
> Based on my socpe, you should be aware of some of the following
> scalability
> issues that are involved in using SQLXML:
> Avoid OPENXML over Large XML Documents
> Be aware that there are limitations to the amount of memory that is
> available to the OPENXML construct over an XML document operation. This
> operation builds a Document Object Model (DOM) in the SQL buffer space
> that
> can be much larger than the original document size. Also, this operation
> is
> limited to one eighth of the buffer space,
This is not quite correct. It is one eighth of the available MAIN memory
when sp_xml_preparedocument first loads the MSXML 2.6 dll.

> and large XML documents may
> consume this memory fairly quickly and cause an out-of-memory error on the
> server.
While this may happen, it still is fairly rare, unless you run on a low
memory machine. A DOM for a 100kB XML document normally uses 300 to 600kB
based on the structure.

> Do not create large systems based on this functionality without
> conducting significant load testing. You might also want to use the XML
> bulk load option if possible.
That and the following I fully agree with.

> Avoid Large Numbers of Concurrent OPENXML Statements over XML Documents
> You also have to consider the issue with OPENXML when you use OPENXML to
> batch inserts. This is a fairly common operation because it is an
> effective
> way to issue a group of inserts with one statement. Issuing a group of
> inserts reduces the overhead of multiple insert statements and multiple
> round trips. However, be aware that this approach may not be very scalable
> because of the aforementioned memory limitations.
> More detailed information could be found in the following documents
> Optimizing SQLXML Performance
> http://msdn.microsoft.com/library/de...us/dnsql2k/htm
> l/sqlxml_optimperformance.asp
> Chapter 14 - Improving SQL Server Performance
> http://msdn.microsoft.com/library/de...us/dnpag/html/
> scalenetchapt14.asp
> Thank you for your patience and cooperation. If you have any questions or
> concerns, don't hesitate to let me know. We are here to be of assistance!
>
> Sincerely yours,
> Mingqing Cheng
> Microsoft Developer Community Support
> Introduction to Yukon! - http://www.microsoft.com/sql/yukon
> This posting is provided "as is" with no warranties and confers no rights.
> Please reply to newsgroups only, many thanks!
>
|||In addition to the general comment.
OPENXML is not really executing "remotely", but is modelled after the OLEDB
rowset providers. It calls out to the XPath engine of MSXML that runs
in-proc with the server.
The scalability depends on the size of the data, availability of memory and
the type of XPath expressions that you are using. Normally, it should scale
linearly, but there are some XPath expression that in SQL Server 2000 are
worse (they should be better in the 64-bit version and SQL Server 2005 and
hopefully (no guarantees yet) in SP4.
Best regards
Michael
"Joseph" <joseph@.bluefield.com.hk> wrote in message
news:%236NpkWohEHA.3476@.tk2msftngp13.phx.gbl...
> Does anyone know that how to speed up the performance of using OPENXML? I
> looked at the execution plan of my query and the remote scan takes up over
> 90% of the total costs. My XML passed in to the stored procedure only
> contains 2 or 3 elements.
> thanks,
> Joseph
>

Friday, March 30, 2012

Optimizing Query - Product Tips

Friday, March 23, 2012

Optimize speed of Hosted SQL Server

Optimize speed of Hosted SQL Server

Monday, March 19, 2012

Optimiser and UDF's

Monday, February 20, 2012

OPENXML performance

OpenXML Identity Insert

Blog Archive

About Me