Monday, February 20, 2012

openxml question

Hello,
I read a document from MSDN saying that:
Avoid OPENXML over large XML documents.
Avoid large numbers of concurrent OPENXML statements over XML
documents.
But my question is: how do you define "large"? Is there a guideline
saying, for example, that a 100 KB XML file will be too big, or that 100
concurrent OPENXML statements will be too much for the system?
The document on MSDN can be found at:
http://msdn2.microsoft.com/en-us/library/ms998577.aspx
The document was written in 2004. Are the arguments valid for both SQL
2000 and 2005?
Many Thanks
Jerry
Yes, the arguments are pretty much the same. Other than adding XML data
type support, I'm not aware of any changes to the inner workings of
OPENXML. If you're using SQL 2005, you might try using the nodes() method
of the XML data type to shred your XML data instead.
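A minimal sketch of the nodes() approach (the element and attribute names
here are made up for illustration):

    DECLARE @x XML;
    SET @x = N'<orders><order id="1" qty="5"/><order id="2" qty="3"/></orders>';

    -- One row per <order> element; value() extracts typed values from each node
    SELECT o.value('@id',  'INT') AS OrderId,
           o.value('@qty', 'INT') AS Qty
    FROM @x.nodes('/orders/order') AS t(o);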
<DAXU@.hotmail.com> wrote in message
news:7da00930-9135-4ea6-9438-b2276eccdc1e@.h11g2000prf.googlegroups.com...
|||On Wed, 2 Jan 2008 05:15:40 -0800 (PST), DAXU@.hotmail.com wrote:

>[...]
>The document was written in 2004. Are the arguments valid for both SQL
>2000 and 2005?
No, I think not.
On SQL 2005 (!), I've used OPENXML to process two-megabyte files on a
tiny, overloaded laptop; I run five or ten nodes() queries to shred it
in two seconds, and that's pretty darned good performance in my book!
Maybe 20 MB would be a problem, or 200 MB. Assume it does keep
everything in RAM, but tokenized. Also, even small systems today have
gigabytes of RAM, which wasn't so true even four years ago.
J.
|||"JXStern" <JXSternChangeX2R@.gte.net> wrote in message
news:knbrn3lgsftdtev6ofsnt3sv9fosai8oh5@.4ax.com...
> On Wed, 2 Jan 2008 05:15:40 -0800 (PST), DAXU@.hotmail.com wrote:
One of the problems with the legacy method of shredding XML using the
COM-based stored procs and OPENXML is that it automatically assigns
one-eighth of your SQL Server's total memory to the XML cache.
Unfortunately this is a limitation of the MSXML parser used by SQL Server
to fulfill OPENXML requests. According to the article, you can potentially
run out of memory if you prepare too many XML documents at once without
releasing some.
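For reference, this is the prepare/shred/release pattern the article is
talking about (the sample document is made up); the key point is that the
memory isn't returned until sp_xml_removedocument runs:

    DECLARE @doc NVARCHAR(MAX);
    DECLARE @hdoc INT;
    SET @doc = N'<orders><order id="1" qty="5"/></orders>';

    -- Parsing is where the MSXML parser takes its memory from the XML cache
    EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc;

    SELECT OrderId, Qty
    FROM OPENXML(@hdoc, '/orders/order', 1)  -- flag 1 = attribute-centric mapping
    WITH (OrderId INT '@id',
          Qty     INT '@qty');

    -- Release the handle, or the parsed document stays cached in memory
    EXEC sp_xml_removedocument @hdoc;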
|||FYI, here's the SQL 2005 article detailing the 1/8th memory issue:
http://msdn2.microsoft.com/en-us/library/ms187367.aspx. This issue affects
SQL Server 2005 as well as SQL Server 2000. This means that if your SQL
Server has 2 GB assigned to it, that 2 MB XML file you're processing just
got 256 MB assigned to it until you destroy the COM object with
sp_xml_removedocument.
That's quite a hefty price to pay to process a 2 MB file.
|||"Marc Gravell" <marc.gravell@.gmail.com> wrote in message
news:a6c9a102-6257-4947-b086-73c555aafe01@.l32g2000hse.googlegroups.com...
> Simply out of curiosity, did this out-perform using the xml datatype
> to do the initial shred?
> Marc
I just ran a couple of tests on my server (SQL 2005, 2 GB RAM, 2.2 GHz). On
a simple 1 MB XML document I came up with the following performance:
- shredding an XML type variable with nodes() method 1,000 times took 77
seconds
- shredding a VARCHAR variable with OPENXML 1,000 times took 76 seconds
- shredding an XML type column with nodes() method and primary XML index
1,000 times took 42 seconds
OPENXML appears to be slightly faster for simple XML documents, although it
reserves 1/8th of the SQL Server memory regardless of the size of the XML
document. I didn't test it with more complex documents and paths, like
documents with multiple namespaces, and several levels of nesting, etc.
When you are shredding XML documents stored in a column, the nodes() method
with a primary XML index is considerably faster than either of the other two
methods.
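For anyone trying to reproduce this, a rough sketch of the indexed-column
variant (table, index, and element names are illustrative):

    CREATE TABLE dbo.OrderDocs
    (
        DocId INT IDENTITY PRIMARY KEY,  -- XML indexes require a clustered primary key
        Doc   XML NOT NULL
    );

    -- A primary XML index persists a pre-shredded node table,
    -- so nodes() queries can seek into it instead of re-parsing the blob
    CREATE PRIMARY XML INDEX PXML_OrderDocs_Doc ON dbo.OrderDocs (Doc);

    SELECT d.DocId,
           o.value('@id',  'INT') AS OrderId,
           o.value('@qty', 'INT') AS Qty
    FROM dbo.OrderDocs AS d
    CROSS APPLY d.Doc.nodes('/orders/order') AS t(o);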
|||Typo, 100 KB XML document, not 1 MB.
"Mike C#" <xyz@.xyz.com> wrote in message
news:eLL1CpKUIHA.3400@.TK2MSFTNGP03.phx.gbl...
> "Marc Gravell" <marc.gravell@.gmail.com> wrote in message
> news:a6c9a102-6257-4947-b086-73c555aafe01@.l32g2000hse.googlegroups.com...
> I just ran a couple of tests on my server (SQL 2005, 2 GB RAM, 2.2 GHz).
> On a simple 1 MB XML document I came up with the following performance:
> - shredding an XML type variable with nodes() method 1,000 times took 77
> seconds
> - shredding a VARCHAR variable with OPENXML 1,000 times took 76 seconds
> - shredding an XML type column with nodes() method and primary XML index
> 1,000 times took 42 seconds
> OPENXML appears to be slightly faster for simple XML documents, although
> it reserves 1/8th of the SQL Server memory regardless of the size of the
> XML document. I didn't test it with more complex documents and paths,
> like documents with multiple namespaces, and several levels of nesting,
> etc. When you are shredding XML documents stored in a column, the nodes()
> method with a primary XML index is considerably faster than either of the
> other two methods.
>
|||My pardon, I was using openrowset('bulk ...'), and I guess that is using the
XML datatype for shredding.
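Something along these lines, that is (the file path and element names are
just examples):

    DECLARE @x XML;

    -- SINGLE_BLOB loads the whole file as one value, which can then be cast to XML
    SET @x = (SELECT CAST(BulkColumn AS XML)
              FROM OPENROWSET(BULK 'C:\data\orders.xml', SINGLE_BLOB) AS src);

    SELECT o.value('@id', 'INT') AS OrderId
    FROM @x.nodes('/orders/order') AS t(o);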
Josh
"Marc Gravell" wrote:

> Simply out of curiosity, did this out-perform using the xml datatype
> to do the initial shred?
> Marc
>
|||"JRStern" <JRStern@.discussions.microsoft.com> wrote in message
news:4F59EC99-46F2-4F48-99CF-F453DCD947C1@.microsoft.com...
> My pardon, I was using openrowset('bulk ...'), and I guess that is using
> the
> XML datatype for shredding.
?
Are you actually shredding the XML data after you load it using
openrowset(...), or are you just storing it in an XML type variable
or column? I ask because casting character/binary data to XML and
shredding it to relational format are two completely different tasks.
|||On Mon, 7 Jan 2008 21:32:31 -0500, "Mike C#" <xyz@.xyz.com> wrote:

>Are you actually shredding the XML data after you load it using
>openrowset(...), or are you just storing it in an XML type variable
>or column? I ask because casting character/binary data to XML and
>shredding it to relational format are two completely different tasks.
I cast it to an XML variable typed with a schema, to validate via the
XSD. Then I pass it to the SP as XML with no schema and shred it there
with four to six queries, mostly CROSS APPLYs.
The only reason I use the unschema'd XML in the SP is so I can change the
XSD without first dropping the SP. I timed it both ways; it seems to
make very little difference. The shredding is indecently fast: it takes
two seconds to shred into table vars, and about ten seconds to write the
rows to the database.
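Roughly like this, if anyone's curious; the schema collection, element
names, and paths are all illustrative:

    -- Assumes an XML schema collection already exists, e.g.:
    -- CREATE XML SCHEMA COLLECTION dbo.OrdersSchema AS N'<xs:schema ...>...</xs:schema>';
    DECLARE @raw NVARCHAR(MAX);
    DECLARE @typed XML(dbo.OrdersSchema);
    DECLARE @doc XML;

    SET @raw = N'<orders><order id="1"><line sku="A" qty="2"/></order></orders>';

    -- The cast to the typed XML type is what validates against the XSD
    SET @typed = CAST(@raw AS XML(dbo.OrdersSchema));

    -- Hand it off untyped so the XSD can change without touching the shredding code
    SET @doc = @typed;

    -- Nested levels come out with CROSS APPLY against the parent node
    SELECT o.value('@id',  'INT')         AS OrderId,
           l.value('@sku', 'VARCHAR(20)') AS Sku,
           l.value('@qty', 'INT')         AS Qty
    FROM @doc.nodes('/orders/order') AS a(o)
    CROSS APPLY o.nodes('line') AS b(l);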
openrowset used like this doesn't do that 1/8 of RAM, does it?
Thanks.
J.
