Monday, February 20, 2012

openxml seemingly fails to process external entity

I am having a problem getting OPENXML to process some xml that contains an
"external entity" in SQL Server 2000.
Background, which you probably already know:
Xml uses "external entities" the way the C language uses "#include": when
an xml processor finds the directive, it refers to an external file whose
contents are to be processed at that point.
I have created a minimal example that illustrates the problem.
1. First, here is some SQL that correctly processes some XML that does
*not* use external entities. The XML is stored in a variable (@.testXml), and
a select statement reads this XML and reports some basic facts about the only
data in it, namely, that there is a tag called "DatastreamDownload ":
-- demonstrate validity of simple xml
declare @.testXml varchar(8000)
set @.testXml ='
<?xml version="1.0" ?>
<DatastreamDownload />
'
declare @.idoc int
exec sp_xml_preparedocument @.idoc output, @.testXml
select *
from openxml( @.idoc, '/DatastreamDownload', 0)
exec sp_xml_removedocument @.idoc
Here are the beginnings of the lines I see in SQL Query Analyzer:
id parentid nodetype localname
-- -- -- --
0 NULL 1 DatastreamDownload
(1 row(s) affected)
2. I then stored the XML from @.testXml in a file, simple.xml. The
following 2 lines show the contents of the file, except that I've added 4
spaces to the beginning of each of the lines in this listing (that is, the
prefixed spaces are not in the file):
<?xml version="1.0" ?>
<DatastreamDownload />
3. I created a second file of XML, simpleEntity.xml, that includes
simple.xml by using a external entity. The following 11 lines show the
contents of the file, except that I've added 4 spaces to the beginning of
each of the lines in this listing (that is, the prefixed spaces are not in
the file):
<?xml version="1.0" ?>
<!DOCTYPE just4ExternalEntity
[
<!ENTITY externalEntity SYSTEM "simple.xml">
]
>
<just4ExternalEntity>
<workingTag />
&externalEntity;
</just4ExternalEntity>
You can verify that this external entity stuff is done properly by opening
this second file in Internet Explorer. I see something like this, except
with better colors:
<?xml version="1.0" ?>
<!DOCTYPE just4ExternalEntity (View Source for full doctype...)>
- <just4ExternalEntity>
<workingTag />
<DatastreamDownload />
</just4ExternalEntity>
4. Finally, I copied the working xml from simpleEntity.xml into a SQL
variable, @.xmlDoc. I had to prefix a network path to "simple.xml", to give
the SQL Server a chance of locating simple.xml. I've represented this
network path by "\\unc..." below, although in my actual test I used a valid
(and triple-tested) unc.
Openxml can partially process this code. The first select statement below
reports on the "workingTag" tag, which is defined in the @.xmlDoc variable ,
but fails to report on the sibling "DatastreamDownload" tag, which is defined
in the external entity.
-- attempt to read the same xml from a disk file using an external entity
declare @.xmlDoc varchar(1000)
set @.xmlDoc = -- code from simpleEntity.xml
'
<?xml version="1.0" ?>
<!DOCTYPE just4ExternalEntity
[
<!ENTITY externalEntity SYSTEM "\\unc...\simple.xml">
]
>
<just4ExternalEntity>
<workingTag />
&externalEntity;
</just4ExternalEntity>
'
declare @.jdoc int
declare @.rc int
exec @.rc = sp_xml_preparedocument @.jdoc output, @.xmlDoc
select @.rc as resultCode
-- lists row, as expected
select *
from openxml( @.jdoc, '/just4ExternalEntity/workingTag', 0)
-- fails to list row
select *
from openxml( @.jdoc, '/just4ExternalEntity/DatastreamDownload', 0)
exec sp_xml_removedocument @.jdoc
Here are the beginnings of the lines I see in SQL Query Analyzer:
resultCode
0
(1 row(s) affected)
id parentid nodetype localname
-- -- -- --
7 0 1 workingTag
(1 row(s) affected)
id parentid nodetype localname
-- -- -- --
(0 row(s) affected)
I was expecting to see 1 row in the last result set above, with
"DatastreamDownload" in the "localname" column.
So openxml is failing to process something that Internet Explorer has no
trouble with. As far as I know (and I'm no expert) external entities are a
standard part of XML, and a standards-conforming XML processor should handle
them. So it seems very likely that the processor used by openxml does handle
them. It's even possible that openxml and IE are using the same processor,
although that would quite a feat of coordination for a company the size of
Microsoft.
Since external entities are standard, and since IE issues a security warning
when I use one via a network unc, I'm suspecting that there is a privilege
problem in getting openxml to read the external entity.
Thanks for any help.
External entities are not supported with OpenXML for a variety of security
related reasons.
Best is to resolve them on the client/midtier side before sending the XML to
the server.
Best regards
Michael
"HolmesDM" <HolmesDM@.discussions.microsoft.com> wrote in message
news:8AF86E16-03D4-49A7-9A20-8133C2B0E109@.microsoft.com...
>I am having a problem getting OPENXML to process some xml that contains an
> "external entity" in SQL Server 2000.
> Background, which you probably already know:
> --
> Xml uses "external entities" the way the C language uses "#include": when
> an xml processor finds the directive, it refers to an external file whose
> contents are to be processed at that point.
>
> I have created a minimal example that illustrates the problem.
> 1. First, here is some SQL that correctly processes some XML that does
> *not* use external entities. The XML is stored in a variable (@.testXml),
> and
> a select statement reads this XML and reports some basic facts about the
> only
> data in it, namely, that there is a tag called "DatastreamDownload ":
> -- demonstrate validity of simple xml
> declare @.testXml varchar(8000)
> set @.testXml ='
> <?xml version="1.0" ?>
> <DatastreamDownload />
> '
> declare @.idoc int
> exec sp_xml_preparedocument @.idoc output, @.testXml
> select *
> from openxml( @.idoc, '/DatastreamDownload', 0)
> exec sp_xml_removedocument @.idoc
>
> Here are the beginnings of the lines I see in SQL Query Analyzer:
> id parentid nodetype localname
> -- -- -- --
> 0 NULL 1
> DatastreamDownload
> (1 row(s) affected)
>
> 2. I then stored the XML from @.testXml in a file, simple.xml. The
> following 2 lines show the contents of the file, except that I've added 4
> spaces to the beginning of each of the lines in this listing (that is, the
> prefixed spaces are not in the file):
> <?xml version="1.0" ?>
> <DatastreamDownload />
>
> 3. I created a second file of XML, simpleEntity.xml, that includes
> simple.xml by using a external entity. The following 11 lines show the
> contents of the file, except that I've added 4 spaces to the beginning of
> each of the lines in this listing (that is, the prefixed spaces are not in
> the file):
> <?xml version="1.0" ?>
> <!DOCTYPE just4ExternalEntity
> [
> <!ENTITY externalEntity SYSTEM "simple.xml">
> ]
> <just4ExternalEntity>
> <workingTag />
> &externalEntity;
> </just4ExternalEntity>
> You can verify that this external entity stuff is done properly by opening
> this second file in Internet Explorer. I see something like this, except
> with better colors:
> <?xml version="1.0" ?>
> <!DOCTYPE just4ExternalEntity (View Source for full doctype...)>
> - <just4ExternalEntity>
> <workingTag />
> <DatastreamDownload />
> </just4ExternalEntity>
> 4. Finally, I copied the working xml from simpleEntity.xml into a SQL
> variable, @.xmlDoc. I had to prefix a network path to "simple.xml", to
> give
> the SQL Server a chance of locating simple.xml. I've represented this
> network path by "\\unc..." below, although in my actual test I used a
> valid
> (and triple-tested) unc.
> Openxml can partially process this code. The first select statement below
> reports on the "workingTag" tag, which is defined in the @.xmlDoc variable
> ,
> but fails to report on the sibling "DatastreamDownload" tag, which is
> defined
> in the external entity.
> -- attempt to read the same xml from a disk file using an external
> entity
> declare @.xmlDoc varchar(1000)
> set @.xmlDoc = -- code from simpleEntity.xml
> '
> <?xml version="1.0" ?>
> <!DOCTYPE just4ExternalEntity
> [
> <!ENTITY externalEntity SYSTEM "\\unc...\simple.xml">
> ]
> <just4ExternalEntity>
> <workingTag />
> &externalEntity;
> </just4ExternalEntity>
> '
> declare @.jdoc int
> declare @.rc int
> exec @.rc = sp_xml_preparedocument @.jdoc output, @.xmlDoc
> select @.rc as resultCode
> -- lists row, as expected
> select *
> from openxml( @.jdoc, '/just4ExternalEntity/workingTag', 0)
> -- fails to list row
> select *
> from openxml( @.jdoc, '/just4ExternalEntity/DatastreamDownload',
> 0)
> exec sp_xml_removedocument @.jdoc
>
> Here are the beginnings of the lines I see in SQL Query Analyzer:
> resultCode
> --
> 0
> (1 row(s) affected)
> id parentid nodetype localname
> -- -- -- --
> 7 0 1 workingTag
> (1 row(s) affected)
> id parentid nodetype localname
> -- -- -- --
> (0 row(s) affected)
> I was expecting to see 1 row in the last result set above, with
> "DatastreamDownload" in the "localname" column.
> So openxml is failing to process something that Internet Explorer has no
> trouble with. As far as I know (and I'm no expert) external entities are
> a
> standard part of XML, and a standards-conforming XML processor should
> handle
> them. So it seems very likely that the processor used by openxml does
> handle
> them. It's even possible that openxml and IE are using the same
> processor,
> although that would quite a feat of coordination for a company the size of
> Microsoft.
> Since external entities are standard, and since IE issues a security
> warning
> when I use one via a network unc, I'm suspecting that there is a privilege
> problem in getting openxml to read the external entity.
> Thanks for any help.
|||Thanks for the definitive answer, even if it's not exactly the one I'd hoped
for.
"Michael Rys [MSFT]" wrote:

> External entities are not supported with OpenXML for a variety of security
> related reasons.
> Best is to resolve them on the client/midtier side before sending the XML to
> the server.
> Best regards
> Michael
> "HolmesDM" <HolmesDM@.discussions.microsoft.com> wrote in message
> news:8AF86E16-03D4-49A7-9A20-8133C2B0E109@.microsoft.com...
>
>

No comments:

Post a Comment