Monday, February 20, 2012

openxml seemingly fails to process external entity

I am having a problem getting OPENXML to process some xml that contains an
"external entity" in SQL Server 2000.
Background, which you probably already know:
--
Xml uses "external entities" the way the C language uses "#include": when
an xml processor finds the directive, it refers to an external file whose
contents are to be processed at that point.
I have created a minimal example that illustrates the problem.
1. First, here is some SQL that correctly processes some XML that does
*not* use external entities. The XML is stored in a variable (@.testXml), an
d
a select statement reads this XML and reports some basic facts about the onl
y
data in it, namely, that there is a tag called "DatastreamDownload ":
-- demonstrate validity of simple xml
declare @.testXml varchar(8000)
set @.testXml ='
<?xml version="1.0" ?>
<DatastreamDownload />
'
declare @.idoc int
exec sp_xml_preparedocument @.idoc output, @.testXml
select *
from openxml( @.idoc, '/DatastreamDownload', 0)
exec sp_xml_removedocument @.idoc
Here are the beginnings of the lines I see in SQL Query Analyzer:
id parentid nodetype localname
-- -- -- --
0 NULL 1 DatastreamDownload
(1 row(s) affected)
2. I then stored the XML from @.testXml in a file, simple.xml. The
following 2 lines show the contents of the file, except that I've added 4
spaces to the beginning of each of the lines in this listing (that is, the
prefixed spaces are not in the file):
<?xml version="1.0" ?>
<DatastreamDownload />
3. I created a second file of XML, simpleEntity.xml, that includes
simple.xml by using a external entity. The following 11 lines show the
contents of the file, except that I've added 4 spaces to the beginning of
each of the lines in this listing (that is, the prefixed spaces are not in
the file):
<?xml version="1.0" ?>
<!DOCTYPE just4ExternalEntity
[
<!ENTITY externalEntity SYSTEM "simple.xml">
]
>
<just4ExternalEntity>
<workingTag />
&externalEntity;
</just4ExternalEntity>
You can verify that this external entity stuff is done properly by opening
this second file in Internet Explorer. I see something like this, except
with better colors:
<?xml version="1.0" ?>
<!DOCTYPE just4ExternalEntity (View Source for full doctype...)>
- <just4ExternalEntity>
<workingTag />
<DatastreamDownload />
</just4ExternalEntity>
4. Finally, I copied the working xml from simpleEntity.xml into a SQL
variable, @.xmlDoc. I had to prefix a network path to "simple.xml", to give
the SQL Server a chance of locating simple.xml. I've represented this
network path by "\\unc..." below, although in my actual test I used a valid
(and triple-tested) unc.
Openxml can partially process this code. The first select statement below
reports on the "workingTag" tag, which is defined in the @.xmlDoc variable ,
but fails to report on the sibling "DatastreamDownload" tag, which is define
d
in the external entity.
-- attempt to read the same xml from a disk file using an external entity
declare @.xmlDoc varchar(1000)
set @.xmlDoc = -- code from simpleEntity.xml
'
<?xml version="1.0" ?>
<!DOCTYPE just4ExternalEntity
[
<!ENTITY externalEntity SYSTEM "\\unc...\simple.xml">
]
>
<just4ExternalEntity>
<workingTag />
&externalEntity;
</just4ExternalEntity>
'
declare @.jdoc int
declare @.rc int
exec @.rc = sp_xml_preparedocument @.jdoc output, @.xmlDoc
select @.rc as resultCode
-- lists row, as expected
select *
from openxml( @.jdoc, '/just4ExternalEntity/workingTag', 0)
-- fails to list row
select *
from openxml( @.jdoc, '/just4ExternalEntity/DatastreamDownload', 0)
exec sp_xml_removedocument @.jdoc
Here are the beginnings of the lines I see in SQL Query Analyzer:
resultCode
--
0
(1 row(s) affected)
id parentid nodetype localname
-- -- -- --
7 0 1 workingTag
(1 row(s) affected)
id parentid nodetype localname
-- -- -- --
(0 row(s) affected)
I was expecting to see 1 row in the last result set above, with
"DatastreamDownload" in the "localname" column.
So openxml is failing to process something that Internet Explorer has no
trouble with. As far as I know (and I'm no expert) external entities are a
standard part of XML, and a standards-conforming XML processor should handle
them. So it seems very likely that the processor used by openxml does handl
e
them. It's even possible that openxml and IE are using the same processor,
although that would quite a feat of coordination for a company the size of
Microsoft.
Since external entities are standard, and since IE issues a security warning
when I use one via a network unc, I'm suspecting that there is a privilege
problem in getting openxml to read the external entity.
Thanks for any help.External entities are not supported with OpenXML for a variety of security
related reasons.
Best is to resolve them on the client/midtier side before sending the XML to
the server.
Best regards
Michael
"HolmesDM" <HolmesDM@.discussions.microsoft.com> wrote in message
news:8AF86E16-03D4-49A7-9A20-8133C2B0E109@.microsoft.com...
>I am having a problem getting OPENXML to process some xml that contains an
> "external entity" in SQL Server 2000.
> Background, which you probably already know:
> --
> Xml uses "external entities" the way the C language uses "#include": when
> an xml processor finds the directive, it refers to an external file whose
> contents are to be processed at that point.
>
> I have created a minimal example that illustrates the problem.
> 1. First, here is some SQL that correctly processes some XML that does
> *not* use external entities. The XML is stored in a variable (@.testXml),
> and
> a select statement reads this XML and reports some basic facts about the
> only
> data in it, namely, that there is a tag called "DatastreamDownload ":
> -- demonstrate validity of simple xml
> declare @.testXml varchar(8000)
> set @.testXml ='
> <?xml version="1.0" ?>
> <DatastreamDownload />
> '
> declare @.idoc int
> exec sp_xml_preparedocument @.idoc output, @.testXml
> select *
> from openxml( @.idoc, '/DatastreamDownload', 0)
> exec sp_xml_removedocument @.idoc
>
> Here are the beginnings of the lines I see in SQL Query Analyzer:
> id parentid nodetype localname
> -- -- -- --
--
> 0 NULL 1
> DatastreamDownload
> (1 row(s) affected)
>
> 2. I then stored the XML from @.testXml in a file, simple.xml. The
> following 2 lines show the contents of the file, except that I've added 4
> spaces to the beginning of each of the lines in this listing (that is, the
> prefixed spaces are not in the file):
> <?xml version="1.0" ?>
> <DatastreamDownload />
>
> 3. I created a second file of XML, simpleEntity.xml, that includes
> simple.xml by using a external entity. The following 11 lines show the
> contents of the file, except that I've added 4 spaces to the beginning of
> each of the lines in this listing (that is, the prefixed spaces are not in
> the file):
> <?xml version="1.0" ?>
> <!DOCTYPE just4ExternalEntity
> [
> <!ENTITY externalEntity SYSTEM "simple.xml">
> ]
> <just4ExternalEntity>
> <workingTag />
> &externalEntity;
> </just4ExternalEntity>
> You can verify that this external entity stuff is done properly by opening
> this second file in Internet Explorer. I see something like this, except
> with better colors:
> <?xml version="1.0" ?>
> <!DOCTYPE just4ExternalEntity (View Source for full doctype...)>
> - <just4ExternalEntity>
> <workingTag />
> <DatastreamDownload />
> </just4ExternalEntity>
> 4. Finally, I copied the working xml from simpleEntity.xml into a SQL
> variable, @.xmlDoc. I had to prefix a network path to "simple.xml", to
> give
> the SQL Server a chance of locating simple.xml. I've represented this
> network path by "\\unc..." below, although in my actual test I used a
> valid
> (and triple-tested) unc.
> Openxml can partially process this code. The first select statement below
> reports on the "workingTag" tag, which is defined in the @.xmlDoc variable
> ,
> but fails to report on the sibling "DatastreamDownload" tag, which is
> defined
> in the external entity.
> -- attempt to read the same xml from a disk file using an external
> entity
> declare @.xmlDoc varchar(1000)
> set @.xmlDoc = -- code from simpleEntity.xml
> '
> <?xml version="1.0" ?>
> <!DOCTYPE just4ExternalEntity
> [
> <!ENTITY externalEntity SYSTEM "\\unc...\simple.xml">
> ]
> <just4ExternalEntity>
> <workingTag />
> &externalEntity;
> </just4ExternalEntity>
> '
> declare @.jdoc int
> declare @.rc int
> exec @.rc = sp_xml_preparedocument @.jdoc output, @.xmlDoc
> select @.rc as resultCode
> -- lists row, as expected
> select *
> from openxml( @.jdoc, '/just4ExternalEntity/workingTag', 0)
> -- fails to list row
> select *
> from openxml( @.jdoc, '/just4ExternalEntity/DatastreamDownload',
> 0)
> exec sp_xml_removedocument @.jdoc
>
> Here are the beginnings of the lines I see in SQL Query Analyzer:
> resultCode
> --
> 0
> (1 row(s) affected)
> id parentid nodetype localname
> -- -- -- --
> 7 0 1 workingTag
> (1 row(s) affected)
> id parentid nodetype localname
> -- -- -- --
> (0 row(s) affected)
> I was expecting to see 1 row in the last result set above, with
> "DatastreamDownload" in the "localname" column.
> So openxml is failing to process something that Internet Explorer has no
> trouble with. As far as I know (and I'm no expert) external entities are
> a
> standard part of XML, and a standards-conforming XML processor should
> handle
> them. So it seems very likely that the processor used by openxml does
> handle
> them. It's even possible that openxml and IE are using the same
> processor,
> although that would quite a feat of coordination for a company the size of
> Microsoft.
> Since external entities are standard, and since IE issues a security
> warning
> when I use one via a network unc, I'm suspecting that there is a privilege
> problem in getting openxml to read the external entity.
> Thanks for any help.|||Thanks for the definitive answer, even if it's not exactly the one I'd hoped
for.
"Michael Rys [MSFT]" wrote:

> External entities are not supported with OpenXML for a variety of security
> related reasons.
> Best is to resolve them on the client/midtier side before sending the XML
to
> the server.
> Best regards
> Michael
> "HolmesDM" <HolmesDM@.discussions.microsoft.com> wrote in message
> news:8AF86E16-03D4-49A7-9A20-8133C2B0E109@.microsoft.com...
>
>

No comments:

Post a Comment