By Richard Anderson
Introduction
The Extensible Markup Language (XML) is one of those technologies that you just know you should start supporting in your applications sooner rather than later. Since its official ratification in February 1998, it has fast become the de facto way of exchanging and representing structured documents, for many reasons: suitability, ease of use, availability of tools, media coverage and so on.
In this article I'll give you a flying start on how to add XML support to your application using the IE5 XML object model, in conjunction with the Visual C++ 6.0 compiler and (of course) COM.
In this first article I'll cover how to create a couple of XML documents from scratch. In subsequent articles I'll show you how to load and manipulate existing XML documents, as well as transforming XML documents into HTML (XHTML to be precise) by using the Extensible Stylesheet Language (XSL).
To get the most from this article you should have a basic understanding of XML concepts, specifically the structure of XML documents, and you also need to understand the basic concepts of the W3C Document Object Model (DOM), which is used to represent both XML and HTML documents in memory. I would love to introduce you to both of these standards, but this is only a short article. Such detail is best left to another article, or books such as XML Design and Implementation (ISBN 1861002289 http://www.wrox.com/Books/Book_Details.asp?ISBN=1861002289) published by Wrox Press.
The first sample we are going to write will produce an XML file that looks like this when viewed in IE5:
OK, I admit it, it's nothing special or exciting, but the sample is very helpful in demonstrating some of the basic concepts about using the IE5 object model from C++. It shows how to create an XML document object, and how DOM nodes are created and added to it. These nodes represent the COMDeveloper element (represented by the start and end tags) and the 'Hello XML' text shown in the picture.
Using the Import Directive
All the samples in this article use the #import directive to de-compile the IE5 type library into usable C++ classes and smart pointers. This directive is part of what's known as the native COM compiler support. I've chosen to write the samples using this approach for several reasons. It is simple to use, helps to clearly demonstrate the usage of the IE5 objects without complicating the code, and, most importantly, it doesn't require you to go through the somewhat painful process of trying to get the latest SDK header files before you start.
Let's start by taking a look at the import directive that creates the C++ wrapper classes for the IE5 object model:
#import "msxml.dll" rename_namespace("xml")
The most obvious point here is that the type library for IE5 is contained within MSXML.DLL. As this is located in the Windows system directory, I've not had to specify a full path. The rename_namespace attribute defines all of the created wrapper classes in a C++ namespace called xml. This is good because it prevents possible conflicts with existing include files you might have on your system, but it does mean that everything must be prefixed with a namespace qualifier. So, for the interface IXMLDOMDocument (which will be discussed later) we have to explicitly specify the namespace like this: xml::IXMLDOMDocument. I personally quite like this syntax, but if you dislike namespaces or simply don't want to use them, try using the no_namespace attribute.
You can leave out the rename_namespace attribute, but you will have to prefix everything with MSXML.
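To see why the namespace is worth the extra typing, here is a small portable sketch. It does not use #import at all: the two Document types and the legacy namespace are hypothetical, invented only to show two libraries that happen to pick the same name, and the alias x is just one possible way to shorten the qualifier.

```cpp
#include <string>

// Hypothetical declarations: two unrelated libraries that both define "Document".
namespace xml    { struct Document { std::string Kind() const { return "xml"; } }; }
namespace legacy { struct Document { std::string Kind() const { return "legacy"; } }; }

// A namespace alias shortens the qualifier without importing every name.
namespace x = xml;

// Uses both Document types side by side; the qualifier says which one is meant.
std::string DescribeBoth()
{
    xml::Document    a;  // fully qualified name
    legacy::Document b;  // same simple name, different namespace, no clash
    x::Document      c;  // alias: this is the same type as xml::Document
    return a.Kind() + "," + b.Kind() + "," + c.Kind();
}
```

Without the namespaces the two struct definitions would collide, which is the kind of conflict rename_namespace is there to avoid.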
The #import command gives us access to around 30 COM interfaces and 5 classes. A lot of these are really legacy interfaces (from IE4), which have been kept to maintain compatibility, although they provide additional features not supported by some of the new interfaces. The IE5 interfaces that you should be using all start with IXMLDOM. To help give you a feel for these interfaces, here is a UML diagram showing the IXMLDOMxxx interfaces:
The list of interfaces is quite daunting at first, but it is actually a direct representation of the W3C DOM specification. Microsoft has done a good job of implementing this specification in IE5. But, as usual, you will find that they have added additional methods and properties not in the W3C spec. OK, time to try out some of these interfaces.
Like any good COM-abiding application, each of our samples registers and de-registers from the COM runtime using CoInitialize and CoUninitialize. Whilst this isn't anything new or exciting, you must remember to release all COM interface pointers before calling the latter function. This may sound obvious, but you should remember that smart pointers only release their contained interface pointer when they're destroyed. So, if we look at the code from our sample that creates the XML document object, you will notice the smart pointer for the XML document is scoped to ensure that it is destroyed before the call to CoUninitialize:
CoInitialize(NULL); // Register with COM
{
    HRESULT hRes;
    xml::IXMLDOMDocumentPtr spMyFirstDocument;
    hRes = spMyFirstDocument.CreateInstance(__uuidof(xml::DOMDocument));
    if(FAILED(hRes))
    {
        printf("Failed to create DOM document : %08x\n", hRes );
        return 1;
    }
}
CoUninitialize(); // Unregister with COM
Without this scoping the smart pointer would try and release its contained interface pointer after COM is uninitialised, causing our poor old sample app to crash.
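The ordering rule can be seen without any COM at all. The following is a minimal portable sketch, not COM code: EventLog and FakeSmartPtr are hypothetical stand-ins for the COM runtime and for a smart pointer generated by #import, and all they do is record the order in which things happen.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in: collects events so the ordering is visible.
struct EventLog
{
    std::vector<std::string> events;
};

// Plays the role of a smart pointer: it "releases" in its destructor.
class FakeSmartPtr
{
public:
    explicit FakeSmartPtr(EventLog& log) : log_(log) { log_.events.push_back("create"); }
    ~FakeSmartPtr() { log_.events.push_back("release"); }
    FakeSmartPtr(const FakeSmartPtr&) = delete;
    FakeSmartPtr& operator=(const FakeSmartPtr&) = delete;
private:
    EventLog& log_;
};

// Mirrors the shape of the sample: initialise, inner scope, uninitialise.
std::vector<std::string> RunScoped()
{
    EventLog log;
    log.events.push_back("initialize");      // stands in for CoInitialize
    {
        FakeSmartPtr spDocument(log);        // destroyed at the closing brace
    }
    log.events.push_back("uninitialize");    // stands in for CoUninitialize
    return log.events;
}
```

RunScoped() records the release before the uninitialise step. Remove the inner braces and the destructor would only run when the function returns, which is the situation the sample's scoping is designed to avoid.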
As the code shows, creating a new XML document is pretty straightforward. The #import directive generates smart pointer types (such as IXMLDOMDocumentPtr) for each interface in the IE5 type library, which we then use as shown above.
Looking at the code in a bit more detail, we see that an XML document object is created using the CreateInstance() method of the spMyFirstDocument smart pointer (sp). We specify the COM class to be created by using the rather nice __uuidof keyword, which simply extracts the CLSID of the component to create. Once the creation is completed, we have a reference-counted IXMLDOMDocument interface.
For those people that were previously using the IE4 MSXML objects, note that the COM class to create is no longer XMLDocument. That is still available, but I do not recommend using it because it is based upon an early working draft of the DOM specification.
The IXMLDOMDocument interface plays a pivotal role in the IE5 object model. It is the interface by which you can load and save existing documents, and more importantly for this first sample, it is the way in which DOM nodes for an XML document can be created. Remember that in the W3C DOM specification every part of an XML document, including the document itself, is represented by a node.
Let's get on and actually create our XML document. First up, we create an element node by calling the createElement() method of IXMLDOMDocument. This takes the name of the element to create and returns an IXMLDOMElement interface, which represents a DOM element node. Once again, a smart pointer looks after the returned interface. Next, we take the returned element node and append it to the XML document using the appendChild() method:
xml::IXMLDOMElementPtr spRootElement;
spRootElement = spMyFirstDocument->createElement("COMDeveloper");
spMyFirstDocument->appendChild(spRootElement);
In an XML document there can only ever be one root element. The first time you call appendChild() against a document node and pass in an element node, you are setting the root node. If you were to try and add another element node, you'd find that an error is raised.
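The single-root rule is easy to model. The sketch below is a toy, portable illustration and not the MSXML API: ToyDocument, ChildNode and AppendFails are hypothetical names, and the exception type is my own choice. It only captures the idea that a document node accepts one element child and rejects a second.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the children a document node can hold.
enum class NodeKind { Element, ProcessingInstruction };

struct ChildNode
{
    NodeKind    kind;
    std::string name;
};

class ToyDocument
{
public:
    // Rejects a second element child: a document has exactly one root element.
    void appendChild(const ChildNode& child)
    {
        if (child.kind == NodeKind::Element && hasRoot())
            throw std::logic_error("document already has a root element");
        children_.push_back(child);
    }

    // True once an element child has been appended.
    bool hasRoot() const
    {
        for (const ChildNode& c : children_)
            if (c.kind == NodeKind::Element)
                return true;
        return false;
    }

    std::size_t childCount() const { return children_.size(); }

private:
    std::vector<ChildNode> children_;
};

// Returns true when appending the child raised an error.
bool AppendFails(ToyDocument& doc, const ChildNode& child)
{
    try
    {
        doc.appendChild(child);
    }
    catch (const std::logic_error&)
    {
        return true;
    }
    return false;
}
```

In this model a processing instruction can still be appended alongside the root, because only element children are limited to one.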
Next, we use the createTextNode() method to create a new text node containing "Hello XML". We add this as a child node of the element by using the appendChild() method of the IXMLDOMElement interface:
xml::IXMLDOMTextPtr spSomeText;
spSomeText = spMyFirstDocument->createTextNode("Hello XML");
spRootElement->appendChild(spSomeText);
appendChild() is actually defined in IXMLDOMNode, so it's available to any interface that derives from it.
Finally, we save our XML document to disk using the save method:
spMyFirstDocument->save("c:\\comdeveloper.xml");
That's it, we've created our first XML document using IE5! Like I said, it's simple, provided you've got a good grounding in XML and DOM.
The __uuidof keyword provides a method of accessing the UUID/GUID for a component or interface. For the DOMDocument component, the uuid is 3EFAA428-272F-11D2-836F-0000F87A7782
Sample 2 - A slightly more complex XML file
In our next sample we will create a slightly more complex XML document, and look at common ways of helping to simplify the usage of the IE5 object model. Starting with the output from the sample again, the generated XML file from this sample will look like this when viewed in IE5:
The document contains a partial list of articles that can be found on COMdeveloper. Each article is detailed within a child element node called Article. This node contains three child elements (Title, Author and URL), each holding a text node, which collectively describe the article and its web location. You might also notice that this file has the standard xml processing instruction (<?xml version="1.0" ?>), which identifies the XML standard to which the document conforms. This instruction is also capable of saying whether the document is standalone and what its encoding is.
For those people like me who are interested in UML modeling of XML schemas, we could represent this file format as follows:
We could also define the format using a Document Type Definition (DTD) for the file like this:
<!DOCTYPE COMDeveloper_Articles [
<!ELEMENT COMDeveloper_Articles (Article+)>
<!ELEMENT Article (Title,Author,URL) >
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT URL (#PCDATA)>
]>
The zip file containing the sample code for this article has a sample that loads and validates an XML document against a DTD. It is located in the directory loadxml.
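If the content models in the DTD look cryptic, it may help to see what they demand spelled out as ordinary code. This is a hand-rolled portable sketch and not a DTD validator: the function names are hypothetical, and each article is reduced to a list of its child element names. (Article+) means one or more Article elements, and (Title,Author,URL) means exactly those three children in that order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Checks the content model (Title,Author,URL): exactly these three
// child element names, in this order.
bool IsValidArticle(const std::vector<std::string>& childNames)
{
    static const char* const kExpected[] = { "Title", "Author", "URL" };
    if (childNames.size() != 3)
        return false;
    for (std::size_t i = 0; i < 3; ++i)
        if (childNames[i] != kExpected[i])
            return false;
    return true;
}

// Checks the content model (Article+): at least one Article,
// and every Article must itself be valid.
bool IsValidArticleList(const std::vector<std::vector<std::string>>& articles)
{
    if (articles.empty())
        return false;
    for (const std::vector<std::string>& article : articles)
        if (!IsValidArticle(article))
            return false;
    return true;
}
```

A validating parser performs this sort of check for you from the DTD itself, so you would not normally write it by hand.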
In this second sample we write a couple of semi re-usable helper functions, used to create the XML document. The first of these functions is called AddText(). This creates a text node containing the passed text, and appends it as a child node of the specified parent node:
void AddText(xml::IXMLDOMNode* pParent, const char* pszText)
{
    xml::IXMLDOMTextPtr SomeText;
    // Nodes are always created by the document that owns the parent
    SomeText = pParent->ownerDocument->createTextNode(pszText);
    pParent->appendChild(SomeText);
}
The next helper method is called AddArticleInfo(). This provides a way of creating all the DOM nodes (7 in total) that represent information about an article. It creates a new article element, and then creates and associates the sub-elements that contain the author, title and URL information:
void AddArticleInfo(xml::IXMLDOMNode* pParent,
                    const char* pszTitle,
                    const char* pszAuthor,
                    const char* pszURL)
{
    xml::IXMLDOMElementPtr spArticle;
    xml::IXMLDOMElementPtr spElement;

    // Create the Article element and attach it to the parent
    spArticle = pParent->ownerDocument->createElement("Article");
    pParent->appendChild(spArticle);

    // Each sub-element holds a single text node
    spElement = pParent->ownerDocument->createElement("Title");
    AddText(spElement, pszTitle);
    spArticle->appendChild(spElement);

    spElement = pParent->ownerDocument->createElement("Author");
    AddText(spElement, pszAuthor);
    spArticle->appendChild(spElement);

    spElement = pParent->ownerDocument->createElement("URL");
    AddText(spElement, pszURL);
    spArticle->appendChild(spElement);
}
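To check the figure of 7 nodes per article, here is a portable sketch that mirrors the shape of the two helpers on a toy tree. Nothing here is MSXML: ToyNode, Append, AddArticle and CountNodes are hypothetical names, and the "#text" marker is just my convention for telling text nodes apart from elements.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory node; only the tree shape matters here.
struct ToyNode
{
    std::string name;   // element name, or "#text" for a text node
    std::string value;  // text content, empty for elements
    std::vector<std::unique_ptr<ToyNode>> children;
};

// Appends a new child to parent and returns a pointer to it.
ToyNode* Append(ToyNode& parent, const std::string& name, const std::string& value = "")
{
    std::unique_ptr<ToyNode> node(new ToyNode());
    node->name  = name;
    node->value = value;
    ToyNode* raw = node.get();
    parent.children.push_back(std::move(node));
    return raw;
}

// Mirrors AddArticleInfo(): one Article element with three sub-elements,
// each of which holds one text node.
ToyNode* AddArticle(ToyNode& parent,
                    const std::string& title,
                    const std::string& author,
                    const std::string& url)
{
    ToyNode* article = Append(parent, "Article");
    Append(*Append(*article, "Title"),  "#text", title);
    Append(*Append(*article, "Author"), "#text", author);
    Append(*Append(*article, "URL"),    "#text", url);
    return article;
}

// Counts a node plus all of its descendants.
std::size_t CountNodes(const ToyNode& node)
{
    std::size_t total = 1;
    for (const std::unique_ptr<ToyNode>& child : node.children)
        total += CountNodes(*child);
    return total;
}
```

Counting from the Article node gives 1 Article element, 3 sub-elements and 3 text nodes, which is where the 7 comes from.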
Using these two helper functions we can now create a more complex XML file in just a few lines of code:
int main(int argc, char* argv[])
{
    CoInitialize(NULL); // Register with COM
    {
        HRESULT hRes;
        xml::IXMLDOMDocumentPtr spArticlesDoc;
        xml::IXMLDOMElementPtr spRootElement;
        xml::IXMLDOMProcessingInstructionPtr spXMLPI;

        // Create the document object, checking the result as in the first sample
        hRes = spArticlesDoc.CreateInstance(__uuidof(xml::DOMDocument));
        if(FAILED(hRes))
        {
            printf("Failed to create DOM document : %08x\n", hRes );
            return 1;
        }

        // The processing instruction is a child of the document node
        spXMLPI =
            spArticlesDoc->createProcessingInstruction("xml","version=\"1.0\"");
        spArticlesDoc->appendChild(spXMLPI);

        // The root element is also a child of the document node
        spRootElement = spArticlesDoc->createElement("COMDeveloper_Articles");
        spArticlesDoc->appendChild(spRootElement);

        AddArticleInfo(spRootElement,
                       "So, why ATL? why COM?",
                       "Dr. Richard Grimes",
                       "http://www.comdeveloper.com/articles/WhyATL.asp");
        AddArticleInfo(spRootElement,
                       "Attribute Programming",
                       "Dr. Richard Grimes & Sing Li",
                       "http://www.comdeveloper.com/articles/attributeprogramming.asp");

        spArticlesDoc->save( "c:\\comdeveloper.xml" );
    }
    CoUninitialize(); // Unregister with COM
    return 0;
}
The only new aspect here is the processing instruction node, which is created with the createProcessingInstruction() method. As mentioned earlier, this specifies the XML version to which the document conforms. As this occurs at the same logical level as the root element, it is also added to the document node.
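One detail in the sample data deserves a note: the author string for the second article contains an ampersand. XML requires a literal & in character data to be written as &amp;, and a DOM serializer is expected to take care of that when it writes the file, which is why the sample can pass the raw string. If you ever build XML text by hand instead, you need to do the escaping yourself. The function below is a minimal portable sketch of that step for element text only; EscapeXmlText is a hypothetical helper name, not part of MSXML.

```cpp
#include <string>

// Escapes the characters that must not appear literally in element text.
// Attribute values would additionally need their quote characters escaped.
std::string EscapeXmlText(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char ch : text)
    {
        switch (ch)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        default:  out += ch;      break;
        }
    }
    return out;
}
```

Passing the author string from the sample through this function yields "Dr. Richard Grimes &amp; Sing Li".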
Well, that's about it for this article. Hopefully I've given you enough information and sample code to help jump-start your XML developments. One point that you should consider from day one is that your applications when shipped will require your clients to have IE5 installed. This is due to various DLL dependencies that Microsoft has promised to one day remove. If you don't want to have to ship IE5 or can't afford to wait for this separation, don't panic. Other vendors do have XML support that you can use, some of which have almost identical interfaces to MSXML. I've listed a couple of these in the reference section.
Some Useful XML Resources
Here are some useful XML links:
The XML 1.0 specification:
http://www.w3.org/TR/1998/REC-xml-19980210
DOM Level 1 specification:
http://www.w3.org/TR/REC-DOM-Level-1/
General purpose XML site with links to many useful resources:
Microsoft:
http://msdn.microsoft.com/xml/
Vivid Creations - Provide an MSXML style control that has the same basic XML DOM interfaces but does not need IE5 to be installed: