BA372 - XML & Web Services
- Notice how the markup tags convey meaning, yet say nothing
about how this information must be displayed.
- Also notice how the concepts are hierarchical; i.e., 'nested':
- A collection can contain more than one painting.
- Titles and artists appear only within paintings.
- This strict hierarchy is easy for programs to 'parse.'
- An XML document or fragment can be represented as a tree
(similar to a directory/folder tree).
- Problem: Can you draw the <collection> tree?
- Trees are 'nice' data structures to represent and parse.
- The Document Object Model (DOM) is a W3C-governed standard for representing XML documents as 'trees.'
- An entire XML document can be read into a tree and stored
in memory; e.g., as a DOM object.
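A minimal sketch of reading an XML document into an in-memory DOM tree, using Python's standard xml.dom.minidom module. The <collection> fragment and its placeholder titles and artists are assumptions made up for illustration; only the element names follow the notes.

```python
from xml.dom.minidom import parseString

# Hypothetical <collection> fragment; the titles and artists are placeholders.
XML_TEXT = """<collection>
  <painting>
    <title>Title A</title>
    <artist>Artist A</artist>
  </painting>
  <painting>
    <title>Title B</title>
    <artist>Artist B</artist>
  </painting>
</collection>"""

def outline(node, depth=0):
    """Return one indented line per element node found below `node`."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines

dom = parseString(XML_TEXT)   # the whole document now lives in memory as a tree
tree_lines = outline(dom)
print("\n".join(tree_lines))
```

The printed outline has the same shape as a directory/folder listing: one root, with children indented under their parent.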
- When the document is properly constructed as a tree (a single root element, with every element properly closed and nested), the document is said to be well formed.
- Only well-formed documents can be reliably parsed ==>
XML makes things very rigid & predictable ==> easy for
programs to parse and 'take apart.'
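A small sketch of how a program can tell well-formed from not well-formed XML: hand the text to a parser and see whether it raises an error. This uses Python's standard xml.etree.ElementTree; the sample strings and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if `text` parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

good = "<painting><title>Untitled</title></painting>"
overlapping = "<painting><title>Untitled</painting></title>"  # tags overlap
unclosed = "<painting><title>Untitled</title>"                # missing end tag

for sample in (good, overlapping, unclosed):
    ok, message = is_well_formed(sample)
    print(ok, message)
```

The parser stops at the first error rather than guessing what was meant, which is what makes XML rigid and predictable.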
- Problem: What are some examples of XML that is not well formed?
- Problem: Make an XML document, introduce some well-formedness errors, and try to open the 'broken' document with your browser; what do you observe?
- Since XML (like HTML) is text-based, it is compatible across
computing platforms ==> important implications for
'interoperability.'
- Problem: Who determines which 'tags' or 'elements' are available? What if I would like to create some XML to represent course student enrollment? Which XML elements are there for me to use?
- Note how an element or 'node' such as <section> can
have attributes, similar to attributes in HTML tags.
- Problem: When should something be an attribute rather than an element?
- But how does the receiving computer know which XML elements I made up? How, therefore, can it 'parse' my XML?
- Answer 1: If the document is well formed, a document tree can
always be formed.
- However, what about more complex constraints?
- <student> must have both a <first_name> and a <last_name>.
- <first_name>s and <last_name>s are atomic; i.e., they must contain nothing but characters.
- A <section> has an attribute number.
- Answer 2: You have to provide a declaration of your syntax (grammar + vocabulary): Document Type Definition (DTD) or XML Schema (XSD).
XML (Harold & Means (2001), p. 30):
<person>
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
DTD:
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT profession (#PCDATA)>
<!ELEMENT name (first, last)>
<!ELEMENT person (name, profession*)>
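To make the five ELEMENT declarations concrete, here is a hand-written sketch that checks the same constraints in Python. It is not a DTD validator (the standard library parsers do not validate against a DTD); the function name check_person and the sample strings are assumptions for illustration, and the check ignores stray text between elements.

```python
import xml.etree.ElementTree as ET

LEAVES = ("first", "last", "profession")

def check_person(text):
    """Return a list of violations of the person DTD (hand-written, not a validator)."""
    person = ET.fromstring(text)          # raises ParseError if not well formed
    if person.tag != "person":
        return ["root element must be <person>"]
    problems = []
    child_tags = [child.tag for child in person]
    # <!ELEMENT person (name, profession*)>
    if child_tags[:1] != ["name"]:
        problems.append("<person> must start with exactly one <name>")
    if any(tag != "profession" for tag in child_tags[1:]):
        problems.append("only <profession> elements may follow <name>")
    # <!ELEMENT name (first, last)>
    for name in person.findall("name"):
        if [child.tag for child in name] != ["first", "last"]:
            problems.append("<name> must contain <first> then <last>")
    # (#PCDATA): leaf elements hold characters only, no child elements
    for element in person.iter():
        if element.tag in LEAVES and len(element) > 0:
            problems.append(f"<{element.tag}> must contain only characters")
    return problems

VALID = ("<person><name><first>Alan</first><last>Turing</last></name>"
         "<profession>mathematician</profession></person>")
INVALID = "<person><name><first>Alan</first></name></person>"

print(check_person(VALID))
print(check_person(INVALID))
```

Both sample strings are well formed; only the second one breaks a rule from the DTD. That is the difference between 'well formed' and 'valid.'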
- Problem: So how do the XML document and the DTD/XSD get associated with each other? How does the receiving computer know which DTD/XSD goes with which XML document?
- Internal DTD: DTD and XML stored in the same document:
<!DOCTYPE person [
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT profession (#PCDATA)>
  <!ELEMENT name (first, last)>
  <!ELEMENT person (name, profession*)>
]>
<person>
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
- External DTD: XML document contains a reference to a DTD elsewhere on the Web:
<!DOCTYPE person SYSTEM "http://www.mywebsite.com/person.dtd">
<person>
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
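A short sketch of how a receiving program can find out which DTD a document claims to follow: parse it and read the DOCTYPE node. This uses Python's standard xml.dom.minidom, which records the reference but, as far as the standard library goes, neither downloads the external DTD nor validates against it. The shortened document below is an assumption based on the example above.

```python
from xml.dom.minidom import parseString

# Shortened version of the external-DTD example; nothing is fetched from the Web.
XML_TEXT = """<!DOCTYPE person SYSTEM "http://www.mywebsite.com/person.dtd">
<person>
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
</person>"""

dom = parseString(XML_TEXT)
doctype = dom.doctype

print(doctype.name)       # name of the declared root element
print(doctype.systemId)   # where the document says its DTD lives
```

A validating parser would go one step further: retrieve the DTD at that address and check the document against it.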
- Examples of XML-based formats:
- So XML makes it 'easy' to represent contents... So what do we do
with that?
- Store internal information; e.g.,
- TeachEngineering documents.
- Microsoft Office 2007 default document storage is XML.
- System and application configuration files (recall environment variables).
- Anything else we want to store in a well-formed format.
- Make systems interoperate with each other.
- Interoperability with XML & Web services.
- Traditional: Application X running on platform P cannot
communicate with application Y running on platform Q.
- So how to make these apps talk to each other?
- Proprietary data exchange standards (CORBA, COM, ActiveX, etc.).
- Web services: HTTP/XML-based data providers. Two types:

- Notice how in both the DLESE and Yahoo requests the search
strings are passed through the URL ==> not so nice:
- Message reply is pure XML, but request is a mixture of
HTTP & XML.
- Whereas the full functionality of XML is available for
sending the message reply, it is not for the message request.
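A sketch of this request/reply asymmetry in Python: the request is nothing more than a URL with the search string encoded into it, whereas the reply is a structured XML document. The endpoint, the parameter names q and n, and the canned reply are all hypothetical; this is not the DLESE or Yahoo API, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://api.example.org/search"   # hypothetical endpoint

def build_request_url(query, max_results):
    """Encode the search string and options as URL parameters."""
    return BASE_URL + "?" + urlencode({"q": query, "n": max_results})

# Hypothetical XML reply, hard-coded so the example runs offline.
CANNED_REPLY = """<results query="xml web services">
  <result><title>First hit</title><url>http://example.org/1</url></result>
  <result><title>Second hit</title><url>http://example.org/2</url></result>
</results>"""

def titles(reply_text):
    """Pull the <title> of every <result> out of the XML reply."""
    root = ET.fromstring(reply_text)
    return [result.findtext("title") for result in root.findall("result")]

request_url = build_request_url("xml web services", 2)
print(request_url)
print(titles(CANNED_REPLY))
```

The reply can nest, repeat, and label its data freely; the request is limited to flat name=value pairs squeezed into one line.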
- SOAP (originally: Simple Object Access Protocol; since version 1.2 the name is no longer treated as an acronym):
- XML specification for messaging (Web services) on the Web;
governed by W3C.
- Provides a structure for Web-based messaging; both request
& response:
- Can easily be wrapped inside HTTP.
- Often used in conjunction with Web Services Description Language (WSDL).
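A minimal sketch of a SOAP request wrapped inside HTTP, built with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service URL, the application namespace, and the GetEnrollment operation are hypothetical and would normally come from the service's WSDL. The HTTP request is only constructed, never sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/enrollment"               # hypothetical application namespace

def build_envelope(course):
    """Build <soap:Envelope><soap:Body><GetEnrollment><course>... as bytes."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetEnrollment")
    ET.SubElement(call, f"{{{APP_NS}}}course").text = course
    return ET.tostring(envelope, encoding="utf-8")

def build_http_request(payload):
    """Wrap the XML payload in an HTTP POST; the URL is a placeholder."""
    return Request(
        "http://example.org/soap",
        data=payload,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '"http://example.org/enrollment/GetEnrollment"',
        },
        method="POST",
    )

payload = build_envelope("BA372")
request = build_http_request(payload)   # constructed only; nothing is sent
print(payload.decode("utf-8"))
```

Unlike the URL-based requests, the request itself is now a full XML document, so it can carry nested, typed, and repeated data just as the reply can.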
- UDDI (Universal Description, Discovery, and Integration):
- Web services registry/directory service (sponsored by OASIS).
- Applications can
- Search the UDDI for applicable services,
- then use their WSDLs to find out how to use them,
- then use SOAP to use them.
- Tragedy of metadata (Havenstein article).
- Overall: Web services are being considered as a general
mechanism for interoperability:
- Problem: We're moving to so-called Service-Oriented Architectures (SOA):
- Problem: Isn't this what OOP was supposed to give us?
- Problem: So, what's the difference? Why would XML-based Web services be able to accomplish what OOP did not?