BA372 — XML and Web Services
- The Web for humans: HyperText Markup Language (HTML):
- But what about programs?
- I am an on-line newspaper and I want to show the NYSE and NASDAQ stock quotes on my Web site.
- I am an on-line weather site and I want to simultaneously display the current
temperatures in 10 large cities.
- I am a mortgage broker and I need to know the current rates of my top 50 lenders.
- I am the OSU library and I would like to show my patrons
which books are available in the UofO's library.
- I am a researcher and I want to know which public companies bought back shares in the period 2005-2007.
- I am a financial spreadsheet and I would like to know the value of my portfolio.
- I am Emergency Dispatch and I need to know who's patrolling South Corvallis.
HTML is great for formatting documents and data for humans.
HTML is not good for programmatic data retrieval:
I'd like to first collect all the information from multiple
sites and then organize it on HTML web pages which I serve up to my (human) customer.
- Most Websites provide their own, specially formatted HTML pages.
- e.g.: SEC Edgar's HTML filings:
- Every time a site's HTML front-end changes, I (might) have to change my program parsing it.
So I need some HTML, but only at the 'front office'; i.e., when facing my (human) customer/user.
How do I collect all the information which goes on my HTML page in
the first place, without having to change my programs whenever the visited Website changes
its HTML layout?
eXtensible Markup Language (XML) to the rescue:
- Like HTML, XML is a markup language.
- Adopted as a W3C standard in 1998.
- However, instead of formatting instructions, it communicates content.
example: from Harold, E.R. & Means, W.S. (2001) XML in a Nutshell; O'Reilly & Associates
<title>Memory of the Garden at Etten</title>
<artist>Vincent van Gogh</artist>
<description>Two women on the left. A third works in her garden</description>
<artist>Pierre Auguste Renoir</artist>
<description>Woman on a swing. Two men and a toddler watch</description>
<title>Apollo and Daphne</title>
<artist>Gian Lorenzo Bernini</artist>
<description> Daphne transforms into a tree as Apollo chases her</description>
Who determines which 'tags' or 'elements' are available? What if we would like to create some XML to
represent course student enrollment? Which XML elements are there for us to use?
- Notice how the markup tags convey information about the art objects, yet say nothing about how this information must be displayed.
- Notice how this is similar (but not identical) to defining tables and columns in a database.
- Also notice how the concepts are hierarchical; i.e., 'nested:'
- An object exists (only) within an art_collection.
- An art_collection can contain more than one object.
- title, artist and description appear inside objects.
- A location consists of a
city and a
- type is an attribute of
- Problem: When to make something an attribute rather than a tag?
- This strict hierarchy is easy for programs to 'parse.'
- An XML document or fragment can be represented as a tree (similar to a directory/folder tree).
- Problem: Can you draw the <art_collection> tree?
- Trees are 'nice' data structures to programmatically represent and parse.
- Document Object Model (DOM) is a W3C-governed standard for representing XML documents as 'trees.'
- An entire XML document can be read into a tree and stored
in memory; e.g., as a DOM object.
- When the tree is properly constructed, i.e., as a tree, the document is
said to be well formed.
- Only well-formed documents can be reliably parsed ->
XML makes things very rigid and predictable -> easy for programs to parse and 'take apart.'
- Since XML (like HTML) is text-based, it is compatible across
computing platforms -> contributes heavily to 'interoperability.'
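- Keep in mind: "4.0"(CPU_1) = "4.0"(CPU_2); 4.0(CPU_1) ≠ 4.0(CPU_2). That is, the text "4.0" reads the same on every platform, whereas the binary representation of the number 4.0 may differ between platforms.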
- Summary: XML is a hierarchical language for representing content. If different programs
must programmatically(!!!) interoperate, we can design an XML standard and use it to communicate information
between these programs.
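The tree view and the well-formedness requirement described above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration: the nesting of `<object>` inside `<art_collection>` follows the notes, but the exact fragment, the `type` attribute values, and the helper names `outline` and `is_well_formed` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the art-collection example; the exact
# nesting and the "type" attribute values are assumptions for illustration.
XML_TEXT = """
<art_collection>
  <object type="painting">
    <title>Memory of the Garden at Etten</title>
    <artist>Vincent van Gogh</artist>
  </object>
  <object type="sculpture">
    <title>Apollo and Daphne</title>
    <artist>Gian Lorenzo Bernini</artist>
  </object>
</art_collection>
""".strip()


def outline(element, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def is_well_formed(text):
    """Return True if the text parses into a single tree, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(XML_TEXT)  # the whole document is now a tree in memory
titles = [obj.findtext("title") for obj in root.findall("object")]

print("\n".join(outline(root)))
print(titles)
# A <title> that is never closed breaks the nesting, so no tree can be built.
print(is_well_formed("<object><title>Unclosed</object>"))
```

Because every element is properly nested, the program can walk the tree without knowing anything about how the data will eventually be displayed.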
But how does the receiving computer know which XML elements we made up? How, therefore, can it 'parse' my XML?
- Answer 1: If the document is well formed, a document tree can always be formed.
- However, what about more complex constraints?
must have both a <first_name>
and a <last_name>.
are atomic; i.e., they must
contain nothing but characters.
- A <section> has an attribute number which must be a positive integer.
- Answer 2: You have to provide a declaration of your XML syntax (syntax = grammar + vocabulary): a Document Type
Definition (DTD) or an XML Schema Definition (XSD).
example: DTD for a person document (XML from Harold & Means (2001)):
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT profession (#PCDATA)>
<!ELEMENT name (first, last)>
<!ELEMENT person (name, profession*)>
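To make concrete what these declarations constrain, the sketch below checks the same rules by hand in Python. It is not a DTD validator: the standard library's `xml.etree.ElementTree` does not validate against a DTD, so the rules are hard-coded here, and the function name `check_person` and the sample documents are assumptions made for illustration. A validating parser would read the DTD itself instead.

```python
import xml.etree.ElementTree as ET


def check_person(person):
    """Return a list of rule violations; an empty list means all checks passed."""
    errors = []
    child_tags = [child.tag for child in person]
    # <!ELEMENT person (name, profession*)>: one name, then zero or more professions
    if person.tag != "person" or child_tags[:1] != ["name"]:
        errors.append("<person> must start with exactly one <name>")
    if any(tag != "profession" for tag in child_tags[1:]):
        errors.append("only <profession> elements may follow <name>")
    # <!ELEMENT name (first, last)>: both children, in this order
    name = person.find("name")
    if name is not None and [child.tag for child in name] != ["first", "last"]:
        errors.append("<name> must contain <first> followed by <last>")
    # (#PCDATA): these elements hold character data only, no child elements
    for element in person.iter():
        if element.tag in {"first", "last", "profession"} and len(element) > 0:
            errors.append(f"<{element.tag}> must contain only character data")
    return errors


# Illustrative sample documents (not taken from the notes).
valid = ET.fromstring(
    "<person><name><first>Alan</first><last>Turing</last></name>"
    "<profession>mathematician</profession>"
    "<profession>cryptographer</profession></person>"
)
invalid = ET.fromstring("<person><name><last>Turing</last></name></person>")

print(check_person(valid))
print(check_person(invalid))
```

Both sample documents are well formed, yet only the first satisfies the declared syntax; that difference is what a DTD or XSD lets the receiving program detect.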
- Examples of XML-based formats:
- So XML/JSON make it 'easy' to represent content... What can we do with that?
- Store internal information; e.g.,
- TeachEngineering documents.
- Provides a common and well-defined format for all TeachEngineering documents.
- Allows for validity/well-formedness checks on newly submitted and modified documents.
- Allows different types of renderings.
- Documents can be programmatically read by others.
- Microsoft Office 2007+ default document storage is (zipped) XML.
- Hohmann: Use for system and application configuration files.
- Anything else we want to store in a well-formed format; e.g., system logs
- !!! Make systems interoperate with each other !!!
- Interoperability with XML & Web services:
- Web services:
HTTP/XML/JSON-based data providers. Two types:
- Representational State Transfer (REST)ful: HTTP response contains unadorned, unencapsulated, native XML. Request sending mixes HTTP and XML.
- Notice how in both the DLESE and Google requests the search string is passed through the URL (GET request) -> not so nice:
- Server response is pure XML/JSON, but request is HTTP.
- Whereas the full functionality of XML is available for
sending the response, it is not for the message request.
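The asymmetry noted above (plain HTTP for the request, structured XML for the response) can be sketched in Python. Everything specific here is hypothetical: the endpoint `https://example.org/search`, the parameter names `q` and `n`, and the shape of the sample response are assumptions, not the DLESE or Google interfaces. No network call is made; the response is a hard-coded string.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/search"  # hypothetical endpoint


def build_request_url(query, max_results=10):
    """Encode the search as URL parameters, as a GET request would carry them."""
    params = {"q": query, "n": max_results}  # hypothetical parameter names
    return BASE_URL + "?" + urlencode(params)


# Hypothetical XML response body, hard-coded so the sketch runs offline.
SAMPLE_RESPONSE = (
    "<results>"
    "<record><title>Plate tectonics</title></record>"
    "<record><title>Volcanoes</title></record>"
    "</results>"
)


def titles_from_response(xml_text):
    """Pull the record titles out of the XML response."""
    root = ET.fromstring(xml_text)
    return [record.findtext("title") for record in root.iter("record")]


url = build_request_url("plate tectonics")
print(url)  # https://example.org/search?q=plate+tectonics&n=10
print(titles_from_response(SAMPLE_RESPONSE))
```

The request is a flat list of name/value pairs squeezed into the URL, while the response can use the full nesting of XML.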
- SOAP (previously: Simple Object Access Protocol) (!!! Notice the 'Editors' of the W3C document: it's about interoperation !!!):
- XML specification for messaging (Web services) on the Web; governed by W3C.
- Provides a structure for Web-based messaging; both request & response.
- Can easily be wrapped inside HTTP.
- Often used in conjunction with Web Services Description Language (WSDL).
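As a rough sketch of what "XML for both request and response" looks like, the Python below builds and reads a minimal SOAP 1.2 envelope. The envelope namespace is the one defined by SOAP 1.2; the application namespace `http://example.org/stock`, the `GetQuote` operation, and the `symbol` element are hypothetical names invented for this illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/stock"                  # hypothetical application namespace
NAMESPACES = {"soap": SOAP_NS, "app": APP_NS}


def build_envelope(symbol):
    """Wrap a (hypothetical) GetQuote request in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetQuote")
    ET.SubElement(request, f"{{{APP_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


def read_symbol(envelope_xml):
    """Extract the requested symbol, as a receiving service might."""
    root = ET.fromstring(envelope_xml)
    return root.findtext("soap:Body/app:GetQuote/app:symbol", namespaces=NAMESPACES)


message = build_envelope("IBM")
print(message)
print(read_symbol(message))
```

The resulting string would typically be sent as the body of an HTTP request, so that the request can be as richly structured as the response.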
- Overall: Web services (XML, JSON or any other structured, text-based format) are being considered as a general mechanism for IS interoperability.