XML: Why and how?

2001-12-29 13:24:52 Author: 畅享网
Keywords: theoretical discussion, collaborative commerce

HTML: Simple but limited

HTML transformed the Internet from an obscure academic tool into the rich, flexible, and powerful World Wide Web that we know today. HTML has some huge advantages that made it ripe for this explosive growth. It is relatively simple to learn and use. With just a limited number of tags and a word processor, anyone can produce legible HTML pages within minutes. Yet despite its simplicity, the syntax is powerful: lists, links, graphics, and formatted text can all be produced with a set of easy-to-learn tags. And HTML, with a few minor exceptions, is a universally accepted language, common to all web pages and displayable by all browsers on all platforms. If you build a web page with well-formatted HTML, you can reach the widest possible audience across all platforms.


HTML's very success created demand for ever larger and more sophisticated web applications, and HTML cannot cope with the increasing demands made on it. HTML is too static and rigid for the fast-changing world of the Web. Its tags are hard-coded by committees and browser authors. It should be possible to add new tags without making them arbitrary or nonstandard.

HTML was designed for a fixed purpose: web pages. But the information these pages contain is valuable in many other formats, such as printed documents, manuals, financial data in databases, and so on. Converting HTML to or from these other formats is costly and difficult.

Looking ahead, it would be a huge benefit to capture and organize the vast amount of information on the Web. But HTML documents are intrinsically unstructured: HTML tags describe how documents should be displayed, not how they are structured. As a result, only simple text searches are possible. Searching structured documents would multiply the usefulness of information on the Web manyfold.

Finally, HTML's linking mechanisms are very weak. Links break regularly and without warning, and because they are hard-coded into each document, keeping them up to date requires a separate maintenance tool. Links are also unidirectional, an unnecessary limitation for a hyperlinking system.

What is XML?

Aware of the problems facing HTML, the World Wide Web Consortium (W3C) ratified a standard for a new form of markup language, called XML. XML stands for Extensible Markup Language. It bears some superficial similarities to HTML, but is in fact a much more powerful concept.


XML defines the structure of documents, whereas HTML defines how to display them. XML tags mark out sections of a document according to their content, rather than according to how they should be displayed. These tags are read by an XML parser (such as the one in an XML-aware browser), which can check them against a Document Type Definition (DTD). A DTD defines the valid tags, and how they may be combined, for all XML documents of a particular type. Authors can define their own DTDs or, more likely, use publicly available DTDs that suit their particular application. The browser knows how to display the document by consulting a style sheet written in XSL (Extensible Stylesheet Language). Different style sheets produce different displays of the same underlying document.
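Before a DTD even comes into play, every XML document must be well-formed: tags properly closed and properly nested, under a single root element. The XML specification requires parsers to reject documents that are not. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module, which is a non-validating parser: it enforces well-formedness but does not check a document against its DTD (that needs a validating parser, such as the third-party lxml library). The helper name `is_well_formed` is our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML, False otherwise.

    ElementTree does not validate against a DTD; it only checks that tags
    are closed and nested correctly under a single root element.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested: accepted.
print(is_well_formed("<CITY><NAME>Ankara</NAME></CITY>"))  # True
# An unclosed <P>, a habit carried over from HTML: rejected.
print(is_well_formed("<CITYDESCRIPTION>text<P></CITYDESCRIPTION>"))  # False
# Overlapping tags: rejected.
print(is_well_formed("<B><I>text</B></I>"))  # False
```

This strictness is what lets software rely on the tag structure instead of guessing at it, as browsers routinely do with sloppy HTML.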

XML in practice

Let's say that a travel agent wants to allow users to browse through lists of currently available holiday destinations. A typical section of HTML might look like this:

<H1>Turkey</H1>
<H2>Istanbul</H2>
<H3>The city of the sultans</H3>
Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.<P>
<H2>Ankara</H2>
<H3>Heart of the new republic</H3>
The founders of the modern, secular Turkey chose Ankara as their new capital.

In XML, it might look like this:

<LOCATIONS>
  <COUNTRY>
    <NAME>Turkey</NAME>
    <CITY>
      <NAME>Istanbul</NAME>
      <CITYSUMMARY>The city of the sultans</CITYSUMMARY>
      <CITYDESCRIPTION>Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.</CITYDESCRIPTION>
    </CITY>
    <CITY>
      <NAME>Ankara</NAME>
      <CITYSUMMARY>Heart of the new republic</CITYSUMMARY>
      <CITYDESCRIPTION>The founders of the modern, secular Turkey chose Ankara as their new capital.</CITYDESCRIPTION>
    </CITY>
  </COUNTRY>
</LOCATIONS>
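Because the markup records structure rather than layout, a program can navigate it directly by tag name. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` on a well-formed copy of the travel-agent document. The function name `cities_by_country` is our own illustration.

```python
import xml.etree.ElementTree as ET

LOCATIONS_XML = """<LOCATIONS>
  <COUNTRY>
    <NAME>Turkey</NAME>
    <CITY>
      <NAME>Istanbul</NAME>
      <CITYSUMMARY>The city of the sultans</CITYSUMMARY>
      <CITYDESCRIPTION>Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.</CITYDESCRIPTION>
    </CITY>
    <CITY>
      <NAME>Ankara</NAME>
      <CITYSUMMARY>Heart of the new republic</CITYSUMMARY>
      <CITYDESCRIPTION>The founders of the modern, secular Turkey chose Ankara as their new capital.</CITYDESCRIPTION>
    </CITY>
  </COUNTRY>
</LOCATIONS>"""


def cities_by_country(xml_text):
    """Map each country name to a list of (city name, summary) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for country in root.findall("COUNTRY"):
        # A bare tag name in findall/findtext matches direct children only,
        # so a country's NAME is never confused with a city's NAME.
        cities = [
            (city.findtext("NAME"), city.findtext("CITYSUMMARY"))
            for city in country.findall("CITY")
        ]
        result[country.findtext("NAME")] = cities
    return result


print(cities_by_country(LOCATIONS_XML))
# {'Turkey': [('Istanbul', 'The city of the sultans'), ('Ankara', 'Heart of the new republic')]}
```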

The XML document describes structure. Locations are divided into countries, and countries into cities. Each city has a name, a one-line summary, and a longer description. Notice that the XML document says nothing about display; that is left to the style sheet. For the on-line version, the style sheet author might decide to render it like this:

TURKEY

Istanbul - The city of the sultans

Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.

Ankara - Heart of the new republic

The founders of the modern, secular Turkey chose Ankara as their new capital.

An additional style sheet could allow the same underlying XML document to be printed as a brochure. Or the user might choose to browse just the countries and cities with their short summaries, ignoring the longer descriptions. The document could also include hotel information, prices, availability, and so on. Each of these new fields would require a tag (not shown above) defined in the DTD, and a matching style sheet rule stating how it should be displayed.

All of this power and flexibility is available because XML has split structure from display.
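To make the split between structure and display concrete, the sketch below renders one document in two ways. It is not XSL: a plain Python function stands in for a style sheet, and its `include_descriptions` flag (our own, hypothetical name) plays the role of choosing between the full on-line view and the summary-only browsing view described above. The document itself never changes.

```python
import xml.etree.ElementTree as ET

LOCATIONS_XML = """<LOCATIONS>
  <COUNTRY>
    <NAME>Turkey</NAME>
    <CITY>
      <NAME>Istanbul</NAME>
      <CITYSUMMARY>The city of the sultans</CITYSUMMARY>
      <CITYDESCRIPTION>Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.</CITYDESCRIPTION>
    </CITY>
    <CITY>
      <NAME>Ankara</NAME>
      <CITYSUMMARY>Heart of the new republic</CITYSUMMARY>
      <CITYDESCRIPTION>The founders of the modern, secular Turkey chose Ankara as their new capital.</CITYDESCRIPTION>
    </CITY>
  </COUNTRY>
</LOCATIONS>"""


def render(xml_text, include_descriptions=True):
    """Render the document as plain text, like a very simple style sheet."""
    root = ET.fromstring(xml_text)
    lines = []
    for country in root.findall("COUNTRY"):
        # Display decision: country names in capitals.
        lines.append(country.findtext("NAME").upper())
        for city in country.findall("CITY"):
            lines.append(f"{city.findtext('NAME')} - {city.findtext('CITYSUMMARY')}")
            # Display decision: the long description is optional.
            if include_descriptions:
                lines.append(city.findtext("CITYDESCRIPTION"))
    return "\n".join(lines)


print(render(LOCATIONS_XML))                               # full on-line view
print()
print(render(LOCATIONS_XML, include_descriptions=False))  # summary-only view
```

Adding a brochure layout would mean writing another rendering function (or, in XML proper, another style sheet), with no edits to the data.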

XML links

XML will greatly extend the hyperlinking mechanism familiar to all web surfers. Links, defined with XLL (Extensible Link Language), can be stored and maintained independently of the documents in which they appear. They can be bidirectional, making it easier to link related data within structured documents. They can also carry attributes that define what type of link each one is. Some links will even make the target document appear as part of the source document. All of this extra flexibility and power will transform the way documents are accessed and linked together across the Web.
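XLL's linking work was later published by the W3C as the XLink specification. Under XLink, meaning is carried by attributes in the `http://www.w3.org/1999/xlink` namespace, not by element names. The sketch below assumes XLink's "extended link" syntax: a separate link document (the element names `links`, `related`, `loc` and `go`, and both file names, are hypothetical) names two resources with locators and connects them with arcs in both directions. It then lists the traversals with Python's standard `xml.etree.ElementTree`. This is only a reader for the link data, not a full XLink processor.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical link database, stored apart from the documents it connects.
LINKBASE = """<links xmlns:xlink="http://www.w3.org/1999/xlink">
  <related xlink:type="extended">
    <loc xlink:type="locator" xlink:href="istanbul.xml" xlink:label="istanbul"/>
    <loc xlink:type="locator" xlink:href="hotels.xml#istanbul" xlink:label="hotels"/>
    <go xlink:type="arc" xlink:from="istanbul" xlink:to="hotels" xlink:show="new"/>
    <go xlink:type="arc" xlink:from="hotels" xlink:to="istanbul" xlink:show="replace"/>
  </related>
</links>"""


def xl(name):
    """XLink attribute name in ElementTree's {namespace}local form."""
    return f"{{{XLINK}}}{name}"


def traversals(linkbase_text):
    """List (from_href, to_href) pairs for every arc of every extended link."""
    root = ET.fromstring(linkbase_text)
    pairs = []
    for link in root.iter():
        if link.get(xl("type")) != "extended":
            continue
        # Map each locator's label to the resource it points at.
        hrefs = {
            loc.get(xl("label")): loc.get(xl("href"))
            for loc in link
            if loc.get(xl("type")) == "locator"
        }
        # Each arc names its endpoints by label; resolve them to addresses.
        for arc in link:
            if arc.get(xl("type")) == "arc":
                pairs.append((hrefs[arc.get(xl("from"))], hrefs[arc.get(xl("to"))]))
    return pairs


for source, target in traversals(LINKBASE):
    print(source, "->", target)
```

Because neither `istanbul.xml` nor `hotels.xml` contains the link, either file can be edited or moved and only the link database needs updating.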

The Document Object Model

The Document Object Model (DOM) is a standard, object-based API that, among other things, will give scripting languages dynamic access to the content of XML documents. The content of elements will be available for complex processing on the client side.
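The article has browser scripting in mind. Python's standard library also ships a DOM implementation, `xml.dom.minidom`, which is convenient for a short sketch of the same standard calls: `getElementsByTagName`, `createElement`, `createTextNode` and `appendChild`. The `NOTE` element added below is a hypothetical extra field, not part of the article's document.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<CITY><NAME>Ankara</NAME>"
    "<CITYSUMMARY>Heart of the new republic</CITYSUMMARY></CITY>"
)

# Read element content through standard DOM calls.
summary = doc.getElementsByTagName("CITYSUMMARY")[0]
print(summary.firstChild.data)  # Heart of the new republic

# Change the text in place, then add a new (hypothetical) element.
summary.firstChild.data = "Capital of the Republic of Turkey"
note = doc.createElement("NOTE")
note.appendChild(doc.createTextNode("Hypothetical extra field"))
doc.documentElement.appendChild(note)

# Serialize the modified tree back to XML text.
print(doc.documentElement.toxml())
```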

XML applications

XML will be used wherever data needs to be structured for presentation or interchange, so it will greatly enhance e-commerce. XML is an excellent way to define nonproprietary data structures, allowing smooth interchange of data between heterogeneous systems across the Web. For example, if all travel agents defined travel packages in a standard way, using the same DTD, they could easily exchange data with one another and with airlines, hotels, and customers.
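As a sketch of that interchange, the code below has two independently written functions that agree only on a document format. The `PACKAGE` element, its `id` attribute, and the `DESTINATION`, `NIGHTS` and `PRICE` children are all hypothetical stand-ins for what a shared travel-agent DTD would define. The code uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared format; in practice the agreement would be the common DTD.
FIELDS = ("DESTINATION", "NIGHTS", "PRICE")


def package_to_xml(package):
    """Sender side: serialize a package (a dict keyed by FIELDS) as XML text."""
    elem = ET.Element("PACKAGE", id=package["id"])
    for field in FIELDS:
        ET.SubElement(elem, field).text = str(package[field])
    return ET.tostring(elem, encoding="unicode")


def xml_to_package(text):
    """Receiver side: read a package written by any system using the format."""
    elem = ET.fromstring(text)
    package = {"id": elem.get("id")}
    for field in FIELDS:
        # Values arrive as text; a DTD does not define data types.
        package[field] = elem.findtext(field)
    return package


# The travel agent's system produces the document...
sent = package_to_xml(
    {"id": "TK-14", "DESTINATION": "Istanbul", "NIGHTS": 14, "PRICE": 1450}
)
print(sent)
# ...and a hotel's system, written independently, reads it back.
print(xml_to_package(sent))
```

Neither side needs to know how the other stores its data internally. They share only the tag vocabulary.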


XML will greatly improve the searching capabilities of current web search tools. Because documents will be defined and structured logically, with data separated from metadata, searches can zoom in on the relevant tags. To take a simple example, a customer could search the Web for all travel agents offering two-week trips to Istanbul for less than $1500. The search tool could look for the tags identifying trip length, destination, and price on sites offering documents that comply with the standard travel-agent DTD, and so find meaningful information.
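The following is a minimal sketch of such a structured search, assuming listings from several agents in one shared format. The element names `PACKAGES`, `PACKAGE`, `DESTINATION`, `NIGHTS` and `PRICE`, the `agent` attribute, and the data are all hypothetical. The point is that the query compares tagged fields. A plain text search for "Istanbul" and "1500" could not tell a price from a page number.

```python
import xml.etree.ElementTree as ET

# Hypothetical listings, all following the same (hypothetical) format.
LISTINGS = """<PACKAGES>
  <PACKAGE agent="Agent A"><DESTINATION>Istanbul</DESTINATION><NIGHTS>14</NIGHTS><PRICE>1450</PRICE></PACKAGE>
  <PACKAGE agent="Agent B"><DESTINATION>Istanbul</DESTINATION><NIGHTS>14</NIGHTS><PRICE>1690</PRICE></PACKAGE>
  <PACKAGE agent="Agent C"><DESTINATION>Ankara</DESTINATION><NIGHTS>14</NIGHTS><PRICE>990</PRICE></PACKAGE>
  <PACKAGE agent="Agent D"><DESTINATION>Istanbul</DESTINATION><NIGHTS>7</NIGHTS><PRICE>800</PRICE></PACKAGE>
</PACKAGES>"""


def search(xml_text, destination, nights, max_price):
    """Return the agents whose packages match on the tagged fields."""
    root = ET.fromstring(xml_text)
    return [
        package.get("agent")
        for package in root.findall("PACKAGE")
        if package.findtext("DESTINATION") == destination
        and int(package.findtext("NIGHTS")) == nights
        and float(package.findtext("PRICE")) < max_price
    ]


# Two-week trips to Istanbul for less than $1500.
print(search(LISTINGS, "Istanbul", 14, 1500))  # ['Agent A']
```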

Summary

XML will enhance the power of the Web considerably. It will leverage the vast store of information on the Web to improve searching and data interchange. And it will help overcome some of the limitations currently faced by HTML.
