XML: Why and how?

2001-12-29 13:24:52 Author: 畅享网
Keywords: theoretical discussion, collaborative commerce

HTML: Simple but limited

HTML transformed the Internet from an obscure academic tool into the rich, flexible, and powerful World Wide Web that we know today. HTML has some huge advantages that made it ripe for this explosive growth. It is relatively simple to learn and use. With just a limited number of tags and a word processor, anyone can produce legible HTML pages within minutes. But the syntax is still powerful, despite being simple. Lists, links, graphics, and formatted text can all be produced with a set of easy-to-learn tags. And HTML, with a few minor exceptions, is a universally accepted language, common to all web pages and displayable by all browsers on all platforms. If you build a web page with well-formatted HTML, then you are guaranteed the widest possible audience across all platforms.


HTML's very success created a demand for ever larger and more sophisticated web applications, and HTML cannot cope with the increasing demands made on it. HTML is too static and rigid for the fast-changing world of the Web. Its tags are hard-coded by committees and browser authors. It should be possible to add new tags without making them arbitrary or nonstandard.

HTML was designed for a fixed purpose: web pages. But the information these pages contain is valuable in many other formats, such as printed documents, manuals, and financial data in databases. Transforming HTML to or from these alternative storage media is costly and difficult.

Looking ahead, it would be a huge benefit to capture and organize the vast amount of information on the Web. But HTML documents are intrinsically unstructured: HTML tags describe how documents should be displayed, not how they are structured. This means that simple text searches are all that is possible. Searching structured documents would multiply the usefulness of information on the Web manyfold.

Finally, HTML's linking mechanisms are very weak. Links break regularly, with no warning, and they are hard-coded into the document, making it necessary to use a tool to maintain them. And links are unidirectional - an unnecessary limitation on a pure hyperlinking system.

What is XML?

Aware of the problems facing HTML, the World Wide Web Consortium (W3C) ratified a standard for a new markup language called XML, which stands for Extensible Markup Language. It bears some superficial similarities to HTML, but is in fact a much more powerful concept.


XML defines the structure of documents, whereas HTML defines how to display them. XML tags mark out sections of a document according to their content, rather than according to how they should be displayed. The tags are not fixed in advance: an XML parser (for example, the one built into a browser) checks them against a Document Type Definition (DTD). A DTD defines the valid tags, and how they may be nested, for all XML documents of a particular type. Authors can define their own DTDs or, more likely, use publicly available DTDs that suit their particular application. The browser knows how to display the document by consulting a style sheet, written in XSL (Extensible Stylesheet Language). Different style sheets result in different displays of the same underlying document.

XML in practice

Let's say that a travel agent wants to allow users to browse through lists of currently available holiday destinations. A typical section of HTML might look like this:

<H1>Turkey</H1>
<H2>Istanbul</H2>
<H3>The city of the sultans</H3>
Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.<P>
<H2>Ankara</H2>
<H3>Heart of the new republic</H3>
The founders of the modern, secular Turkey chose Ankara as their new capital.

In XML, it might look like this:

<LOCATIONS>
<COUNTRY><NAME>Turkey</NAME>
<CITY><NAME>Istanbul</NAME>
<CITYSUMMARY>The city of the sultans</CITYSUMMARY>
<CITYDESCRIPTION>Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.
</CITYDESCRIPTION>
</CITY>
<CITY><NAME>Ankara</NAME>
<CITYSUMMARY>Heart of the new republic</CITYSUMMARY>
<CITYDESCRIPTION>The founders of the modern, secular Turkey chose Ankara as their new capital.
</CITYDESCRIPTION>
</CITY>
</COUNTRY>
</LOCATIONS>

The XML document describes structure. Locations are divided into countries and cities, and a city is made up of a name, a one-line summary, and a longer description. Notice that the XML document says nothing about display; that is left to the style sheet. The style sheet author may decide, for the purposes of the on-line version, to display it thus:

TURKEY

Istanbul - The city of the sultans

Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.

Ankara - Heart of the new republic

The founders of the modern, secular Turkey chose Ankara as their new capital.

An additional style sheet can be produced that allows the underlying XML document to be printed in a brochure. Or the user may decide to just browse the country and cities, with their short descriptions, ignoring the longer descriptions. And the document could also include hotel information, prices, availability, and so on. Each of these new fields would require a tag (not shown above) defined in the DTD and a matching style sheet specification to state how it should be displayed.

All of this power and flexibility is available because XML has split structure from display.
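
The split between structure and display can be sketched in a few lines of Python. This is only an illustration: the two rendering functions below are hypothetical stand-ins for style sheets (they are not XSL), and the XML is a well-formed copy of the travel example, parsed with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the travel document from the example above.
LOCATIONS_XML = """\
<LOCATIONS>
  <COUNTRY><NAME>Turkey</NAME>
    <CITY><NAME>Istanbul</NAME>
      <CITYSUMMARY>The city of the sultans</CITYSUMMARY>
      <CITYDESCRIPTION>Istanbul, gateway from west to east, has been the capital of three empires throughout its long, troubled history.</CITYDESCRIPTION>
    </CITY>
    <CITY><NAME>Ankara</NAME>
      <CITYSUMMARY>Heart of the new republic</CITYSUMMARY>
      <CITYDESCRIPTION>The founders of the modern, secular Turkey chose Ankara as their new capital.</CITYDESCRIPTION>
    </CITY>
  </COUNTRY>
</LOCATIONS>
"""


def render_full(root):
    """Stand-in for the on-line style sheet: country, cities, both descriptions."""
    lines = []
    for country in root.findall("COUNTRY"):
        lines.append(country.findtext("NAME").upper())
        for city in country.findall("CITY"):
            lines.append(f"{city.findtext('NAME')} - {city.findtext('CITYSUMMARY')}")
            lines.append(city.findtext("CITYDESCRIPTION").strip())
    return "\n".join(lines)


def render_brief(root):
    """Stand-in for a browsing style sheet: the longer descriptions are ignored."""
    lines = []
    for country in root.findall("COUNTRY"):
        lines.append(country.findtext("NAME").upper())
        for city in country.findall("CITY"):
            lines.append(f"{city.findtext('NAME')} - {city.findtext('CITYSUMMARY')}")
    return "\n".join(lines)


root = ET.fromstring(LOCATIONS_XML)
print(render_full(root))
print()
print(render_brief(root))
```

Both functions read the same parsed document; only the presentation logic differs, which is the role the article assigns to style sheets.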

XML links

XML will greatly extend the hyperlinking mechanism familiar to all web surfers. Links, defined with XLL (the Extensible Link Language, later developed by the W3C as XLink and XPointer), can be stored and maintained independently of the documents in which they appear. They can be bidirectional, allowing greater power to link common data within structured documents. And they can have attributes that define what type of link they are. Some links will even make the link target document look like part of the link source document. All of this extra flexibility and power will transform the way documents are accessed and linked together across the Web.

The Document Object Model

The Document Object Model (DOM) is a standard object-based API that will give scripting languages, among other things, dynamic access to XML document content. The content of tags will be available for complex processing on the client side.
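
As a rough sketch of what DOM-style access looks like, the following uses Python's standard xml.dom.minidom module, which implements a subset of the W3C DOM. The article has browser scripting in mind, so Python here is only a convenient stand-in; the helper text_of and the PRICE element are hypothetical and appear nowhere in the article.

```python
from xml.dom.minidom import parseString

# A small fragment of the travel document, parsed into a DOM tree.
doc = parseString(
    "<CITY><NAME>Istanbul</NAME>"
    "<CITYSUMMARY>The city of the sultans</CITYSUMMARY></CITY>"
)


def text_of(parent, tag):
    """Return the text inside the first <tag> element found under parent."""
    element = parent.getElementsByTagName(tag)[0]
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


city = doc.documentElement
name = text_of(city, "NAME")
summary = text_of(city, "CITYSUMMARY")
print(name, "-", summary)

# The tree can also be changed in place. PRICE is a hypothetical element
# added here only to show createElement and appendChild.
price = doc.createElement("PRICE")
price.appendChild(doc.createTextNode("1200"))
city.appendChild(price)
print(city.toxml())
```

The same calls (getElementsByTagName, createElement, appendChild) are the ones a client-side script would use against a document loaded in a browser.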

XML applications

XML will be used wherever data needs to be structured for presentation or interchange, so it will greatly enhance e-commerce. XML is an excellent way to define nonproprietary data structures, thus allowing for smooth interchange of data between heterogeneous systems across the Web. For example, if all travel agents defined travel packages in a standard way, using the same DTD, they could easily transfer data between themselves and airlines, hotels, and customers.
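
A minimal sketch of such an interchange, assuming both parties have agreed on the same element names: the sender serializes a record as XML and the receiver rebuilds it. The PACKAGE, DESTINATION, HOTEL, and NIGHTS tags and the sample values are hypothetical, and the code does not validate against a DTD (Python's standard library does not offer DTD validation).

```python
import xml.etree.ElementTree as ET


def package_to_xml(package):
    """Sender side: serialize a dict as a <PACKAGE> element."""
    element = ET.Element("PACKAGE")
    for tag, value in package.items():
        ET.SubElement(element, tag).text = str(value)
    return ET.tostring(element, encoding="unicode")


def xml_to_package(text):
    """Receiver side: rebuild the dict from the agreed element names."""
    element = ET.fromstring(text)
    return {child.tag: child.text for child in element}


# Invented sample record, for illustration only.
sent = {"DESTINATION": "Istanbul", "HOTEL": "Example Hotel", "NIGHTS": "14"}
wire = package_to_xml(sent)
received = xml_to_package(wire)

print(wire)
print(received == sent)
```

Neither side needs to know how the other stores the data internally; the shared vocabulary of tags is the only contract between them.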


XML will greatly improve the searching capabilities of current web search tools. Because documents will be defined and structured logically, with data separated from meta-data, searches can zoom in on the relevant tags. To take a simple example, a customer could search the Web for all travel agents offering two-week trips to Istanbul for less than $1500. The search tool could look for the tags that identify destination, duration, and price in sites that publish documents conforming to the standard travel agent DTD, and so find meaningful information.
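
The Istanbul query can be sketched as follows. The article's own tag names did not survive in the text, so every element name here (PACKAGE, AGENT, DESTINATION, DURATIONDAYS, PRICEUSD) is an assumption, and the four packages are invented sample data.

```python
import xml.etree.ElementTree as ET

# Invented sample data using hypothetical tag names.
PACKAGES_XML = """\
<PACKAGES>
  <PACKAGE><AGENT>Agent A</AGENT><DESTINATION>Istanbul</DESTINATION>
    <DURATIONDAYS>14</DURATIONDAYS><PRICEUSD>1350</PRICEUSD></PACKAGE>
  <PACKAGE><AGENT>Agent B</AGENT><DESTINATION>Istanbul</DESTINATION>
    <DURATIONDAYS>14</DURATIONDAYS><PRICEUSD>1700</PRICEUSD></PACKAGE>
  <PACKAGE><AGENT>Agent C</AGENT><DESTINATION>Ankara</DESTINATION>
    <DURATIONDAYS>14</DURATIONDAYS><PRICEUSD>900</PRICEUSD></PACKAGE>
  <PACKAGE><AGENT>Agent D</AGENT><DESTINATION>Istanbul</DESTINATION>
    <DURATIONDAYS>7</DURATIONDAYS><PRICEUSD>800</PRICEUSD></PACKAGE>
</PACKAGES>
"""


def find_agents(root, destination, days, max_price):
    """Return agents whose package matches destination, length, and price cap."""
    matches = []
    for package in root.iter("PACKAGE"):
        if (
            package.findtext("DESTINATION") == destination
            and int(package.findtext("DURATIONDAYS")) == days
            and float(package.findtext("PRICEUSD")) < max_price
        ):
            matches.append(package.findtext("AGENT"))
    return matches


root = ET.fromstring(PACKAGES_XML)
agents = find_agents(root, "Istanbul", days=14, max_price=1500)
print(agents)
```

A plain text search for "Istanbul" and "1500" could not tell a price from a phone number or a trip length from a date; here each comparison is made against the element that carries that meaning.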

Summary

XML will enhance the power of the Web considerably. It will leverage the vast store of information on the Web to improve searching and data interchange. And it will help overcome some of the limitations currently faced by HTML.
