Document Warehousing & Content Management: InStranet Brings
the Unstructured to BI
By Dan Sullivan
In previous columns we've looked at similarities between structured and
unstructured data in business intelligence (BI), and I've argued the two
are not all that different. This month, I'll hold off on the architectural
and design discussions and instead look at an example implementation and
one of the first commercial products for BI-oriented content management.
InStranet brings one of the pillars of business intelligence,
multidimensional modeling, to bear on the problem of integrating content
with traditional BI applications. The basic idea behind InStranet's
flagship product, InStranet V2, is that documents can be organized,
managed and disseminated using a multidimensional model of meta data. The
parallels with structured BI are clear. Just as we slice and dice numbers
along fixed dimensions, users want to find content based upon dimensions
such as customer, partner, time and product type. Similarly,
administrators can use the same dimensions to control access based upon
line of business, client responsibilities and authority levels. This idea
of applying multidimensional modeling to content management, combined with
scalable implementations built on relational database platforms such as
Oracle and DB2, is opening new ways of thinking about documents, business
intelligence and customer relationship management (CRM).
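
To make the slice-and-dice parallel concrete, here is a minimal Python sketch of documents carrying dimensional meta data and a query that filters along those dimensions. It is an illustration only and not InStranet's API: the Document class, the slice_documents function, the dimension names and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    dims: dict  # dimension name -> member, e.g. {"customer": "Acme"}


def slice_documents(docs, **criteria):
    """Return the documents whose meta data matches every requested dimension value."""
    return [
        doc for doc in docs
        if all(doc.dims.get(name) == value for name, value in criteria.items())
    ]


# Illustrative sample data only (hypothetical customers and dimensions).
DOCS = [
    Document("Acme auto policy", {"customer": "Acme", "product_type": "auto", "year": 2001}),
    Document("Acme claim summary", {"customer": "Acme", "product_type": "auto", "year": 2000}),
    Document("Birch home policy", {"customer": "Birch", "product_type": "home", "year": 2001}),
]

if __name__ == "__main__":
    # Slice along two dimensions at once: customer and year.
    for doc in slice_documents(DOCS, customer="Acme", year=2001):
        print(doc.title)
```

Each keyword argument plays the role of a dimension constraint, much as a filter on a dimension table would in an OLAP query. A production system would keep these tags in relational tables rather than in memory.
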
One InStranet user, a major insurance company, had successfully
deployed a widely used and effective reporting system based upon its
data warehouse and Business Objects. The problem was that it still was not
meeting all of its users' requirements. In an industry such as insurance
where 80 percent of the information exchanged between the insurer and
customers is unstructured, providing only structured information leaves
users fending for themselves to piece together the whole decision support
picture. Customers, agents and even internal staff needed access to
policies, contracts and claims as well as the structured data provided by
the data warehouse. The insurer seized the opportunity to improve customer
retention by offering CRM-like functionality along with the data warehouse
reporting. Within an Enterprise Information Exchange (InStranet's term for
a business application) users can examine claim statuses, review policy
details and exchange information with each other from a single point. As
this insurer found, integrating unstructured text with traditional
business intelligence applications can directly affect
the bottom line. In this case, the benefit came in the form of improved
customer retention.
There are several steps to creating an enterprise information exchange.
First, dimensions are defined for organizing content. In some cases, these
can be used directly from a data warehouse or OLAP application. In other
cases, dimensions will be created specifically for the enterprise
information exchange (e.g., access control information). Once the
dimensions are defined, documents are tagged with XML-based meta data. The
meta data reflects where the document falls within the organizational
hierarchy, the document type, author, audience and other administrative
information. At this point, automatic categorization and related feature
extraction tools are not available. Administering the exchange includes
defining user profiles. This is not strictly necessary, but
personalization is one of the key benefits of the system. The final step
is creating links to other business intelligence, LDAP and related
applications. Since the core product is J2EE-compliant and XML-oriented,
integration barriers are minimized.
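
The tagging and profile steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not InStranet's actual schema or API: the XML element names, the parse_metadata and can_access functions, and the profile format (dimension name mapped to a set of permitted members) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta data record; element and attribute names are illustrative.
METADATA_XML = """
<document id="claim-1047">
  <dimension name="line_of_business">auto</dimension>
  <dimension name="doc_type">claim</dimension>
  <dimension name="region">northeast</dimension>
  <author>J. Smith</author>
</document>
"""


def parse_metadata(xml_text):
    """Read a document's XML tags into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    dims = {el.get("name"): el.text.strip() for el in root.findall("dimension")}
    return {"id": root.get("id"), "author": root.findtext("author"), "dimensions": dims}


def can_access(profile, dims):
    """Check a user profile against a document's dimensions.

    The profile maps a dimension name to the set of members the user may see.
    Dimensions the profile does not mention are unrestricted; a document that
    lacks a dimension the profile restricts is denied.
    """
    return all(dims.get(name) in allowed for name, allowed in profile.items())


if __name__ == "__main__":
    meta = parse_metadata(METADATA_XML)
    agent_profile = {"line_of_business": {"auto", "home"}, "doc_type": {"claim", "policy"}}
    print(meta["id"], can_access(agent_profile, meta["dimensions"]))
```

Because the same dimensions drive both retrieval and access control, a single profile can personalize what a user sees and restrict what a user is allowed to see.
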
InStranet V2 falls within the scope of the broad enterprise information
portal market; but to put it into a more precise perspective, we'll use
IDC's model of the enterprise portal evolution, which is divided into
three waves. In the first wave, portals were fundamentally user interface
integration tools. In the second wave, the focus is on equal but
nonintegrated access to both structured and unstructured data. In the
third wave, structured and unstructured data access is unified. Achieving
this level of integration requires shared meta data, and dimensional
models are particularly good representations for this. (See my July 2001
DM Review article for more detail.) Built from the ground up to
support structured integration based upon a dimensional meta data model,
InStranet V2 is clearly in the third wave class of tools.
Who will benefit from applying multidimensional techniques to content
management? You are definitely a candidate if you need to exchange
documents with large numbers of customers, suppliers and partners. Large
organizations with many business units will also benefit. This is
especially true when multiple points within an organization service large
customers. For example, does an umbrella contract created in a major
accounts sales department dictate terms for services provided to that
customer's subsidiaries that are, in turn, serviced by a regional sales
force? Are those contracts on a file server? Are they distributed to the
different sales groups via e-mail and FTP? If this sounds familiar, then a
more structured approach could be in order. Finally, if you find yourself
chasing documents to explain anomalous trends and figures in a data
warehouse report, then it's time to consider document integration with
your other BI applications.
Dan Sullivan is chief technology officer at Redmont Corporation
specializing in the development of portal, content management and business
intelligence applications. He is the author of Document Warehousing and
Text Mining (Wiley, 2001). Sullivan may be reached via e-mail at DSullivan@Redmontcorp.com.