ISO/IEC JTC 1/SC 34N0449

ISO/IEC logo

ISO/IEC JTC 1/SC 34

Information Technology --
Document Description and Processing Languages

TITLE: Topic Map Query Language, Use Cases
SOURCE: Mr. Lars Marius Garshol; Mr. Robert Barta
PROJECT: WD 18048: Information technology -- Topic Maps -- Topic Maps Query Language (TMQL)
PROJECT EDITOR: Mr. Robert Barta; Mr. Lars Marius Garshol
STATUS: Technical document
ACTION: For review and comment
DATE: 2003-11-07
DISTRIBUTION: SC34 and liaisons
REPLY TO:

Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Chairman)
Y-12 National Security Complex
Bldg. 9113, M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
Network: masonjd@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/
ftp://ftp.y12.doe.gov/pub/sgml/sc34/

Mr. G. Ken Holman
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Box 266,
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: jtc1sc34@scc.ca
http://www.jtc1sc34.org/



Topic Map Query Language, Use Cases

Technical document 2003-12-04

This version:
TMQL Use Cases (N0449)
Authors:
Lars Marius Garshol , Ontopia mailto:larsga@ontopia.net
Robert Barta , Bond University mailto:rho@bond.edu.au

Abstract

This document is part of the effort to define a standard query language for Topic Maps (TMQL, ISO18048) and as such describes possible use scenarios from both, an application oriented view, but also more detailed in terms of retrieved topic map content. These use cases are intended to support the evaluation process for individual TMQL proposals.

Table of Contents

1 Status of this Document
2 Introduction
3 Application Scenarios
    3.1 Knowledge Web Sites
    3.2 Web Portals
    3.3 Content Management (Horizontal and Vertical Markets)
    3.4 Enterprise Information Integration
    3.5 High-Integrity Systems
    3.6 Meta-Data Syndication
    3.7 Knowledge Management
    3.8 Intangible Asset Management
4 Application Use Cases
    4.1 Photo Meta-Data
    4.2 Visualisation of Topic Map Data
    4.3 Topic Map Introspection
5 Use Case: Literature References
    5.1 Problem Domain
        5.1.1 Test Data
        5.1.2 Assumptions
    5.2 Queries with List Output
        5.2.1 Conventions for the Sample Output
        5.2.2 Individual Queries
    5.3 Queries with XML Output
    5.4 Queries with Topic Map Output
6 Simple Retrievals

Appendices

A Acknowledgements
B References
C Change Log


1 Status of this Document

This document is an working draft as commissioned according to [N0423] to collect potential use cases and usage scenarios for a Topic Map Query Language [TMQL]. This document is for review for SC34WG3 members and they are invited to feedback corrections and also comment on the general structure and statement of the document. It should be seen in the context of [tmql-req].

The current document has been made as a best effort to incorporate the individual contributions and the comments made in the Montreal Meeting Report [N0439]. The editors seek confirmation that the comments and suggestions are addressed to the satisfaction of WG3.

There is no formal due date for comments, although it is expected that some editorial time is needed to prepare a new version for the next meeting in December 2003 (Philadephia).

Comments can be addressed directly to the authors, on the IRC channel #topicmaps@irc.freenode.net or via the official SC34 mailing list.

See the C Change Log for changes.

2 Introduction

Topic maps have a particular structure as defined by [tmdm] and more abstractedly by [RM]. In principle it would be possible - and it has been done - to model the content via a relational data model or - using [XTM] as one possible representation - storing it into an XML (enabled) data store.

For scalability reasons most TM storage solutions tend to build their own dedicated data models and access technologies. From this point on programming interfaces allow application developers to access part of the map, being for reading or writing. In this sense the abstract model defined by TMDM [TMDM] may play a similar role as [DOM] for XML based data, at least in the short term. For reasons of minimality, data models such as TMDM have to be limited in the way how particular TM content is identified in a map.

A Topic Map Query Language may cut down development costs for particular classes of applications. The language would allow to phrase more global queries viewing the topic map content as a whole and not just as a collection of topics and associations. Additionally such a language could also facilitate the generation of output in formats like XML directly in order to minimize the necessary interaction between the application and the query infrastructure.

The following application scenarios and use cases are designed to illustrate the use of a hyperthetical Topic Map query language. They all assume that TM content is stored in an appropriate backend and is retrieved using the query language in a similar way how [SQL] is used for relational data stores. It is also assumed that TMQL can generate output in form of list structures, tree-structured like XML or even complete topic maps.

We will first attempt a high-level characterisation of possible usage scenarios for Topic Map technology in general (section 3 Application Scenarios), but particularly focus on the deployment of a query language. We use these scenario descriptions to cover whole classes of applications. In section 4 Application Use Cases we will narrow down on a few particular applications and loosely define some sample queries which may be typical.

The following section (5 Use Case: Literature References) then singles out one particular application, that of managing literature references with Topic Maps. In a rather trivial, restricted scenario we define first a small test database which we use to define detailed queries with their expected outcomes.

To complete the whole range of use scenarios - from the rather high-level deployment options to concrete queries - the last section (6 Simple Retrievals) then lists rather generic, isolated queries. These focus mainly on specific Topic Map components and simply select parts of a map.

3 Application Scenarios

This section provides some high-level descriptions of possible usage scenarios for TMQL. While there may be similarities between them, these applications have been collected to demonstrate the range of TMQL applicability in general.

3.1 Knowledge Web Sites

Such web sites simply host content about a particular organisation, product or theme. Here we do not distinguish between intranet sites which more concentrate on staff information and services and those which address a wider audience. In case of an intranet the knowledge may constitute corporate memory, the agenda of public web site is usually more educational or promotional.

Many professional web sites nowadays master the content out of a content back-end, be it a relational database or some other data store. Which data store technology to choose largely depends on the nature of the content. As long as the content can be disciplined into relational data models, conventional relational databases offer sufficient management facilities to insert and retrieve content fragments. To allow a maximum of content reuse and management, most sites will also attach content management systems.

Once the content becomes more irregular, (native or hybrid) XML databases provide some leeway in terms of storage flexibility. Applications, though, have to interpret XML content to a certain extent to be able to display the relevant parts to a user.

Web sites offering, say, complex educational material do not fall into above categories; the content is neither tabular nor tree-like in nature. Instead, Topic Map back-ends may carry part of the relevant content.

Except for statically prepared web content, all these sites will have to use either one of the numerous scripting technologies or a web application infrastructure to pull content from the back-end and to regenerate it into a format which end user devices, such as browsers, can render.

In case of server-side technologies a query formulated in TMQL will be used to identify relevant content fragments within a topic map instance. The rest of the application will insert the content into templates; this end result will then be sent to the client (see appendix B for a list of detailed queries for content retrieval). In this context application servers so perform typical middle-ware tasks and host the application itself. Also in such scenarios a TMQL query engine can be used to select Topic Map content before it is prepared for output.

A special case are XML applications servers. According to their operational model it is not a program which carries the control-flow to approach the data back-ends at the appropriate times. Instead, XML documents are processed and interpreted. If special XML tags are detected by the XML application processor, then libraries - provided by the application engineer - will be invoked. These [taglibs] will create XML fragments which are inserted as part of the overall output. How much of the production of the XML structure is done by the application or can be done by the TMQL query processor may decide on the ease of use and the performance of the overall system.

In any case XML application servers can then deliver content which can be further transformed into HTML, SVG or other formats.

3.2 Web Portals

Web portals differ from web sites in that they cover a larger theme presented for different clientels. An example would be a health portal; in that doctors, nurses, hospital management, government, insurances, and, of course patients will have different views onto the focal theme.

The challenge for the portal maintainer is (a) to capture an sufficiently rich content for all the clients to keep them interested in the portal to such an extent that they also are motivated to contribute content. The portal (b) also has to syndicate the generated content between various user groups and third-parties.

On a knowledge engineering level every user group can be defined via the concepts they are mainly interested in. Patients might be mainly concerned about services, doctors about liabilities in particular situations, hospitals about cost management. The portal thus will (c) have to define ontologies for every such clientel.

All these ontologies, of course, will have to coexist in the sense that they are compatible with each other. Moreover, the portal maintainer will have to define mappings between content adhering to different ontologies. If content is generated by one user clientel then this content follows their prevailing ontology. To be useful for another user groups the content has to be mapped into a different ontology. A Topic Map query language might be of help to automate the task to create a collaborative knowledge base.

3.3 Content Management (Horizontal and Vertical Markets)

Classical document management focuses on the separation of content and layout, but also on managing considerable amounts of documents.

Topic Maps can be used to store document meta-data in a rather flexible way. This information usually will contain rather generic properties like authorship, creation and modification dates. The structure of these attributes have been in standardisation for some time [DC]. This enables software vendors to model them either in a conventional database or in a Topic Map infrastructure.

In many cases, though, rather organisation or business process specific information will have to be added. Governmental organisations will treat documents containing legal texts quite differently than a print publisher will manage customer PDF data or a marketing firm does with Quark Express templates.

Software vendors which create solutions for vertical markets may not only define ontologies to specify what kind of meta-data can be stored; they also will build applications which can answer specific queries such as "when was this legal text first discussed in the parliament?" or "which set of PDF documents was printed in-house with a particular machine in May?" or "in what marketing campaign was a particular Quark Express template not used?".

3.4 Enterprise Information Integration

Enterprise application integration is a term for efforts to connect various separate applications within an information environment. Depending on the prevailing definition enterprise information integration specifically tries to consolidate diverse data sources in to one coherent whole.

XML technologies were of tremendous help in this area and facilitated to reduce eventually the number of different applications and services corporations. Still, when it comes to integrate the back-end data sources a rather flexible data model is necessary. One option - of course - is to use the Topic Maps paradigm.

Unfortunately, in real-life situations it cannot be easily achieved to simply recombine all existing data into one database. This usually has political, commercial but also technical reasons. Instead, Topic Map technology can be used to provide this integration virtually by mapping the individual data resources transparently into a topic map framework. Every such resource then is represented by a /immaterialized topic map/, i.e. a map which only exists virtually for the applications which use the Topic Map access. Internally every access to the topic map is translated into an access to the back-end store, be it a relational database using SQL or a hierarchically organised LDAP database. A TMQL thus provides a homogenuous view to all applications above the Topic Map blanket. Various drivers for different back-ends will provide a transparent translation in both directions.

In a corporate scenario it is normally achievable that the individual immaterialized maps all use the same set of subject indicators. Given that, a standard topic map merge will result in a consolidated view. Otherwise, the merging has to be done according to application specific rules, as for example based on persons' email addresses. In this process a query language may provide the necessary means to generate a semantically consolidated map.

3.5 High-Integrity Systems

High-Integrity systems not only try to consolidate the information landscape of an organisation or a complex machinery; they also use business and mission aware knowledge systems to improve the human-machine interaction. Especially in mission-critical system involving considerable assets and/or human life this should facilitate that operators make as little mistakes as possible.

As such operators have to make decisions in different situations: tests during normal operation, controlled upgrades of sub-systems, be they hardware or software related and, of course, decisions in abnormal situations to bring the system to a normative state.

Topic Maps can play a role in such as system as link between the documentation management system, the process-aware knowledge base and the physical state of the system. Depending on the intent of the operator, the operational parameters of the application and the current system configuration, the user interface application would use a Topic Map query to retrieve the appropriate documentation.

As such the Topic Map is used as a sophisticated document index holding the knowledge which part of a document is relevant to the given context and which part of the knowledge base is to be consulted.

3.6 Meta-Data Syndication

A content syndicate plays the man-in-the-middle between content producers and consumers. Its business model is to consolidate the incoming (upstream) content technically and content-wise and to deliver content downstream to portals or individuals.

If the content is representable in XML then the whole arsenal of XML processing and transformation techniques can be used to achieve the above goal. For meta-data Topic Maps can be used, but these need a separate set of manipulation technologies.

To illustrate this, assume that a content provider observes the global market of watches. In a contractual relationship with the syndicate he reports on new watch models and price changes of models on the market. Every now and then, new content following the above ontology will enter the syndicate.

Before the syndicate will send this information to its subscribers it may first consider of adding value to this information: Not everyone subscribed will be an expert in this area, so background information about the company producing the model can be merged into the content. This can be done using a query against a knowledge base which has been acquired background information about the watch market. The query would extract company information and would merge this query result into the original message.

The syndicate may also reduce content if it is not deemed relevant to particular subscribers. End consumers might probably not be interested in details about the manufacturing process of that watch model. On the other hand this might be important for a merchant as this may influence shipping decisions. Again, a query can be used to achieve this kind of filtering: for every user class - maybe even for every user - there exists such a query. The content pushed against this query will cause the query processor to only pass through information according to the query.

Finally, the content will have to be brought into a form where it can be delivered downstream. In case of end users this might be text in email or SMS messages, or also HTML for web pages to be picked up by an private subscriber. Here the query language may be directly used to create documents in these formats. A query will simply walk over the message content and will output SMS, XML or HTML content.

For a portal subscriber the topic map might be serialised as it is and shipped as an XTM document. Depending on the agreed service level, though, either the sender or the receiving end will have to make adaptions to the map so that it can be used within the portal. This amounts to a processing step converting a map following one ontology into one adhering another ontology. Depending on the complexity of such a transformation a query language may accomplish some of the tasks herein.

3.7 Knowledge Management

Knowledge is more than content. With content it is commonly assumed that the interpretation thereof is mainly in the eye of the beholder, mostly a human. The emerging knowledge management market is based on the mantra that at least some part of the interpretation can be left to a machine (MUC, machine understandable content) [lassila02].

Let us assume a Topic Map based database containing information about animals. Looking for lions will hopefully return everything the database knows about this particular concept: names in other languages, references to web sites covering lions in more detail. The topic also will hopefully be be connected via associations to other concepts.

More specific questions like "when are lions dangerous?" are much more difficult to handle than querying for a topic head. The query processor will have to assume that the query should run "when are lions dangerous to humans?". Then it would have to consult the topic map to find possible connections between the concept of danger, human and lion. If there is no direct relationship between these, then a query processor could use the information that "lions eat mammals when hungry". If they can further defer that a human is a mammal then they might give a reasonable answer.

A query language may support or at least facilitate queries like this if the engine is capable of inferencing, i.e. of making logical (or even heuristic) conclusions about stored facts.

The technical challenge, though, is that most useful application domains have an infinite or, at least a very high number of possible deductions. As such it is not feasible to store all possible consequences of all the given facts in a database. Consequently, the database will have to contain the given facts and the axioms. The query processor will then have to be capable to compute a deduction on request.

3.8 Intangible Asset Management

Businesses nowadays have a complex asset structure which they sometimes have to communicate to business partners, potential investors or - partially - to their customers. One technical option they have is to represent this knowledge about their assets in a topic map. Such map could contain use the organisational structure as backbone and could include a - more or less - high-level description of the products and services the company offers.

The (intangible) assets can be manifold: they can be patents associated with the key personel, the products, markets and courts the patent is or is planned to be leveraged. Assets also are the expertise of employees and business partners. Assets also can be further ideas which may be associated with people, markets and products. And assets also can include documentation about business processes and knowledge about business partners.

A Topic Map query language may help to analyze a current situation or make statements about possible scenarios. One example is a query which could ask which employees or associated business partners have an expertise in a particular area or market. In practical situations this will not be a precise query as the field of expertise is huge.

If we assume that a proposed project is described via a topic map, then a query might try to figure out which parts of the projects can be covered in-house and which other parts will have to be outsourced as there is no local expertise.

4 Application Use Cases

In this section we single out particular applications. This allows us to be more specific about how a TM query language can support the application engineer. Again, the applications described here have been chosen for variety > and not for any commonalities they might share.

4.1 Photo Meta-Data

The photo meta-data database has become a canonical example in the meta data community and has already been prototypically implemented as [RDFWeb]. It serves as an experimental test case for the use of semantic web/ technologies.

The database contains information about depictions (photos), such as whether a particular one depicts a person or a landscape. Other meta information may be the date of creation or the person who actually made the picture.

An ontology for this application would maybe contain different classes of pictures and archival data about individual pictures such as sizes or the resolution. To capture the contents of pictures we will make heavy use of associations of type 'depicts' which connects a particular picture with the theme it depicts. To keep things manageable, we restrict ourselves to people, locations and capture everything else with a class 'thing' where we simply add keywords or free text for a description. Adopting [WordNet] as a namespace we might add URNs for those subjects where WordNet holds an entry.

A typical query would ask for all persons who were sitting in a bar. The TMQL query would have to use the above 'depicts' association type and would have to identify these pictures, which have both such a relationship, to a person and a thing which is a bar. The result of this query simply is a table which can be readily processed further.

Another query might try to find a thumbnail picture of a particular landscape. As it is unlikely that a picture with arbitrarily given dimensions will exist in the database, a best match will have to be found.

The easiest way to achieve this, is to retrieve all images and sort them according to the dimension (either the length, the height or both). A cutoff condition takes care that only images below a particular size are returned.

A third, more sophisticated query might involve not conditions on single pictures, but on the database on the whole. Given particular persons Q0 and Qn we could ask which images either depict both, or otherwise which pictures contain other persons Q1, Q2, ... such that Q0 and Q1 are on one picture, Q1 and Q2 are, so are Q2 and Q3 and finally Qn-1 and Qn. Such chains of depiction could indicate so connections between individuals.

4.2 Visualisation of Topic Map Data

Visualisation of Topic Map based data will be a much sought-after feature for applications. Due to the limited space on a screen the visualising process must filter away information from potentially rich topic maps. Which information to show to which user in which context may also depend on the application: Information not shown to a user may be perceived as non-existing or even be misinterpreted as wrong.

Applications therefore will need a rather flexible way to create a number of these filter. The application developer should not be forced to rewrite the extraction of data from a topic map over and over again. Instead they will build a query statement according to the current scenario. The query statement will include not only a way to identify the relevant parts within a map, but also will return only the requested information. Such will also help to reduce clutter and traffic if such an exchange occurs over a network.

As an example we consider a topic map containing information about albums, tracks, artists and such. If we were to convert this information into [SVG] we might adopt either a generic or an application specific approach. According to the former we would provide a set of queries which would simply accept a - given but variable - topic as focal node and which would extract all topics connected via associations. These other topics would then be placed around the focal node.

Another alternative would be to make use of the knowledge about the particular application domain. In that we define scenarios; they would define not only the (exact or approximate) position of nodes on the screen but also what kind of information they should carry and potentially also how they are rendered eventually. In our music example one such scenario could choose the 'album' - whatever it might be - to be the focal node. It would be shown as a rectangle containing the album head, maybe propped up with a small image of a disc. The artist always would be placed left with the name. If that is a group the icons would indicate such and would provide a pop-up function to look at the names of the group members. For individual artist, the artist's name directly appear right to the album, and so on.

Scenarios can so tightly control the amount of information for a particular class of users. Queries would be used to pull only this information from the map which is necessary to populate a given scenario. In our case this would be the basenames of the album, the artist(s) and that of other nodes in the scenario.

4.3 Topic Map Introspection

Whenever an application has to cope with a topic map with an unknown or unclear ontology, a TMQL may help to discover the structure and form of the map. This applies to Topic Map analysis tools for reengineering existing maps, to generic user interfaces using Topic Map data but also optimizing Topic Map data stores.

All of them have to detect structural aspects of a map, such as which topics are used as classes in class-instance associations for topics, which are used as types of occurrences and which are association types. Other queries might try to detect the type hierarchy (taxonomy) which is used in a given map, others may want to analyse whether the set of typing topics and the set of scoping topics is disjunct.

More sophisticated queries might try to figure which association types are used in associations between two (or more) particular topics. So, for instance, we might ask which kind of direct relationships exist between a topic 'Osama-Bin-Laden' and 'Twin-Towers'. This may be then used for Topic Map quality assurance. Similar queries would ask more general what kind of relationships exists between particular types of topics. To continue our zeitgeist example, we could ask for all association types which exist between any topic of type 'terrorist' and 'terror-attack'.

More topological queries could target any topic which is not used at all or may try to find a chain of associations between a set of given topics.

Part of the analysis will not only be qualitative but also quantitative: Statistics about the use of topics and the degree of connectivity of a map may facilitate quality management on a higher level. Queries would compute the number of topics, associations, and unconnected topic map parts.

5 Use Case: Literature References

5.1 Problem Domain

[BibTeX] is a system to manage literature references as authors need when they reference other documents, usually within LaTeX documents. For this, BibTeX maintains the concept of a citation database, a text file which carries information about books, articles or other predefined classes of documents. Every such class has a set of mandatory attributes and also optional ones; a book might have an author or an editor, an article must have a header and one or more authors.

While convenient to use, the data model (and thus the underlying ontology) is biased to an individual user. Other uses were not directly anticipated. If, in contrast, an organisation stores literature information in a more flexible model - like one that Topic Maps provide - then this model might be rather oblivious of a particular use case. Authorship and editorship would of course be modelled as associations between the document and the author topic. So would publications as they connect documents with publishers.

In the following we detail particular queries against an assumed literature reference database described next.

5.1.1 Test Data

Our test database contains information about literature references.

Ed. Note:
This test map is available in XTM and in AsTMa= format. The XTM version is generated from the AsTMa= version.

While the test data has a realistic background, it is not supposed to reflect reality.

For reference, the following illustrations give an overview over the data and the associations used.

5.1.2 Assumptions

We will use the following assumptions and conventions:

  1. There exists a topic with an identifier unconstrained-scope which represents the unconstrained scope.

  2. It is assumed that subclasses represents http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass, superclass represents http://www.topicmaps.org/xtm/1.0/core.xtm#superclass and subclass represents http://www.topicmaps.org/xtm/1.0/core.xtm#subclass.

    This association type should have the implicit property that if a is an instance of A and A is a subclass of B then a is also an instance of B. The binary relation subclasses is to be interpreted non-reflexive and transitive.

  3. Furthermore we assume that de and en are both representations of the corresponding languages german and english.

  4. We refer to the basename value of a person topic as the persons's name; the basename value in the scope sort we call the sort name. The person's email address is the value of an occurrence of type email. In a similar way we refer to the basename of a document topic as the title of the document. The publication date, language, abstract and the download URL are all occurrence values of of occurrence items for a document topics, typed accordingly.

  5. An author is simply a person which has a is-author-of association with a topic of class document. Conversely, a publication is a document which has been authored by someone.

5.2 Queries with List Output

This category of queries returns lists as results. In this respect the language will behave like SQL in that it can also return (sometimes ordered) lists of tuples. The tuples themselves may contain items such as simple strings, text generated from the topic maps, topic ids, numbers but also information items as defined by [TMDM].

It is left open how a language framework covers the concept of list output. A framework may use a dedicated list type or may even map this into a different data model, such as that provided by XML DOM or Topic Maps themselves.

5.2.1 Conventions for the Sample Output

For every query we list the expected results. The order of the sample output is irrelevant unless a specific ordering is explicitely requested in the query.

In the case that strings are to be returned as part of a tuple, we will directly list them in the sample outputs below whereby we do not distinguish between string and numerical data.

We cannot denote values in the case TMDM information items are to be returned; here we will use an ad-hoc syntax: [ some-topic-id ] represents the topic item of the topic addressed by the topic identifier some-topic-id, [ "basename string", some-scope ] stands for a base name item with the given string and scope (the other properties remain undefined here), [ occurrence-type, "occurrence value", occurrence-scope ] accordingly for an occurrence item with the given type, value and scope (other properties remain undefined).

Ed. Note:
CXTM would be a candidate.

We also make use of topic identifiers to identify topics and to characterize the output. Implementations are supposed to provide a unique identification of topics, be it only via subject identifiers or via topic identifiers. The latters may be only valid locally to a map and/or local to a session.

5.2.2 Individual Queries

Some queries may not be directly implementable by a proposal. For these, the use case should elicit suggestions how the language intends to use extensions to solve this problem.

Ed. Note:

In the following sample queries the choice when to return Topic Map content in form of Topic Map information items or when in form of text strings is rather arbitrary. Language developer should consider both options for all the following queries. For reasons of presentation most queries use text output, though, and only some explicitely ask for information items.

  1. Retrieve all author names, i.e. the name of a topic which plays the role author in an is-author-of association.

    "Holger Rath"
    "Michel Biezunski"
    "Steve Pepper"
    "Steve Newcomb"
    		  

  2. Retrieve the name of all authors, this time ordered by the sort name provided in person topics:

    "Michel Biezunski"
    "Steve Newcomb"
    "Steve Pepper"
    "Holger Rath"
    		

  3. Retrieve the titles of all tutorials. The titles should be preferably those in scope en, de or in the unconstrained scope, in that order.

    "Making topic maps more colourful"
    "Euler, Topic Maps und Revolution"
    		

  4. Retrieve the names of all persons who have not authored anything.

    "John Smith"
    		

  5. Retrieve a list of all author names together with the title of their publications.

    "Holger Rath", "Making topic maps more colourful"
    "Steve Pepper", "Euler, topic maps, and revolution"
    "Steve Pepper", "Navigating haystacks and discovering needles"
    "Steve Pepper", "The TAO of Topic Maps"
    "Steve Newcomb", "XML topic maps: finding aids for the Web"
    "Michel Biezunski", "XML topic maps: finding aids for the Web"
    		

  6. Retrieve a list of all titles of documents which are tutorials (i.e. are a direct or indirect instance of class tutorial) sorted by publication date, descending.

    "Making topic maps more colourful", 2000
    "Euler, topic maps, and revolution", 1999
    		

  7. Retrieve a list of documents, sorted by publication date (ascending), only number 3 to 5, inclusive, together with this order number.

    3,"The TAO of Topic Maps", 2000
    4,"Making topic maps more colourful", 2000
    5,"XML topic maps: finding aids for the Web", 2001
    		

  8. Retrieve all topic identifiers of documents which have a download URL.

    pepper99a
    pepp00
    d-topicmaps-color
    		

  9. Retrieve a list of author's email addresses where the author has authored documents for which no download URL exists. Include the topic identifiers of these documents.

    pepper@n0spam.ontopia.net, pepper99b
    srn@n0spam.coolheads.com, bienew01
    mb@n0spam.coolheads.com, bienew01
    		

  10. Retrieve a list of author names where the author has written more than 1 document.

    "Steve Pepper"
    		

  11. A list of author names where the author has not written a single document with someone else.

    "Steve Pepper"
    "Holger Rath"
    		

  12. Retrieve all topic identifiers of documents and their URLs which have a non-working URL at query time.

    pepp00, http://www.broken.example.com/
    		

  13. Retrieve all topic identifiers for documents for which the abstract in english (i.e. the occurrence of type abstract in scope en) contains the phrase 'topic map' or 'topic maps', case-insensitive.

    pepp00
    bienew01
    		

  14. Retrieve all documents which have a title in german (i.e. a basename in the scope de).

    [ empty list ]
    		

  15. Retrieve all topic identifiers of the pairs of documents which share at least one word in the title, ignoring stopwords like 'and', 'of', 'the'. No duplicates in this list are allowed.

    pepper99a, pepp00
    pepper99a, d-topicmaps-color
    pepper99a, bienew01
    pepp00, d-topicmaps-color
    pepp00, bienew01
    d-topicmaps-color, bienew01
    		

  16. Retrieve the identifiers of all topics which represent information resources on the ontopia.net server(s).

    pepper99a
    pepp00
    	      

  17. Retrieve the topic identifiers of all documents which are not written in the language English (or where there is no information about that), i.e. where there is no occurrence of type language.

    pepper99b
    pepp00
    d-topicmaps-color
    	      

  18. Retrieve the topic identifiers of all documents which are directly or indirectly influenced by something which Steve Pepper wrote (i.e. interpreting "influenced by" as non-reflexive, but transitive relation). Copy the topic identifiers of the influential papers to the output.

    pepper99b,		'influenced by', pepper99a
    d-topicmaps-color,	'influenced by', pepper99b
    d-topicmaps-color,	'influenced by', pepper99a
    bienew01,		'influenced by', pepper99b
    bienew01,		'influenced by', pepper99a
    		

  19. Retrieve all identifiers of topics which are either (direct or indirect) instances of paper or tutorial. Duplicates should be suppressed.

    pepper99a
    pepper99b
    d-topicmaps-color
    bienew01
    		

  20. Retrieve all titles of all publications (regardless their scope) together with their most specific class, i.e. that class where this publication is directly (and not indirectly) an instance of.

    Euler, topic maps, and revolution, conference-paper
    Euler, topic maps, and revolution, tutorial
    Euler, Topic Maps und Revolution, conference-paper
    Euler, Topic Maps und Revolution, tutorial
    Navigating haystacks and discovering needles, journal-paper
    The TAO of Topic Maps, article
    Making topic maps more colourful, tutorial
    XML topic maps: finding aids for the Web, journal-paper
    		

  21. Retrieve all authors (i.e. all topic items which play the role author in an is-author-of association).

    [ holger-rath ]
    [ michel-biezunski ]
    [ steve-pepper ]
    [ steve-newcomb ]

  22. Retrieve all basename items of topics which play the role author in an is-author-of association).

    [ "Holger Rath", unconstrained-scope ]
    [ "Michel Biezunski", unconstrained-scope ]
    [ "Steve Pepper", unconstrained-scope ]
    [ "Steve Newcomb", unconstrained-scope ]

  23. Retrieve a list of all occurrence items being of type email.

    [ email, srn@n0spam.coolheads.com, unconstrained-scope ]
    [ email, mb@n0spam.coolheads.com, unconstrained-scope ]
    [ email, pepper@n0spam.ontopia.net, unconstrained-scope ]
    [ email, holger.rath@n0spam.empolis.com, unconstrained-scope ]
    [ email, j.smith@example.com, unconstrained-scope ]

5.3 Queries with XML Output

This section contains queries which result in XML instances which are handed to the application. These instances are either complete XML documents or may also be fragments of XML data.

To characterize the result we provide the (serialized) XML response. Unless specified explicitely, any number of white-spaces can appear within elements with no text content. For clarification we also append the DTD of these responses. Individual proposals may or may not use them as additional input to a query.

Unless specified explicitely, no particular ordering is prescribed.

  1. Return all titles of papers (conference papers, journal papers) together with the author names. The use of white-spaces should be exactly as depicted.

    <papers>
       <paper>
          <head>Making topic maps more colourful</head>
          <author>Holger Rath</author>
       </paper>
       <paper>
          <head>Euler, topic maps, and revolution</head>
          <author>Steve Pepper</author>
       </paper>
       <paper>
          <head>Navigating haystacks and discovering needles</head>
          <author>Steve Pepper</author>
       </paper>
       <paper>
          <head>The TAO of Topic Maps</head>
          <author>Steve Pepper</author>
       </paper>
       <paper>
          <head>XML topic maps: finding aids for the Web</head>
          <author>Steve Newcomb</author>
          <author>Michel Biezunski</author>
       </paper>
    </papers>
    
    whereby the XML document should adhere to the following DTD:
    <!ELEMENT papers (paper*)>
    <!ELEMENT paper  (head, author*)>
    <!ELEMENT head   (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    		

  2. Return the name(s) of all authors who have authored more than one paper.

    The result should look like this:

    <recidivists>
       <author>Steve Pepper</author>
    </recidivists>
    
    whereby the XML document should adhere to the following DTD:
    <!ELEMENT recidivists (author*)>
    <!ELEMENT author      (#PCDATA)>
    

  3. Select only the tutorial documents from the topic map and return an RDF document containing the title and the author information.

    The result should look like this:

    <rdf:RDF xmlns:rdf="http://www.w3c.org/RDF/"
             xmlns:lit="http://www.isotopicmaps.org/tmql/use-cases/literature/">
    
       <lit:tutorial rdf:about="http://www.ontopia.net/topicmaps/materials/euler.pdf">
          <lit:title>Euler, topic maps, and revolution</lit:title>
          <lit:author>Steve Pepper</lit:author>
       </lit:tutorial>
    
       <lit:tutorial rdf:about="http://www.gca.org/papers/xmleurope2000/papers/s29-01.html">
          <lit:title>Making topic maps more colourful</lit:title>
          <lit:author>Holger Rath</lit:author>
       </lit:tutorial>
    
    </rdf:RDF>
    

  4. Return all persons with their name and a bibliography as nested elements. The bibliography should contain a list of publications with title and language - if available - in brackets (if unavailable, the text "unknown" should be inserted). This list should be sorted by the publication date in descending order.

    All person elements must contain an ID attribute holding the respective topic identifier of the person's topic.

    The result should look like this:

    <persons>
        <person id="john-smith">
             <name>Smith, John </name>
             <bibliography>
             </bibliography>
        </person>
        <person id="holger-rath">
             <name>Rath, Holger</name>
             <bibliography>
                  <publication>Making topic maps more colourful (unknown)</publication>
             </bibliography>
        </person>
        <person id="steve-pepper">
             <name>Pepper, Steve</name>
             <bibliography>
                  <publication>The TAO of Topic Maps (unknown)</publication>
                  <publication>Navigating haystacks and discovering needles (unknown)</publication>
                  <publication>Euler, topic maps, and revolution (english)</publication>
             </bibliography>
         </person>
         <person id="michel-biezunski">
             <name>Biezunski, Michel</name>
             <bibliography>
                  <publication>XML topic maps: finding aids for the Web (english)</publication>
             </bibliography>
         </person>
         <person id="steve-newcomb">
             <name>Newcomb, Steve</name>
             <bibliography>
                  <publication>XML topic maps: finding aids for the Web (english)</publication>
             </bibliography>
         </person>
    </persons>
    		  
    whereby the XML document should adhere to the following DTD:
    <!ELEMENT persons      (person*)>
    <!ELEMENT person       (name bibliography)>
    <!ATTLIST person       id ID #REQUIRED>
    <!ELEMENT name         (#PCDATA)>
    <!ELEMENT bibliography (publication*)>
    <!ELEMENT publication  (#PCDATA)>
    <!ATTLIST publication  id ID #REQUIRED>
    

5.4 Queries with Topic Map Output

This section contains queries which return an instance of a topic map. It is left open whether the map should be represented only textually (in some Topic Map notation) or is represented in a TMDM-conformant manner. It is also left open whether the map is in itself complete in the sense that all topics used are actually present in the map.

  1. Create a topic according to the following specification:

    Select only journal papers.

    Output these together with the relevant author associations as topic map.

    There is no need to include the author topics as well.

    Replace all occurrences of type publication-date with associations of type was-published-in. For the daterole the topic players should use ids of the form x-dates-yyyy.

    The returned result should be equivalent with a deserialized form of the following map:

    Ed. Note:
    This map is available in XTM and in AsTMa= format. The XTM version is derived automatically from the AsTMa= version.

  2. Create a topic map according to the following specification:

    Select only documents which have a cite code.

    Select all these documents and reclassify them: all conference and journal papers should become an instance of publication. All other documents should be of class no-publication.

    Automatically add for every such document an association of type is-published-by where the document plays the role document and the topic prints-are-us plays the publisher. If the document has a publication date, then an additional role date should be added. The topic playing this role should represent the date via a URI of the form urn:x-date:yyyy.

    The returned result should be equivalent with a deserialized form of the following map:

    Ed. Note:
    This map is available in XTM and in AsTMa= format. The XTM version is derived automatically from the AsTMa= version.

6 Simple Retrievals

Apart from the queries in the previous section which address information on a topic map level, many applications will need to retrieve directly Topic Map information items as described in [TMDM].

It is to be expected that these queries will be rather frequent, especially in a web environment. For the language designer this could mean that it should be simple to express these queries; for the implementer a large amount of these queries may put some stress on any Topic Map store.

The following queries have been collected from [schmidt01], [Wrightson00] and [Ksiezyk00].

Ed. Note:
These queries and their similarity relationship is also available in XTM and in AsTMa= format. The XTM version is generated from the AsTMa= version.

  1. return all associations scoped to S where one of [T1,...,Tn] plays a role
  2. return all associations scoped to S where one of [T1,...,Tn] plays a role and one out of [R1,..,Rn] is a role
  3. return all topics with a specific resource locator L
  4. return the occurrences of the topics [T1,...,Tn] scoped by S
  5. return the occurrences of types [Y1,..,Yn] (ORed) of the topics [T1,...,Tn] scoped by S
  6. return all types of topic T
  7. return all indirect types of topic T, i.e. all being no direct type
  8. return the number of topics
  9. return all topics which's occurrences (resourceRef) matches a pattern O
  10. return all topics which's occurrences are scope with S
  11. return the number of topics which are used as types
  12. return all topics which with a specific subject locator
  13. return all topics which play one of the roles in associations of type A
  14. return all topics which play one of the roles in associations of type A, indirect
  15. return all topics which play one of the roles in associations of type A and scoped to S
  16. return all characteristics of topic T
  17. return all basenames of topic T
  18. return all topics which have baseNames in scope S
  19. return all topics which have baseNames B, exact match on B
  20. return all associations where T IS a role
  21. return all topics which have baseNames according to pattern B, regexp
  22. return all information of the topic T and all associations in which this topic in involved
  23. return all topics which have a specific baseName pattern B for a scope S
  24. return all topics of type Y
  25. return only basenames for T of a specific scope S
  26. return all topics of type X where X is of type Y
  27. return all topics which play one of a given roles [R1,..,Rn] in an assoc of type A (direct)
  28. return all roles which topic T plays in some association
  29. return all topics of type Y, direct or indirect
  30. return all associations of type A, scoped S, in which one of the topics [T1,..,Tn] plays a role [R1,..,Rn]
  31. return all roles in associations where T plays the ONLY role
  32. return all topics of type Y, direct or indirect, including Y
  33. return all associations of type A
  34. return the basenames of the topics [T1,...,Tn]
  35. return all associations of type A, scoped S, in which EXACTLY one of the topics [T1,..,Tn] plays a role [R1,..,Rn]
  36. return the subject indicators of the topics [T1,...,Tn]
  37. return the topic having a resource indicator I
  38. return the addressable subjects of resource of topics [T1,...,Tn]
  39. return all associations of types [A1,...,An] (OR) in which topics [R1,...,Rn] play roles
  40. return the basenames of the topics [T1,...,Tn] scoped by S
  41. return all associations of scope S

A Acknowledgements

In this draft the editors tried to consolidate considerable work done before in this area by others, specifically on the TMQL mailing list. The use cases were collected from various sites, among them the Ontopia site and the Techquila sites. Special thanks to Ann Wrightson and Erica Santosaputri for contributing use cases.

B References

Ed. Note:
The references are (of course) available in XTM and in AsTMa= format. The XTM version is generated from the AsTMa= version.

C Change Log