ISO/IEC JTC 1/SC34 N0395

 

ISO/IEC JTC 1/SC34

Information Technology —

Document Description and Processing Languages

Title:

Canonical XTM

Source:

Lars Marius Garshol, Steve Pepper, JTC1/SC34

Project:

ISO 13250

Project editor:

Steven R. Newcomb, Michel Biezunski, Martin Bryan

Status:

First committee draft

Action:

For review and comment

Date:

2003-04-04

Summary:

 

Distribution:

SC34 and Liaisons

Refer to:

 

Supersedes:

 

Reply to:

Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mxm@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Mrs. Sara Desautels, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
25 West 43rd Street
New York, NY 10036
Tel: +1 212 642-4937
Fax: +1 212 840-2298
E-mail: sdesaute@ansi.org

Canonical XTM

A canonical serialization format for topic maps

First committee draft 2003-04-04

This version:

ISO/IEC JTC 1/SC34 N0395

Latest version:

Authors' working copy

Authors:

Lars Marius Garshol, Ontopia <larsga@ontopia.net>

Steve Pepper, Ontopia <pepper@ontopia.net>


This specification describes serialization rules and an output format for topic maps that conform to ISO 13250:200x Topic Maps: Standard Application Model. Its purpose is to enable the development of conformance test suites for topic map processors by ensuring that all processing defined therein has been performed correctly.

This document is intended to become part of the new ISO 13250 standard. For more information on this process see [tm-guide].

This is $Revision: 1.3 $.

Status of this Document

Table of Contents

1 Introduction
2 General serialization rules
3 General ordering rules
    3.1 String values
    3.2 Null values
    3.3 Comparing sets
4 Ordering and serialization
    4.1 The topic map item
        4.1.1 Serialization
    4.2 Topic items
        4.2.1 Ordering
        4.2.2 Serialization
    4.3 Topic name items
        4.3.1 Ordering
        4.3.2 Serialization
    4.4 Variant items
        4.4.1 Ordering
        4.4.2 Serialization
    4.5 Occurrence items
        4.5.1 Ordering
        4.5.2 Serialization
    4.6 Association items
        4.6.1 Ordering
        4.6.2 Serialization
    4.7 Association role items
        4.7.1 Ordering
        4.7.2 Serialization
    4.8 Locator items
        4.8.1 Ordering
        4.8.2 Serialization

Appendices

A Canonical XTM DTD
B References


1 Introduction

This specification describes serialization rules and an output format for topic maps that conform to ISO 13250:200x Topic Maps: Standard Application Model ([SAM]). Its purpose is to enable the development of conformance test suites for topic map processors by ensuring that all processing defined therein has been performed correctly.

Logically equivalent topic maps that are serialized in accordance with this specification have exactly the same byte-for-byte representation and can thus be compared easily.

The goal of this specification is not to define rules that ensure a deterministic result for all possible conforming topic maps, since doing so would require a prohibitive (and perhaps even impossible) level of complexity. The goal is rather to define rules that allow the deterministic serialization of a subset of all possible topic maps that is large enough to enable conformance testing of all aspects of ISO 13250:200x Topic Maps: Standard Application Model.

The output format described in this specification uses a syntax that is a subset of the XTM syntax specified in [XTM]. Topic maps serialized according to this specification can therefore easily be processed by any conforming topic map processor.

2 General serialization rules

Before serialization, the topic map must be processed in accordance with the requirements in ISO 13250:200x Topic Maps: Standard Application Model.

The output document must be a canonical XML document as defined in [xml-c14n]. In addition, a line feed (U+000A) must be inserted after every end tag, and likewise after every start tag of an element that has element content or is empty.

3 General ordering rules

This section describes general ordering rules. Ordering rules for sets of specific item types are described in the sections for the individual item types.

3.1 String values

String values are ordered in lexicographical order, based on UCS code point values.
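
As a minimal sketch of this rule: Python compares strings code point by code point, so the built-in sort already yields the ordering described here. The function name and the sample strings are illustrative assumptions, not part of this specification.

```python
def order_strings(values):
    """Return the strings in UCS code point order.

    Python's str comparison works on code points, not on UTF-16 code
    units, so no custom key is needed.
    """
    return sorted(values)

# "Z" (U+005A) sorts before "a" (U+0061), which sorts before "é" (U+00E9).
ordered = order_strings(["éclair", "apple", "Zebra"])

# U+FF5E sorts before U+10000 in code point order; a comparison based on
# UTF-16 code units would give the opposite result.
astral = order_strings(["\U00010000", "\uff5e"])
```

Implementations in languages whose native string comparison is based on UTF-16 code units would need an explicit code point comparison to obtain the same result.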

3.2 Null values

Object properties with null value are considered to be ordered before properties with a value.
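
One possible way to realize this rule, assuming null is represented as Python's None and the property values are strings, is a sort key that places None first. The name null_first_key is hypothetical.

```python
def null_first_key(value):
    """Sort key that orders None before every non-null string value."""
    # False sorts before True, so None always comes first. The second
    # slot is only decisive when both values are non-null.
    return (value is not None, value if value is not None else "")

ordered = sorted(["b", None, "a"], key=null_first_key)
```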

3.3 Comparing sets

Before sets can be compared, they must be sorted using the specific ordering rules for the item types of which they are composed. They are then compared element by element, starting from the beginning of each set, until either:

1.      a pair of elements is different, in which case the order of the sets is determined by the order of those elements; or

2.      one of the sets is exhausted before the other, in which case the set with the smaller number of elements is considered to be ordered before the one with the greater number of elements.
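
The comparison described above can be sketched as follows. This is an illustration only: compare_sets and compare_values are hypothetical names, and the default item comparison simply uses Python's native ordering (code point order for strings) in place of the item-specific ordering rules of section 4.

```python
from functools import cmp_to_key

def compare_values(a, b):
    """Three-way comparison using the values' native ordering."""
    return (a > b) - (a < b)

def compare_sets(left, right, compare_items=compare_values):
    """Return -1, 0 or 1 according to the set comparison rule."""
    key = cmp_to_key(compare_items)
    # Step 0: sort both sets with the ordering rule for their items.
    left_sorted = sorted(left, key=key)
    right_sorted = sorted(right, key=key)
    # Step 1: the first differing pair decides the order.
    for a, b in zip(left_sorted, right_sorted):
        result = compare_items(a, b)
        if result != 0:
            return result
    # Step 2: otherwise the set with fewer elements comes first.
    return compare_values(len(left_sorted), len(right_sorted))
```

For example, compare_sets({"a", "b"}, {"a", "c"}) is decided by the second pair of elements, while compare_sets({"a"}, {"a", "b"}) is decided by the sizes of the sets.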

4 Ordering and serialization

4.1 The topic map item

4.1.1 Serialization

Issue (cxtm-namespace):

Is this conformant to the XML c14n with respect to namespaces?

Issue (cxtm-topicmap-id):

Should we always output an ID attribute, or only when the topic map is reified?

The topic map item is serialized as a <topicMap> element with the following attributes:

·         xmlns: "http://www.topicmaps.org/cxtm/1.0/"

·         xmlns:xlink: "http://www.w3.org/1999/xlink"

·         id: "tm"

The topic map is serialized by first serializing all topic items in the [topics] property, and then serializing all association items in the [associations] property.

The [base locator] is used to create relative URIs for locator items as described in section 4.8 Locator items.

The following properties are ignored:

·         [reifier] (redundant)

·         [source locators] (not used for conformance testing)

4.2 Topic items

4.2.1 Ordering

A set of topic items is ordered by comparing the following properties in the order given:

1.      [subject addresses]

2.      [subject identifiers]

3.      [topic names]

4.      [occurrences]

5.      [roles played]

In cases where the criteria given above are not sufficient to determine the order of two topic items, a warning must be issued.

Note:

The criteria given above will not suffice when two topic items have no [subject addresses] or [subject identifiers] and also have an identical set of characteristics. If one or both of those topic items are referenced from another item, the results of canonicalization will not be deterministic.

Issue (cxtm-topic-roles-played-order):

The rules for ordering association role items in section 4.7 Association role items are not sufficient to make comparisons of [roles played] properties in all cases.

4.2.2 Serialization

Each topic item is serialized as follows:

A <topic> element is output with its id attribute set to the value "tN", where N is the number of the topic item in order of serialization, starting with 1. The content of the <topic> element is constructed as follows:

·         If any of the topic item's [subject identifiers], [subject addresses] or [reified] properties have non-null values, a <subjectIdentity> element is output and its content is constructed as follows:

1.      If the [subject addresses] property is not the empty set, one <resourceRef> subelement is output for each locator item with an xlink:href attribute whose value is determined by the locator item.

2.      If the [subject identifiers] property is not the empty set, one <subjectIndicatorRef> subelement is output for each locator item with an xlink:href attribute whose value is determined by the locator item.

3.      If the [reified] property is not null, one <subjectIndicatorRef> subelement is output for the item ("A") that is the value of that property. The value of the <subjectIndicatorRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topicMap>, <baseName>, <variant>, <occurrence>, <association>, or <member> element to which item "A" gives rise.

·         Following this, the topic item's [topic names] and [occurrences] properties are serialized, in that order, in accordance with the rules for serializing topic name items and occurrence items.

The following properties are ignored:

·         [roles played] (redundant)

·         [source locators] (not used for conformance testing)
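
The numbering of <topic> elements in this section can be illustrated with a short sketch. It assumes that the topic items have already been ordered according to section 4.2.1 and that each topic is represented by some hashable key; assign_topic_ids and the sample keys are hypothetical.

```python
def assign_topic_ids(ordered_topics):
    """Map each topic to "tN", where N is its serialization position from 1."""
    return {topic: f"t{n}" for n, topic in enumerate(ordered_topics, start=1)}

# Illustrative topic keys, assumed to be in serialization order already.
ids = assign_topic_ids(["puccini", "tosca", "composed-by"])

# A reference to a topic is "#" followed by the id of its <topic> element.
href = "#" + ids["tosca"]
```

Because every reference to a topic depends on these ids, they would have to be assigned before any element containing a <topicRef> is written.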

4.3 Topic name items

4.3.1 Ordering

A set of topic name items is ordered by comparing the following properties in the order given:

1.      [value]

2.      [variants]

3.      [type]

4.      [scope]

4.3.2 Serialization

Each topic name item is serialized as follows:

A <baseName> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "bnN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <baseName> element that is output with an id attribute. The content of the <baseName> element is constructed as follows:

·         If the [type] property is not null, an <instanceOf> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.

·         If the [scope] property is not the empty set, a <scope> subelement is output containing one <topicRef> subelement for each topic item in the value of the property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.

·         A <baseNameString> element is output whose content is the value of the [value] property.

·         If the [variants] property is not the empty set, it is serialized in accordance with the rules for serializing variant items.

The following property is ignored:

·         [source locators] (not used for conformance testing)
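
The id counter for reified items can be sketched as below. The same pattern would apply to the "v", "o", "a" and "ar" prefixes used in later sections. The function name and the list-of-flags representation are assumptions made for illustration.

```python
from itertools import count

def reifier_ids(prefix, reified_flags):
    """Return an id for each reified item and None for the others.

    reified_flags holds, in serialization order, whether each item's
    [reifier] property is non-null. The counter advances only when an
    id is actually output.
    """
    counter = count(1)
    return [f"{prefix}{next(counter)}" if reified else None
            for reified in reified_flags]

base_name_ids = reifier_ids("bn", [False, True, False, True])
```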

4.4 Variant items

4.4.1 Ordering

A set of variant items is ordered by comparing the following properties in the order given:

1.      [value]

2.      [resource]

3.      [scope]

4.4.2 Serialization

Each variant item is serialized as follows:

A <variant> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "vN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <variant> element that is output with an id attribute. The content of the <variant> element is constructed as follows:

·         A <parameters> subelement is output containing one <topicRef> subelement for each topic item in the value of the [scope] property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.

·         A <variantName> element is output and its content is constructed as follows:

o        If the [value] property is not null, the element's content is a <resourceData> element whose content is the value of the [value] property.

o        If the [resource] property is not null, the element's content is a <resourceRef> element with an xlink:href attribute whose value is determined by the locator item that is the value of that property.

The following property is ignored:

·         [source locators] (not used for conformance testing)

4.5 Occurrence items

4.5.1 Ordering

A set of occurrence items is ordered by comparing the following properties in the order given:

1.      [value]

2.      [resource]

3.      [type]

4.      [scope]

4.5.2 Serialization

Each occurrence item is serialized as follows:

An <occurrence> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "oN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <occurrence> element that is output with an id attribute. The content of the <occurrence> element is constructed as follows:

·         If the [type] property is not null, an <instanceOf> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.

·         If the [scope] property is not the empty set, a <scope> subelement is output containing one <topicRef> subelement for each topic item in the value of the property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.

·         If the [value] property is not null, a <resourceData> element is output whose content is the value of the [value] property.

·         If the [resource] property is not null, a <resourceRef> element is output with an xlink:href attribute whose value is determined by the locator item that is the value of that property.

The following property is ignored:

·         [source locators] (not used for conformance testing)
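
The order in which the content of an <occurrence> element is produced can be sketched as follows. This is a simplified illustration, not a conforming serializer: it assumes topic ids and hrefs have already been computed, that the scope is already ordered, and that empty elements are written as start-tag and end-tag pairs as in canonical XML. Line feed placement (section 2) is deliberately left out, and serialize_occurrence and its parameters are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

def serialize_occurrence(type_id=None, scope_ids=(), value=None,
                         resource=None, element_id=None):
    """Return the pieces of one <occurrence> element, in output order."""
    def topic_ref(topic_id):
        # References to topics are "#" plus the id of the <topic> element.
        return f"<topicRef xlink:href={quoteattr('#' + topic_id)}></topicRef>"

    # The id attribute appears only when the occurrence is reified.
    id_attr = f" id={quoteattr(element_id)}" if element_id else ""
    parts = [f"<occurrence{id_attr}>"]
    if type_id is not None:
        parts += ["<instanceOf>", topic_ref(type_id), "</instanceOf>"]
    if scope_ids:
        parts += ["<scope>", *map(topic_ref, scope_ids), "</scope>"]
    if value is not None:
        parts.append(f"<resourceData>{escape(value)}</resourceData>")
    if resource is not None:
        parts.append(
            f"<resourceRef xlink:href={quoteattr(resource)}></resourceRef>")
    parts.append("</occurrence>")
    return parts

example = serialize_occurrence(type_id="t2", value="a < b", element_id="o1")
```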

4.6 Association items

4.6.1 Ordering

A set of association items is ordered by comparing the following properties in the order given:

1.      [type]

2.      [scope]

3.      [roles]

4.6.2 Serialization

Each association item is serialized as follows:

An <association> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "aN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <association> element that is output with an id attribute. The content of the <association> element is constructed as follows:

·         If the [type] property is not null, an <instanceOf> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.

·         If the [scope] property is not the empty set, a <scope> subelement is output containing one <topicRef> subelement for each topic item in the value of the property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.

·         The [roles] property is serialized in accordance with the rules for serializing association role items.

The following property is ignored:

·         [source locators] (not used for conformance testing)

4.7 Association role items

4.7.1 Ordering

A set of association role items is ordered by comparing the following properties in the order given:

1.      [type]

2.      [role playing topic]

4.7.2 Serialization

Each association role item is serialized as follows:

A <member> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "arN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <member> element that is output with an id attribute. The content of the <member> element is constructed as follows:

·         A <roleSpec> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.

·         A <topicRef> subelement is output. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [role playing topic] property.

The following property is ignored:

·         [source locators] (not used for conformance testing)

4.8 Locator items

4.8.1 Ordering

A set of locator items is ordered by comparing the following properties in the order given:

1.      [notation]

2.      [reference]
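
Comparing properties "in the order given" amounts to sorting on a tuple of those properties. A minimal sketch for locator items, assuming both properties are non-null strings, might look like this. The Locator type and the sample values are illustrative only.

```python
from typing import NamedTuple

class Locator(NamedTuple):
    """Illustrative stand-in for a locator item."""
    notation: str
    reference: str

def order_locators(locators):
    """Order locators by [notation], then by [reference]."""
    # Tuples compare element by element, so [reference] is consulted
    # only when two locators share the same [notation].
    return sorted(locators, key=lambda loc: (loc.notation, loc.reference))

ordered = order_locators([
    Locator("URI", "b.xtm"),
    Locator("HyTime", "z"),
    Locator("URI", "a.xtm"),
])
```

The same tuple-key approach could be used for the other item types, provided that null values and set-valued properties are handled according to section 3.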

4.8.2 Serialization

Locator items in the values of [subject identifiers] properties of topic items give rise to <subjectIndicatorRef> elements.

Locator items in the values of [subject addresses] properties of topic items, or [resource] properties of variant or occurrence items, give rise to <resourceRef> elements.

Other locator items are ignored.

When a locator item is not ignored and its [notation] property has the value "URI", the xlink:href attribute of the corresponding element is set to a value determined by the locator item's [reference] property. That value must be a minimal URI relative to the [base locator] of the topic map item.

Note:

Relative URIs are required in order to remove dependencies on the source locations of the input topic map.

Ed. Note:
Do we need text to cover the escaping of URIs or is this handled by SAM?

When the notation of the locator is not "URI", the value of the corresponding element's xlink:href attribute is set to the concatenation of the [notation] property, ":", and the [reference] property.

Ed. Note:
Is this a satisfactory way of handling locators that are not URIs?
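
The rules of section 4.8.2 for computing xlink:href values can be sketched as follows. This draft does not define "minimal" any further, so the sketch makes several assumptions: a URI is made relative only when it shares scheme and authority with the [base locator], the result is a path-relative reference computed with posixpath.relpath, and no escaping or trailing-slash handling is attempted. The function name locator_href is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def locator_href(notation, reference, base_locator):
    """Return an xlink:href value for a locator item (simplified sketch)."""
    if notation != "URI":
        # Non-URI locators: [notation], ":", [reference].
        return f"{notation}:{reference}"
    ref, base = urlsplit(reference), urlsplit(base_locator)
    if (ref.scheme, ref.netloc) != (base.scheme, base.netloc):
        # Assumption: nothing can be shortened, so keep the absolute URI.
        return reference
    suffix = ("?" + ref.query if ref.query else "") + \
             ("#" + ref.fragment if ref.fragment else "")
    if ref.path == base.path:
        # Same document: only the query and fragment remain.
        return suffix
    base_dir = posixpath.dirname(base.path) or "/"
    return posixpath.relpath(ref.path or "/", start=base_dir) + suffix

base = "http://example.org/maps/base.xtm"
same_doc = locator_href("URI", "http://example.org/maps/base.xtm#puccini", base)
sibling = locator_href("URI", "http://example.org/maps/other.xtm", base)
other_dir = locator_href("URI", "http://example.org/psi/opera", base)
non_uri = locator_href("HyTime", "some-ref", base)
```

With the example.org base shown, the three URI cases yield a fragment-only reference, a sibling file name, and a reference that climbs one directory. Whether "../psi/opera" or the shorter "/psi/opera" should count as minimal is left open by the text above.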

A Canonical XTM DTD

 
<!ELEMENT topicMap ( topic*, association* ) >
<!ATTLIST topicMap
   id              ID        #FIXED 'tm'
   xmlns           CDATA     #FIXED 'http://www.topicmaps.org/cxtm/1.0/'
   xmlns:xlink     CDATA     #FIXED 'http://www.w3.org/1999/xlink' >
 
<!ELEMENT topic ( subjectIdentity?, baseName*, occurrence* ) >
<!ATTLIST topic
   id              ID        #REQUIRED >
 
<!ELEMENT instanceOf  ( topicRef ) >
 
<!ELEMENT subjectIdentity ( resourceRef*, subjectIndicatorRef* ) >
 
<!ELEMENT topicRef  EMPTY >
<!ATTLIST topicRef
   xlink:href      CDATA     #REQUIRED >
 
<!ELEMENT subjectIndicatorRef  EMPTY >
<!ATTLIST subjectIndicatorRef
   xlink:href      CDATA     #REQUIRED >
 
<!ELEMENT baseName  ( instanceOf?, scope?, baseNameString, variant* ) >
<!ATTLIST baseName
   id              ID        #IMPLIED >
 
<!ELEMENT baseNameString  ( #PCDATA ) >
 
<!ELEMENT variant  ( parameters, variantName ) >
<!ATTLIST variant
   id              ID        #IMPLIED >
 
<!ELEMENT variantName  ( resourceRef | resourceData ) >
 
<!ELEMENT parameters  ( topicRef+ ) >
 
<!ELEMENT occurrence ( instanceOf?, scope?, ( resourceRef | resourceData ) ) >
<!ATTLIST occurrence
   id              ID        #IMPLIED >
 
<!ELEMENT resourceRef  EMPTY >
<!ATTLIST resourceRef
   xlink:href      CDATA     #REQUIRED >
 
<!ELEMENT resourceData  ( #PCDATA ) >
 
<!ELEMENT association ( instanceOf?, scope?, member+ ) >
<!ATTLIST association
   id              ID        #IMPLIED >
 
<!ELEMENT member ( roleSpec, topicRef ) >
<!ATTLIST member
   id              ID        #IMPLIED >
 
<!ELEMENT roleSpec  ( topicRef ) >
<!ATTLIST roleSpec
   id              ID        #IMPLIED >
 
<!ELEMENT scope  ( topicRef+ ) >

B References

xml-c14n

Canonical XML Version 1.0, John Boyer, Author/Editor. World Wide Web Consortium. 15 March 2001.

ISO13250

ISO/IEC 13250:2002 Topic Maps, ISO, Geneva, 2002.

SAM

The Standard Application Model for Topic Maps, SC34/WG3 draft, 2003.

tm-guide

Guide to the topic map standardization process, Lars Marius Garshol, 2002-06-23, ISO/IEC JTC1 SC34/N0323.

XTM

The XML Topic Maps (XTM) Syntax 1.1, SC34/WG3 draft, 2003.