Submission from the UK of an initial Working Draft for the proposed DSDL standard, which identifies user requirements for the proposed standard in Annex 1. This document is submitted as originally supplied. Although the user requirements are contained in an annex that is marked as normative, the UK does not consider that these requirements, which are instructions to the Project Editor, should remain as normative requirements on the users of the published standard. SC 34 may wish to consider whether these requirements should be moved to a separate User Requirements document that could form definitive instructions to the editor.


ISO/IEC JTC 1/SC34 N264

 

ISO/IEC JTC 1/SC34

Information Technology --

Document Description and Processing Languages

 

TITLE:

 

U.K. National Body Contribution to First Working Draft of Document Schema Definition Language (DSDL)

 

SOURCE:

G. Williams, U.K.

PROJECT:

 

PROJECT EDITOR:

M. Bryan

STATUS:

First Working Draft

ACTION:

This document was included in the NWI comments, but the U.K. intended to have it distributed separately in its entirety to serve as a base document for further development.

DATE:

 

DISTRIBUTION:

SC34 and Liaisons

REFER TO:

 

REPLY TO:

Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mxm@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
25 West 43rd Street
New York, NY 10036
Tel: +1 212 642 4937
Fax: +1 212 840 2298
E-mail: shafele@ansi.org

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote.

International Standard ISO/IEC 13240 was prepared by Joint Technical Committee JTC1, Information technology.

Introduction

SGML Document Type Definitions (DTDs) allow document structures to be formally modelled but do not allow details of data types or data relationships to be recorded in an XML-compatible way. While the W3C XML Schema Definition language (XSD) does allow data types to be used to validate the contents of SGML elements and values of attributes, it does not allow the relationships between the values of different attributes and contents of elements to be validated. A new, compact, efficient and XML-based document type definition for the integrated description of document structures, data types and data relationships will make it possible to automate the processing of structured information resources to the level required by business users, whose requirements are more demanding than those of the publishing community for which SGML was originally developed. The standard will also define the scope and notation for converting and interworking a core subset of document structure, data type, and data relationship constraint models among the three notations: DSDL, DTD declarations and XSD.

1 Scope

1.1 Definition of scope

This International Standard, known as the Document Schema Definition Language (DSDL), allows the definition of document structures, data types and data relationship constraints that can be applied to data represented using the ISO/IEC 8879 Standard Generalized Markup Language and its derivatives, such as ISO/IEC 10744, Hypermedia/Time-based Structuring Language (HyTime), and the W3C Extensible Markup Language (XML).

2 Conformance

To be defined

3 Normative references

ISO 8879:1986, Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)

W3C Extensible Markup Language (XML) (http://www.w3.org/TR/REC-xml)

W3C XML Schema Part 2: Datatypes (http://www.w3.org/TR/xmlschema-2/)

4 Definitions

5 Symbols and abbreviations

DSDL

Document Schema Definition Language 

SGML

Standard Generalized Markup Language (ISO/IEC 8879)

XML

W3C Extensible Markup Language

6 Documentation Conventions

Any references in this document to industry and proprietary standards, products, user groups, and publications are not normative, and do not imply endorsement by ISO, IEC, or their national member bodies or affiliates. Any brand names or trademarks mentioned are the property of their respective owners.

The formal definitions are expressed using the W3C XML subset of SGML.

The formal definitions are part of the text of this International Standard and are protected by copyright. In order to facilitate conformance to DSDL, the formal definitions may be copied as specified in the following copyright notice: Copyright (C) 200? International Organization for Standardization. Permission to copy in any form is granted for use with conforming DSDL systems and applications as defined in ISO/IEC ????, provided this notice is included in all copies. The permission to copy does not apply to any other material in this International Standard.

Note 5. This document uses editorial conventions mandated by the ISO with which the reader should be familiar in order to understand the implications of certain words.

The text describing each construct emphasizes semantics, while the formal XML definition provides the rigorous syntactic definitions underlying the text descriptions.

Note 6. For this reason, it is recommended that the reader refer to the XML definitions while reading the textual descriptions. Although the XML definition always follows the related text, the user may find it helpful to read the XML first in some cases.

When a construct is first introduced, it is described in the text. If the construct occurs in the formal XML specification, both the formal XML name and a full name in English are presented, as follows:

7 ???

Annex 1 (normative): Requirements

This standard is designed to provide the following functionality:

  1. The standard shall provide a means of expressing, in SGML/XML instance format, all of the markup declarations permitted by the WebSGML profile of ISO/IEC 8879 and in Version 1.0 of the W3C Extensible Markup Language (XML).
  2. The standard shall be capable of identifying external data resources that may validly be included within document instances that conform to the model, including data instances that are in notations other than that defined in this standard.
  3. The standard shall be capable of identifying the notations required to process those parts of document instances that are not encoded according to the standard.
  4. The standard should allow the representation within document instances of data in a clearly identified notation that is not intended to be processed by programs that are conformant with this standard.
  5. The standard shall be capable of importing parts of models from external sources.
  6. The standard shall provide a means of constraining the number of times a particular element may occur at a given point in a document model to be within a range with specified minimum and/or maximum values.
  7. The standard shall be capable of identifying the character set to be used to constrain the contents of elements or attributes.
  8. The standard shall provide a means of constraining the content of attribute values and elements to conform to a particular datatype or pattern based on a formally named, standardized, set of datatyping rules.
  9. The standard shall provide a means of identifying a set of permitted values against which the content of a particular element or attribute value shall be checked for validity. The set of permitted values may be provided as an external resource, or by reference to an external service using a standardized API.
  10. The standard shall provide a means by which the model of a document can be altered in response to the contents of a particular element or attribute (e.g. if the content of an element or attribute recording the sex of a person is set to "Male", the use of any elements or attributes related to pregnancy should be forbidden).
  11. The standard shall provide facilities for defining "model types" that can form the basis for the models of elements in multiple document type definitions in such a way that users can restrict the use of parts of the model and add application-specific elements to the models at those points at which they are appropriate.
  12. The standard shall provide a means by which the authority responsible for defining part or all of a document structure can be uniquely identified, with elements defined by different authorities being identifiable as such within document instances.
  13. The standard shall provide a means by which sections of a document structure can be temporarily disabled without having to define a new document structure.
  14. The standard shall provide a means by which the rationale for an element, attribute or other information component can be recorded as an annotation to its declaration.
  15. The standard shall be designed in such a way that it can be extended to include the functions of ISO/IEC 8879 not included in the normative part of this standard.
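
Requirements 9 and 10 can be made concrete with a small sketch. The Python below is not DSDL syntax, which this draft has not yet defined; it only illustrates the kind of check that a validator meeting these requirements would need to perform. The element names (person, sex, pregnancy), the permitted-value set and the function name check_person are hypothetical and chosen to match the example given in requirement 10.

```python
import xml.etree.ElementTree as ET

# Hypothetical permitted-value set for the <sex> element (requirement 9).
PERMITTED_SEX = {"Male", "Female"}


def check_person(xml_text):
    """Return a list of violation messages for one <person> instance."""
    person = ET.fromstring(xml_text)
    errors = []
    sex = (person.findtext("sex") or "").strip()
    # Requirement 9: the content must be one of the permitted values.
    if sex not in PERMITTED_SEX:
        errors.append("sex: value %r is not permitted" % sex)
    # Requirement 10: the allowed model depends on the value of <sex>.
    if sex == "Male" and person.find("pregnancy") is not None:
        errors.append("pregnancy: not allowed when sex is 'Male'")
    return errors


valid = "<person><sex>Female</sex><pregnancy/></person>"
invalid = "<person><sex>Male</sex><pregnancy/></person>"
print(check_person(valid))
print(check_person(invalid))
```

The first instance produces no violations; the second is rejected because the pregnancy element co-occurs with a sex value of "Male". A DTD cannot express either rule, and the Introduction notes that XSD cannot express the second.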

Annex 2 (normative): XML DTD for DSDL

 

Annex 3 (normative): DSDL Description of DSDL

 

Annex 4 (informative): Alphabetical List of DSDL Components

4.1 DSDL components common to SGML and XML

The following DSDL components can be used to describe documents conforming to the WebSGML subset of ISO/IEC 8879:
 

| Possible DSDL element/attribute | Defined in clause | Equivalent ISO 8879 Construct | Equivalent XML DTD construct | Equivalent XML Schema element |
|---|---|---|---|---|
| <attribute | | [143] attribute definition | AttlistDecl | <attribute |
| <attribute name | | [144] attribute name | Name | <attribute name |
| <attribute type datatypeNamespace | | [145] declared value | AttType, NotationName | <attribute type (notation name fixed) |
| <attribute defaultValue | | [147] default value [attribute value specification] | DefaultDecl | <attribute default |
| <attribute fixed | | [147] default value ["FIXED"] | DefaultDecl | <attribute fixed |
| <attribute source | | [147] default value ["IMPLIED"\|"REQUIRED"] (but not "CONREF"\|"CURRENT") | DefaultDecl | <attribute use |
| <characterSet encodingBase | | [173] character set description, [174] base character set | EncodingDecl (in the XML declaration) | encoding (in the XML declaration) |
| <comment | Should this be <annotation? | [91] comment declaration | Comment | N/A |
| <data type datatypeNamespace | | From Relax-NG | N/A | <simpleType type |
| <element | | [116] element declaration | elementdecl | <element |
| <element name | | [30] generic identifier | Name | <element name |
| <element contentType | Do we need this? Does it need to conflate with type? | [125] declared content | contentspec (Do we need special attributes for EMPTY and ANY?) | <any |
| <element type datatypeNamespace | | From Relax-NG | N/A | <element type |
| <element minOccurs maxOccurs | | Extension based on W3C XML Schema that generalizes the specifically named options provided in Relax-NG | N/A | <element minOccurs maxOccurs |
| <element ref | | From W3C XML Schema (Relax-NG uses a separate ref element) | N/A | <element ref |
| <externalEntity | | [108] external entity specification | GEDecl | N/A |
| <externalEntity name | | [102] entity name | Name | N/A |
| <externalEntity href publicIdentifier | | [73] external identifier | ExternalID | N/A |
| <externalEntity notation | | [41] notation name | NDataDecl | N/A |
| <group connector-type minOccurs maxOccurs | | [127] model group (with modifications based on W3C XML Schema that generalize the specifically named options provided in Relax-NG) | children (as modified by W3C XML Schema) | <complexType (do we also need <simpleType?) |
| <inclusion name | | [104] parameter entity name | PEDecl | N/A |
| <inclusion href publicIdentifier | Do we still need to separate out the definition of external parameter entities from their call, or should we move these two properties to the <include element? | | PEDef, ExternalID | N/A (moved to the import request) |
| <include name | | [60] parameter entity reference | PEReference | <import (but unnamed, with direct reference to the source, see above) |
| <localEntity name | | [101] entity declaration | GEDecl, Name, EntityValue | N/A |
| <localProcess notation | Do we need this? | [44] processing instruction | PI, PITarget | N/A |
| <markedSection status | | [93] marked section declaration | CDSect, conditionalSect | N/A |
| <notation | | [148] notation declaration | NotationDecl | <notation |
| <notation name | | [41] notation name | Name | <notation name |
| <notation href publicIdentifier | | [149] notation identifier | ExternalID | <notation system public |
| <permittedValue type datatypeNamespace | | Based on W3C XML Schema enumeration and Relax-NG value elements. Extends [145] declared value [name token group] to constrain contents of text fields as well as attribute values | Enumeration (as extended to element content by W3C XML Schema and Relax-NG) | <enumeration type |
| <schema href publicIdentifier | Do we need a public identifier? | [110] document type declaration [external identifier] | doctypedecl External ID | <schema + <import or <include |
| <schema docType | | [111] document type name | doctypedecl Name | N/A |
| <text | | [47] character data, [33] attribute value specification | #PCDATA | |
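
The minOccurs and maxOccurs attributes proposed above generalize the fixed occurrence indicators of DTD content models (?, * and +). As a rough sketch of how a conversion between the notations might treat them, the Python below maps an occurrence range to the single DTD indicator that expresses it, and reports when no single indicator does. The function name dtd_occurrence_indicator and the use of None for an unbounded maximum are assumptions made for illustration; this draft does not define them.

```python
def dtd_occurrence_indicator(min_occurs, max_occurs):
    """Map an occurrence range to a DTD indicator.

    Returns None when no single indicator expresses the range.
    max_occurs=None stands for "unbounded" (an assumption of this sketch).
    """
    if min_occurs < 0 or (max_occurs is not None and max_occurs < min_occurs):
        raise ValueError("invalid occurrence range")
    indicators = {
        (1, 1): "",      # exactly once: no indicator
        (0, 1): "?",     # optional
        (0, None): "*",  # zero or more
        (1, None): "+",  # one or more
    }
    return indicators.get((min_occurs, max_occurs))


print(dtd_occurrence_indicator(0, 1))
print(dtd_occurrence_indicator(1, None))
print(dtd_occurrence_indicator(2, 5))
```

A range such as 2 to 5 has no single DTD indicator, which is why the table marks the XML DTD equivalent of these attributes as N/A.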

4.2 DSDL components specific to SGML

The following extensions could be made if it is decided that DSDL should be able to express all constructs in SGML document instances as well as the WebSGML subset.
 

| Possible DSDL element/attribute | Defined in clause | Equivalent ISO 8879 Construct |
|---|---|---|
| <applicationInfo | Do we need this? | [199] application-specific information |
| <attribute source | | [147] default value ["IMPLIED"\|"REQUIRED"\|"CURRENT"\|"CONREF"] |
| <capacitySet publicIdentifier | Do we need this? | [180] capacity set |
| <characterDescription | Do we need this? | [176] character description |
| <characterDescription startingFrom | Do we need this? | [177] described character set number |
| <characterDescription for | Do we need this? | [179] number of characters |
| <characterDescription becomes | Do we need this? | [178] base character set number, "UNUSED" or literal |
| <externalEntity entityType | Do we need this? | [109] entity type |
| <externalEntity dataAttributeSet | Do we need this? Should the data attributes be defined as the contents of the entity definition? | [149.2] data attribute specification |
| <dataTagGroup elementName | Do we need this? Could the data tag details somehow be added directly to the element declaration? | [133] data tag group |
| <dataTagGroup paddingTemplate | | [137] data tag padding template |
| <dataTagTemplate | | [136] data tag template |
| <delimiterAssignment name literal | Do we need this? | [191] general delimiters |
| <delimiters | Do we need this? | [190] delimiter set |
| <element documentTypes | | [28] document type specification |
| <element end-character | Do we need this? | [17] NET-enabling start-tag |
| <element mixed | Do we need this? | [25] mixed content |
| <element omitStart | | [123] start-tag minimization |
| <element omitEnd | | [124] end-tag minimization |
| <element rankStem | Do we need this? | [120] rank stem |
| <element rankSuffix | Do we need this? | [121] rank suffix |
| <element unclosed | Do we need this? | [17] unclosed start-tag |
| <exclusions elementNames | | [140] exclusions |
| <explicitLink sourceDocType resultDocType | | [158] explicit link specification |
| <features | Do we need this? | [195] feature use |
| <functionChars | Do we need this? | [186] function character identification |
| <idLinkSet | | [168.1] ID link set declaration |
| <implicitlink sourceDocType | | [157] implicit link source |
| <inclusions elementNames | | [139] inclusions |
| <linkRule sourceElementNames | | [163.1] link rule [source element specification] |
| <linkRule resultElementNames | | [166.1] explicit link rule [result element specification] |
| <linkSet name | | [164] link set name |
| <linktype | | [154] link type declaration |
| <linktype name | | [155] link type name |
| <linktype href publicIdentifier | | [73] external identifier |
| <markedSection status | | [93] marked section declaration |
| <namingRules | Do we need this? | [189] naming rules |
| <quantities | Do we need this? | [194] quantity set |
| <reservedName changeFrom changeTo | Do we need this? | [193] reserved name use |
| <schema sgmlDeclaration | Do we need this? | [171] SGML declaration |
| <sgmlDeclaration name | Do we need this? | [171] SGML declaration |
| <shortRefDelimiters | Do we need this? | [191] short reference delimiters |
| <shortRefSet name | | [150] short reference mapping declaration |
| <shunnedChars useControls | Do we need this? | [184] shunned character number |
| <simpleLink | | [156] simple link specification |
| <syntax publicIdentifier switches | Do we need this? | [183] public concrete syntax |
| <useLink linkSetName postLinkSetName | | [165] source element specification [USELINK] |
| <useMap name elementNames | | [152] short reference use declaration |