From comp@komp.ace.nl Thu Jan  7 11:54:04 1993
Received: from sun4nl.nluug.nl by dkuug.dk with SMTP id AA25513
  (5.65c8/IDA-1.4.4j for <sc22wg11@dkuug.dk>); Thu, 7 Jan 1993 11:54:04 +0100
Received: from ace by sun4nl.nluug.nl via EUnet
	id AA03585 (5.65b/CWI-3.3); Thu, 7 Jan 1993 11:54:08 +0100
Received: from ace.ace.nl ([194.0.2.40]) by netnog.ace.nl with SMTP
          id AA14407 (1.14/890.1); Thu, 7 Jan 93 08:44:54 +0100 (MET)
X-Organisation: ACE Associated Computer Experts bv.
                Amsterdam, The Netherlands.
                +31 20 6646416 (phone)
                +31 20 6750389 (fax)
                11702 (ace nl) (telex)
Received: from komp.ace.nl ([192.1.2.90]) by ace.ace.nl with SMTP
          id AA02578 (1.14/2.17); Thu, 7 Jan 93 09:15:26 +0100 (MET)
Received: by komp.ace.nl with SMTP id AA00722 (1.10/2.17);
	  Thu, 7 Jan 93 09:44:22 +0100 (MET)
To: sc22wg11@dkuug.dk
Subject: WG11/N352 - Liaison Statement from SC21/WG3 DBL
Date: Thu, 07 Jan 93 09:44:14 N
Message-Id: <720.726396254@komp>
From: Willem Wakker <comp@ace.nl>
X-Charset: ASCII
X-Char-Esc: 29


ISO/IEC JTC1/SC22/WG11 N352

ISO/IEC JTC1/SC21/WG3  N1452

November 20, 1992


                                     ISO/IEC JTC 1/SC 21/WG 3
                                                 
                                             Database
                                                 
                                     Secretariat: Canada (SCC)


Title:                DBL Liaison to CLID re: Tables

Source:               DBL Rapporteur Group

Author:               David Beech

Projects:             ISO/IEC JTC 1.21.3.4 (SQL3), and
                              JTC 1.22.17  (CLID)

Status:                       Approved by DBL RG

Requested Action:             Approval by SC21/WG3
                              Consideration by SC22/WG11

References:

         [1]          Jim Melton (ed):  (ISO/ANSI Working Draft) Database
                      Language SQL3, ISO/IEC JTC1/SC21 N6931, (also:
                      X3H2-92-155 or DBL CBR-003), July, 1992.

         [2]          Edward J Barkmeyer (ed): (Working Draft #6.1,
                      EditorUs Interim Draft to CD 11404) Common Language-
                      Independent Datatypes (CLID), ISO/IEC JTC1/SC22/WG11
                      N319R, 9 September, 1992  (also: DBL CBR-28 or X3H2
                      92-205).


1.  Motivation

At its meeting in Canberra, Australia, November 16-20, 1992, the
WG3 Database Languages rapporteur group reviewed some aspects of
the CLID specification [2], especially the aggregate-type
generators for Sets, Multisets, and Lists, which have been
influential on proposals for definitions of collection type
templates for SQL3 [1].

During this review, it was noted that the CLID definition of a
Table differed significantly from that of a Table in SQL.  Since
SQL has been standardized by ISO since 1986, and SQL tables are in 
widespread use, the SQL concept, well founded in relational theory,
appears to be a strong candidate as a common language-independent
datatype.  Programs in many different programming languages already
interface with SQL tables.  Seven standard language bindings are
defined at the level of elementary datatypes in SQL-92, and higher-
level bindings at the row and table levels would be facilitated by
appropriate CLID definitions.

Therefore DBL passed a motion authorizing that this paper, defining
the differences between the exisiting SQL and CLID concepts of a
table, be submitted to SC21/WG3 and to SC22/WG11 for the purpose
of recommending that CLID either adopts the SQL concept or drops
Table from their specification.  The SC22/WG3 DBL group is willing
to collaborate with SC22/WG11 if WG11 chooses to adopt the SQL
concept of Table.

In the list of CLID Outstanding Issues [2, p. i], issue 1 refers
to some open questions about the relationships between "List and
Array, and to a lesser extent Set, Bag, and Table", particularly
an incompatibility with the Fortran-90 model of Array.  The
suggestion of this paper is that the incompatibility with the SQL
model of Table is no less deserving of attention.


2.  Comparison of Table Concepts

2.1  Keys and Elements

In the relational model, a table consists of rows and columns.  One
or more (or even all) of the  columns may be denominated as
containing key values, but these values are also data values and 
may be treated as such.  For example, they may be retrieved in
queries, and may even be updated, so long as they do not violate
the constraint that the tuple of key values in a row is not
duplicated in any other row in the table.  An important principle
is that the data values themselves serve as primary keys for simple
identification of rows, and more complex information structures are
expressed totally by data values, such as by use of foreign keys
which are copies of primary keys.

In the CLID model of a table, the key-value-list is disjoint from
the element-value.  With the simple forms of Select, Insert, and
Delete defined for CLID, this could be regarded as a subset of the
relational model in which the keys are never treated as data
values, so that it makes no difference that they are separated out. 
However, this does not generalize well to support the more powerful
operations on tables provided by SQL (see below).  In SQL3, it is
proposed to be able to define a row type T, and then CREATE TABLE
TeaTable OF T.  This could be defined to correspond to
instantiating a type SQL_Table(T) generated from a type template
for an SQL_Table, which in CLID terms is a datatype generator for
a table in which T is the element-type.  However, in SQL the type
T defines the key values (if any) within it.  A table without keys
still supports Select, Insert and Delete operations in SQL, which
is not expressible in the current CLID model.

Thus if CLID wished to adopt the SQL concept, the Syntax would show
merely

       table-type = "table" "of" "(" element-type ")" 

optionally followed by a keyfield-identifier-list.  Currently the
element-type must be a record-type, but relaxations to allow
arbitrary Table collections are under consideration in SQL3.


2.2  Set-valued Operations

One of the most important and popular features of the relational
model has been the ability to operate on sets of rows in a single
operation.  For example, the SQL SELECT produces a table as its
result - an important closure property which enables nesting of the
operation.  In CLID, the Select function returns a single value of
the element-type, and thus cannot be nested - a serious drawback
in a functional model.  A single SQL INSERT can insert a set of
rows, and a single DELETE can delete all rows satisfying a
specified predicate.

The lack of set-valued operations on a CLID table appears also to
make it impractical to retrieve or iterate over all members of the
set of elements.  There is neither an operation to retrieve all
elements nor one to retrieve all keys, so that it is necessary to
try all keys in the Cartesian product of the key-types, at least
until emptying a copy of the table from which elements are deleted
as they are found. A definition of Select returning an arbitrary
element, as for a CLID set, overcomes this difficulty in theory,
but is still cumbersome in practice since it requires Select and
Delete operations on each iteration.  The SQL solution is twofold:
first, for set-at-a-time operations, to offer the set-valued 
Select with an optional search-condition that returns the subset
of the elements that satisfy the search-condition (or the whole
table in the absence of a search-condition); second, for element-
at-a-time processing, to make it possible to define a Cursor
associated with a table, with a Fetch operation that retrieves the
elements in an arbitrary sequence without requiring any deletion.

In an SQL_Table type template, the functions defined on a table can
easily express the special cases defined by the CLID functions, but
with different arguments, i.e. the CLID functions do not generalize
smoothly to the SQL functions.  Possibly it is thought too
ambitious to offer the  functionality of SQL in a language-
independent datatype at the present time.  However, with the
rapidly growing need for programming languages to interface to
database systems and to deal simply and efficiently with large
collections of elements, the richer model of a table will probably
be relevant by the time CLID is approved, and will certainly be
important during the lifetime of the standard.

At a minimum, it would be advantageous if the CLID model were a
compatible subset of the SQL model.  A stronger position would be
for CLID to adopt the general model, with levels of conformance to
allow for subset implementations.


2.3  Multi-table Select

CLID supports Select on a single table only, but in SQL it is
important to be able to select information from multiple tables in
a single operation.  Although there are explicit join operators,
these are syntactic sugar for what is expressible in a Select
operation that specifies multiple tables in its from-clause, with
an appropriate search-condition.

Since the search-condition in an SQL Select may make symbolic
references to columns of these tables, one argument to the
SQL_Select function in an SQL_Table type template is a character-
string representation of a SELECT statement, including the from-
clause with the names of the tables.  The first of these tables is
taken as the principal argument, whose type is SQL_Table(T) where
T is the element type for this particular generated datatype with
which the Select function is associated.

If CLID were to adopt the argument structure of the SQL_Select,
then subset levels could be applied to what can be expressed in the
character-string argument.


2.4  Updatability

The first issue of updatability is a general one that can probably
be dealt with quickly for present purposes.  It appears that CLID
is defining purely functional operations, without side-effects.  
SQL, like most programming languages, is not so restrictive, and
this suggests an "impedance  mismatch" between the types and
functions of CLID and those naturally occurring in its client 
languages.  For example, to insert a row into an SQL_Table T, it
is sufficient to invoke SQL_Insert(T,...).  Assuming that it is the
intention that for most languages, the CLID types would be
supplemented by a general assignment operation, this would be
expressed in CLID terms as T := CLID_Insert(T,...).  An optimizer
has to notice that a temporary table is not required.  Similar
considerations apply to Delete.  The impedance mismatch may be a
serious problem for CLID aggregate datatypes, but it is not
specific to the SQL binding, so it will not be further discussed
here.

The second issue is that SQL supports an SQL_Update that allows for
partial or total update of (one or more elements of) a table.  This
more refined update, say of a single field in a record element,
rather than assignment of a complete new table, is a more direct
way of coping efficiently, not merely with single-user access to
a table, but with questions such as authorization and concurrency
management where multiple users are concurrently accessing and
changing the state of a table.  The alternative offered by CLID,
to use CLID_Delete followed by CLID_Insert on successive temporary
tables, not only complicates the optimization problem, which cannot
be ignored on potentially large collection datatypes, but may
introduce an integrity gap between the Delete and Insert that would
then call for more sophisticated facilities for making the pair of
actions atomic and deferring integrity checking until their
completion.  Authorization becomes more difficult to specify if a
user is to be authorized to insert and update only a certain column
of a table, and this has to be achieved by a CLID_Insert of a whole
row rather than by an Update of a single field.  Concurrency
control of finer granularity than the row is similarly made more
complicated.

Thus it would be beneficial for CLID to add an Update operation,
orthogonally to the question raised in the first issue in this
section, i.e. as to whether it actually updates the table specified
as the first argument, or returns a new table embodying this
update.


2.5  Duplicate Rows

A table in SQL allows duplicate rows if there is no constraint
preventing this, and is thus in general a multiset of elements,
rather than a set as in CLID.

With the SQL model of a table as a collection of elements without
disjoint keys (see 2.1 above), the possibility is opened up of
including the conventional operations on multisets, and indeed
sets, in the type definition for an SQL_Table (this possibility is
under discussion in SQL3 development).  In CLID terms, the Bag and
Table datatypes could be merged. 


2.6  Other Differences

This short paper does not claim to give an exhaustive list of
differences between the SQL and CLID models of a table.

Some other differences are inessential, such as the absence of an
equality operation on SQL_Tables, and the fact that an SQL_Table
has an Exists function which is the negation of the Empty operation
on a CLID table.

There are are also more general questions about a binding between
SQL and CLID that are not investigated here, such as the treatments
of null (nil) values and exceptions.  What are the implications of
an operation such as Select on a CLID Table being defined as a
partial function, with its arguments required to satisfy a where-
clause?  In this case, moreover, the equivalent SQL Select is a
total function, returning an empty table if there is no match with
the keys.

As noted above, the SQL_Table template also contains many
additional functions relevant to sets and multisets of elements.


2.7  Open Questions

Work is still in progress on the SQL_Table type template, and also
on related templates for SQL_Set and SQL_List concepts. 
Significant aspects that are well defined in SQL-92, but have not
yet been specified by the type template methodology, include the
treatment of views (i.e. intensionally defined tables), and cursors
(i.e. intensional tables with a current row and only sequential
operations to fetch, update or delete a row at a time).


3  Conclusions

Consistency between the CLID concept of Table and the concept
already standardized and in widespread use in SQL would improve the
prospects of compliance with CLID [2, pp. 7-10], not only for SQL
but also for programming languages that are used to interface to
SQL.
 
To the possible objection that the concepts are not intended to
correspond and that the problem could be resolved by renaming the
CLID concept, a reply is that the SQL concept deserves to be 
included in CLID, and this would then lead to similar but
inconsistent concepts in CLID, whereas it is possible and
preferable to express one as a compatible subset of the other.

------- End of Forwarded Message