From comp@komp.ace.nl Thu Jan 7 11:54:04 1993 Received: from sun4nl.nluug.nl by dkuug.dk with SMTP id AA25513 (5.65c8/IDA-1.4.4j for ); Thu, 7 Jan 1993 11:54:04 +0100 Received: from ace by sun4nl.nluug.nl via EUnet id AA03585 (5.65b/CWI-3.3); Thu, 7 Jan 1993 11:54:08 +0100 Received: from ace.ace.nl ([194.0.2.40]) by netnog.ace.nl with SMTP id AA14407 (1.14/890.1); Thu, 7 Jan 93 08:44:54 +0100 (MET) X-Organisation: ACE Associated Computer Experts bv. Amsterdam, The Netherlands. +31 20 6646416 (phone) +31 20 6750389 (fax) 11702 (ace nl) (telex) Received: from komp.ace.nl ([192.1.2.90]) by ace.ace.nl with SMTP id AA02578 (1.14/2.17); Thu, 7 Jan 93 09:15:26 +0100 (MET) Received: by komp.ace.nl with SMTP id AA00722 (1.10/2.17); Thu, 7 Jan 93 09:44:22 +0100 (MET) To: sc22wg11@dkuug.dk Subject: WG11/N352 - Liaison Statement from SC21/WG3 DBL Date: Thu, 07 Jan 93 09:44:14 N Message-Id: <720.726396254@komp> From: Willem Wakker X-Charset: ASCII X-Char-Esc: 29 ISO/IEC JTC1/SC22/WG11 N352 ISO/IEC JTC1/SC21/WG3 N1452 November 20, 1992 ISO/IEC JTC 1/SC 21/WG 3 Database Secretariat: Canada (SCC) Title: DBL Liaison to CLID re: Tables Source: DBL Rapporteur Group Author: David Beech Projects: ISO/IEC JTC 1.21.3.4 (SQL3), and JTC 1.22.17 (CLID) Status: Approved by DBL RG Requested Action: Approval by SC21/WG3 Consideration by SC22/WG11 References: [1] Jim Melton (ed): (ISO/ANSI Working Draft) Database Language SQL3, ISO/IEC JTC1/SC21 N6931, (also: X3H2-92-155 or DBL CBR-003), July, 1992. [2] Edward J Barkmeyer (ed): (Working Draft #6.1, EditorUs Interim Draft to CD 11404) Common Language- Independent Datatypes (CLID), ISO/IEC JTC1/SC22/WG11 N319R, 9 September, 1992 (also: DBL CBR-28 or X3H2 92-205). 1. Motivation At its meeting in Canberra, Australia, November 16-20, 1992, the WG3 Database Languages rapporteur group reviewed some aspects of the CLID specification [2], especially the aggregate-type generators for Sets, Multisets, and Lists, which have been influential on proposals for definitions of collection type templates for SQL3 [1]. During this review, it was noted that the CLID definition of a Table differed significantly from that of a Table in SQL. Since SQL has been standardized by ISO since 1986, and SQL tables are in widespread use, the SQL concept, well founded in relational theory, appears to be a strong candidate as a common language-independent datatype. Programs in many different programming languages already interface with SQL tables. Seven standard language bindings are defined at the level of elementary datatypes in SQL-92, and higher- level bindings at the row and table levels would be facilitated by appropriate CLID definitions. Therefore DBL passed a motion authorizing that this paper, defining the differences between the exisiting SQL and CLID concepts of a table, be submitted to SC21/WG3 and to SC22/WG11 for the purpose of recommending that CLID either adopts the SQL concept or drops Table from their specification. The SC22/WG3 DBL group is willing to collaborate with SC22/WG11 if WG11 chooses to adopt the SQL concept of Table. In the list of CLID Outstanding Issues [2, p. i], issue 1 refers to some open questions about the relationships between "List and Array, and to a lesser extent Set, Bag, and Table", particularly an incompatibility with the Fortran-90 model of Array. The suggestion of this paper is that the incompatibility with the SQL model of Table is no less deserving of attention. 2. Comparison of Table Concepts 2.1 Keys and Elements In the relational model, a table consists of rows and columns. One or more (or even all) of the columns may be denominated as containing key values, but these values are also data values and may be treated as such. For example, they may be retrieved in queries, and may even be updated, so long as they do not violate the constraint that the tuple of key values in a row is not duplicated in any other row in the table. An important principle is that the data values themselves serve as primary keys for simple identification of rows, and more complex information structures are expressed totally by data values, such as by use of foreign keys which are copies of primary keys. In the CLID model of a table, the key-value-list is disjoint from the element-value. With the simple forms of Select, Insert, and Delete defined for CLID, this could be regarded as a subset of the relational model in which the keys are never treated as data values, so that it makes no difference that they are separated out. However, this does not generalize well to support the more powerful operations on tables provided by SQL (see below). In SQL3, it is proposed to be able to define a row type T, and then CREATE TABLE TeaTable OF T. This could be defined to correspond to instantiating a type SQL_Table(T) generated from a type template for an SQL_Table, which in CLID terms is a datatype generator for a table in which T is the element-type. However, in SQL the type T defines the key values (if any) within it. A table without keys still supports Select, Insert and Delete operations in SQL, which is not expressible in the current CLID model. Thus if CLID wished to adopt the SQL concept, the Syntax would show merely table-type = "table" "of" "(" element-type ")" optionally followed by a keyfield-identifier-list. Currently the element-type must be a record-type, but relaxations to allow arbitrary Table collections are under consideration in SQL3. 2.2 Set-valued Operations One of the most important and popular features of the relational model has been the ability to operate on sets of rows in a single operation. For example, the SQL SELECT produces a table as its result - an important closure property which enables nesting of the operation. In CLID, the Select function returns a single value of the element-type, and thus cannot be nested - a serious drawback in a functional model. A single SQL INSERT can insert a set of rows, and a single DELETE can delete all rows satisfying a specified predicate. The lack of set-valued operations on a CLID table appears also to make it impractical to retrieve or iterate over all members of the set of elements. There is neither an operation to retrieve all elements nor one to retrieve all keys, so that it is necessary to try all keys in the Cartesian product of the key-types, at least until emptying a copy of the table from which elements are deleted as they are found. A definition of Select returning an arbitrary element, as for a CLID set, overcomes this difficulty in theory, but is still cumbersome in practice since it requires Select and Delete operations on each iteration. The SQL solution is twofold: first, for set-at-a-time operations, to offer the set-valued Select with an optional search-condition that returns the subset of the elements that satisfy the search-condition (or the whole table in the absence of a search-condition); second, for element- at-a-time processing, to make it possible to define a Cursor associated with a table, with a Fetch operation that retrieves the elements in an arbitrary sequence without requiring any deletion. In an SQL_Table type template, the functions defined on a table can easily express the special cases defined by the CLID functions, but with different arguments, i.e. the CLID functions do not generalize smoothly to the SQL functions. Possibly it is thought too ambitious to offer the functionality of SQL in a language- independent datatype at the present time. However, with the rapidly growing need for programming languages to interface to database systems and to deal simply and efficiently with large collections of elements, the richer model of a table will probably be relevant by the time CLID is approved, and will certainly be important during the lifetime of the standard. At a minimum, it would be advantageous if the CLID model were a compatible subset of the SQL model. A stronger position would be for CLID to adopt the general model, with levels of conformance to allow for subset implementations. 2.3 Multi-table Select CLID supports Select on a single table only, but in SQL it is important to be able to select information from multiple tables in a single operation. Although there are explicit join operators, these are syntactic sugar for what is expressible in a Select operation that specifies multiple tables in its from-clause, with an appropriate search-condition. Since the search-condition in an SQL Select may make symbolic references to columns of these tables, one argument to the SQL_Select function in an SQL_Table type template is a character- string representation of a SELECT statement, including the from- clause with the names of the tables. The first of these tables is taken as the principal argument, whose type is SQL_Table(T) where T is the element type for this particular generated datatype with which the Select function is associated. If CLID were to adopt the argument structure of the SQL_Select, then subset levels could be applied to what can be expressed in the character-string argument. 2.4 Updatability The first issue of updatability is a general one that can probably be dealt with quickly for present purposes. It appears that CLID is defining purely functional operations, without side-effects. SQL, like most programming languages, is not so restrictive, and this suggests an "impedance mismatch" between the types and functions of CLID and those naturally occurring in its client languages. For example, to insert a row into an SQL_Table T, it is sufficient to invoke SQL_Insert(T,...). Assuming that it is the intention that for most languages, the CLID types would be supplemented by a general assignment operation, this would be expressed in CLID terms as T := CLID_Insert(T,...). An optimizer has to notice that a temporary table is not required. Similar considerations apply to Delete. The impedance mismatch may be a serious problem for CLID aggregate datatypes, but it is not specific to the SQL binding, so it will not be further discussed here. The second issue is that SQL supports an SQL_Update that allows for partial or total update of (one or more elements of) a table. This more refined update, say of a single field in a record element, rather than assignment of a complete new table, is a more direct way of coping efficiently, not merely with single-user access to a table, but with questions such as authorization and concurrency management where multiple users are concurrently accessing and changing the state of a table. The alternative offered by CLID, to use CLID_Delete followed by CLID_Insert on successive temporary tables, not only complicates the optimization problem, which cannot be ignored on potentially large collection datatypes, but may introduce an integrity gap between the Delete and Insert that would then call for more sophisticated facilities for making the pair of actions atomic and deferring integrity checking until their completion. Authorization becomes more difficult to specify if a user is to be authorized to insert and update only a certain column of a table, and this has to be achieved by a CLID_Insert of a whole row rather than by an Update of a single field. Concurrency control of finer granularity than the row is similarly made more complicated. Thus it would be beneficial for CLID to add an Update operation, orthogonally to the question raised in the first issue in this section, i.e. as to whether it actually updates the table specified as the first argument, or returns a new table embodying this update. 2.5 Duplicate Rows A table in SQL allows duplicate rows if there is no constraint preventing this, and is thus in general a multiset of elements, rather than a set as in CLID. With the SQL model of a table as a collection of elements without disjoint keys (see 2.1 above), the possibility is opened up of including the conventional operations on multisets, and indeed sets, in the type definition for an SQL_Table (this possibility is under discussion in SQL3 development). In CLID terms, the Bag and Table datatypes could be merged. 2.6 Other Differences This short paper does not claim to give an exhaustive list of differences between the SQL and CLID models of a table. Some other differences are inessential, such as the absence of an equality operation on SQL_Tables, and the fact that an SQL_Table has an Exists function which is the negation of the Empty operation on a CLID table. There are are also more general questions about a binding between SQL and CLID that are not investigated here, such as the treatments of null (nil) values and exceptions. What are the implications of an operation such as Select on a CLID Table being defined as a partial function, with its arguments required to satisfy a where- clause? In this case, moreover, the equivalent SQL Select is a total function, returning an empty table if there is no match with the keys. As noted above, the SQL_Table template also contains many additional functions relevant to sets and multisets of elements. 2.7 Open Questions Work is still in progress on the SQL_Table type template, and also on related templates for SQL_Set and SQL_List concepts. Significant aspects that are well defined in SQL-92, but have not yet been specified by the type template methodology, include the treatment of views (i.e. intensionally defined tables), and cursors (i.e. intensional tables with a current row and only sequential operations to fetch, update or delete a row at a time). 3 Conclusions Consistency between the CLID concept of Table and the concept already standardized and in widespread use in SQL would improve the prospects of compliance with CLID [2, pp. 7-10], not only for SQL but also for programming languages that are used to interface to SQL. To the possible objection that the concepts are not intended to correspond and that the problem could be resolved by renaming the CLID concept, a reply is that the SQL concept deserves to be included in CLID, and this would then lead to similar but inconsistent concepts in CLID, whereas it is possible and preferable to express one as a compatible subset of the other. ------- End of Forwarded Message