Language Design Guidelines

William Kent
Database Technology Department
Hewlett-Packard Laboratories
Palo Alto, California

Jan 1994

SQL3 Discussion Paper (Draft)
ANSI Number: X3H2-94-368NEW
January 1994
Title: Language Design Guidelines
Author: William Kent


CONTENTS:
> 1 INTRODUCTION
> 2 COMPATIBILITY WITH SQL2
> 3 GENERALIZATION
> 4 IMPLEMENTATION HIDING
>> 4.1 OID formats
>> 4.2 Private data in constructors
>> 4.3 Names of attributes
>> 4.4 Separating interface and implementation specifications
>> 4.5 Organization of sub- and supertables, or sub- and supertype extent tables
>> 4.6 Private tables for type extents
>> 4.7 Where object data is stored
> REFERENCES


1 INTRODUCTION

This paper contains selected topics in language design guidelines on which the development of SQL3 [1] could be based. It is submitted as a discussion document to provide X3H2 an opportunity to express an opinion and determine consensus regarding various elements of the guideline. It is expected that new proposed guidelines may be submitted in future papers. They will probably be maintained in cumulative form to provide an overall reference document, with new material highlighted.

The material on implementation hiding [Section 4] is adapted from [2], which was defeated by X3H2 on Dec. 14, 1993 at the meeting in New Orleans on the grounds that the material did not belong in the draft standard [1], and that the committee did not wish to establish a separate official committee document in which to embody such language design principles. The committee did vote to forward the paper to ISO as an individual expert contribution. However, no attempt was made at that time to determine the extent to which X3H2 as a committee endorses such principles.

The present paper contains much the same technical content as [2], a bit elaborated and refined. It is submitted as a discussion paper for the purpose of determining the degree of consensus within X3H2 on such language design guidelines. It is requested that these guidelines be discussed, and that a straw vote be taken on each numbered section and subsection below.

It may be noted that some of the concepts mentioned herein may be reflected in current language facilities, while others constitute language opportunities.

2 COMPATIBILITY WITH SQL2

[Get a reference for SQL2.]

Maintaining compatibility with SQL2 is imperative, of course, in the sense that existing applications using SQL2 should run successfully on an SQL3 implementation. In general it should be permissible for SQL3 to relax restrictions, and to allow things not allowed in SQL2. This would not impact existing applications since they only use things which are allowed in SQL2.

The one caveat concerns exceptions. There may be some applications whose normal operation depends on raising certain exceptions which might not occur in SQL3. It could be argued that this is a misuse of the exceptions facility, which is intended to deal with the occurrence of undesirable conditions. It should at least be legitimate to eliminate exceptions which are simply due to language limitations in SQL2, rather than to conditions arising in the normal operation of applications.

3 GENERALIZATION

Generalization is another useful principle, being part of the foundation of object orientation.

Prefer reusing and extending existing language facilities over introducing special cases, in the spirit of object orientation.

For types: generalize into a supertype where possible to capture common functionality, and use subtypes to express the differences.

Similarly for operations: tend toward fewer operations with parameterized options, rather than many specialized operations. (This may also aid code reuse.) Of course, this should not be taken too far: in the extreme, everything reduces to a single operation, DO(operation, operands).

4 IMPLEMENTATION HIDING

Papers [5,6] mention "the tradition of SQL as being more of an `end-user' language than a `system programmer' language". Such a language depends on, among other things, the object-oriented principle of implementation hiding.

To enhance portability and reusability, applications should be insulated from the effects of changing implementations as much as possible. As reflected in the definition of "abstract data type (ADT)" in Subclause 3.1.3a of [1], object orientation promotes such insulation via "the separation of the interface of the type from its implementation". The definition of "implementation (of an ADT)" in Subclause 3.1.3u of [1] goes on to say that "Stored data together with the data structures and code that implement the behavior of an ADT is its implementation."

"Implementation" can be interpreted here as those specifications which can be altered without altering the correctness of an application. Changing such specifications might require recompilation of the application, but no reprogramming.

This proposition has the following implications for a schema and a data definition language:

While it is certainly desirable for a user to have confidence in the correctness of an implementation, it is difficult to guarantee such correctness. Any assurances of correctness should not compromise these principles.

There is an important question of how an implementation is chosen when alternative implementations are available, especially for newly-created objects. Ideally, the choice should be made outside the application, perhaps by some defaulting or context-dependent mechanism. For example, the local copy of the schema might only specify one implementation. Alternatively, the implementation might be chosen on the basis of which machine or operating system the application is running on, or it might depend on which application invoked the given application.

To the extent that an application does choose the implementation and/or it makes decisions based on the implementation, the portability and reusability of the application are compromised.

Precise interpretation of the general principle of implementation hiding can be guided by corollaries such as the following:

4.1 OID formats

OID formats are a matter of implementation, and not part of the language standard, except perhaps at a low level comparable to standardizing the exact bit pattern representation of character strings. Whether any useful information happens to be embedded in the OID rather than in a separate table should be left to the implementers.

[Acceptance of this corollary should be accompanied by a change to the definition of object identifier type in Subclause 4.9, removing any reference to type information being contained in an OID.]

4.2 Private data in constructors

The essential purpose of private attributes is to describe data which is internal to the implementation, and should not be exposed to users. In particular, applications should in no way be dependent on private attributes. By extension, an application should even be insensitive to changes in the configuration of such private data. In particular, an application should not initialize private data.

Typical examples of private data (in non-relational terms) include such things as:

Internal representations of data visible to applications should be initialized in terms of the visible data. Thus if age is visible while birthday is hidden, then the application should initialize age, not birthday. If it is objected that the mapping from age to birthday is not really invertible, the response is that this is not a realistic example.
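The age/birthday case can be sketched in Python as follows. This is only an illustration under stated assumptions: the Person class, its method names, and the choice of a birth year as the hidden representation are hypothetical and do not come from [1]. The constructor accepts only the visible value (age), and the private field is derived from it.

```python
from datetime import date


class Person:
    """Hypothetical type: age is visible, the stored birth year is private."""

    def __init__(self, age, today=None):
        # The application supplies the visible property only.
        today = today or date.today()
        # Private representation, derived from the visible value.
        # (Assumption: a year-granularity approximation is good enough here.)
        self._birth_year = today.year - age

    def get_age(self, today=None):
        # Observer: recomputes the visible value from the private one.
        today = today or date.today()
        return today.year - self._birth_year
```

An implementation could later switch the hidden field to a full date, or to something else entirely, without any change to code that constructs a Person from an age.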

4.3 Names of attributes

Applications don't have any need to refer to the names of attributes. They only need to use the retrieval and update operations (observers and mutators). There is no logical difference between an attribute name and its observer operation. Thus it is sufficient to specify that a mutator updates an observer, such as

Set_Age UPDATES Get_Age

Move UPDATES Locate

without specifying an attribute name. Note that naming conventions do not require the attribute name to be part of the mutator or observer operation names.
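A rough Python analogue of the UPDATES declaration is sketched below. The registry, the decorator, and the Vehicle class are hypothetical names introduced only for illustration; nothing here is proposed syntax. The point is that the relationship is recorded between two operations, and the name of whatever is stored never appears in the interface.

```python
# Hypothetical registry: mutator name -> name of the observer it updates.
UPDATES = {}


def updates(observer_name):
    """Declare that the decorated mutator updates the named observer."""
    def register(mutator):
        UPDATES[mutator.__name__] = observer_name
        return mutator
    return register


class Vehicle:
    def __init__(self, position):
        # Storage detail; its name is never exposed to applications.
        self._state = {"where": position}

    def locate(self):
        # Observer.
        return self._state["where"]

    @updates("locate")
    def move(self, position):
        # Mutator, declared as updating the observer `locate`.
        self._state["where"] = position
```

As in the Move/Locate pair above, neither operation name contains an attribute name.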

4.4 Separating interface and implementation specifications

Suppose an interface for circular objects is defined with members for Center, Radius, and Diameter. It should be possible for different implementations to store either the radius or the diameter, or both. Updates to either can be propagated into updates to whichever ones are stored. An application should be able to function with a single interface to any of these implementations, and even survive a change of implementations with at most a recompilation.

It would be appropriate for the interface to document relevant semantic constraints, such as the fact that the diameter is twice the radius. However, implementation characteristics, such as which attributes are stored and which are virtual, should not be part of the interface specification.
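The circle example might be sketched in Python as follows. The class and function names are assumptions made for illustration. The interface documents the constraint that the diameter is twice the radius but says nothing about which value is stored; two implementations make different storage choices, and the application function is written against the interface alone.

```python
from abc import ABC, abstractmethod


class Circle(ABC):
    """Interface only. Documented constraint: diameter == 2 * radius."""

    @abstractmethod
    def get_center(self): ...

    @abstractmethod
    def get_radius(self): ...

    @abstractmethod
    def set_radius(self, r): ...

    @abstractmethod
    def get_diameter(self): ...

    @abstractmethod
    def set_diameter(self, d): ...


class RadiusCircle(Circle):
    """Stores the radius; the diameter is virtual."""

    def __init__(self, center, radius):
        self._center, self._r = center, radius

    def get_center(self): return self._center
    def get_radius(self): return self._r
    def set_radius(self, r): self._r = r
    def get_diameter(self): return 2 * self._r
    def set_diameter(self, d): self._r = d / 2


class DiameterCircle(Circle):
    """Stores the diameter; the radius is virtual."""

    def __init__(self, center, radius):
        self._center, self._d = center, 2 * radius

    def get_center(self): return self._center
    def get_radius(self): return self._d / 2
    def set_radius(self, r): self._d = 2 * r
    def get_diameter(self): return self._d
    def set_diameter(self, d): self._d = d


def double_size(circle):
    """Application code: uses only the interface operations."""
    circle.set_diameter(2 * circle.get_diameter())
```

Swapping RadiusCircle for DiameterCircle changes nothing in double_size, which is the property the corollary asks for.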

As another example (illustrated in section 1.6 of [3]), geometric points should be defined as abstract points which might be implemented by storing either polar or rectangular coordinates. The properties of a point include x and y coordinates and also a magnitude and angle. It should be possible to designate a point using either polar or rectangular coordinates (i.e., two designators), independently of how the point is stored in any particular implementation.
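A minimal sketch of the two-designator idea, again in Python with hypothetical names: each implementation offers both a rectangular and a polar designator, so the way a point is designated is independent of the way it is stored.

```python
import math


class RectPoint:
    """Stores rectangular coordinates; magnitude and angle are computed."""

    def __init__(self, x, y):
        self._x, self._y = x, y

    @classmethod
    def from_rectangular(cls, x, y):
        return cls(x, y)

    @classmethod
    def from_polar(cls, magnitude, angle):
        return cls(magnitude * math.cos(angle), magnitude * math.sin(angle))

    def x(self): return self._x
    def y(self): return self._y
    def magnitude(self): return math.hypot(self._x, self._y)
    def angle(self): return math.atan2(self._y, self._x)


class PolarPoint:
    """Stores polar coordinates; x and y are computed."""

    def __init__(self, magnitude, angle):
        self._m, self._a = magnitude, angle

    @classmethod
    def from_rectangular(cls, x, y):
        return cls(math.hypot(x, y), math.atan2(y, x))

    @classmethod
    def from_polar(cls, magnitude, angle):
        return cls(magnitude, angle)

    def x(self): return self._m * math.cos(self._a)
    def y(self): return self._m * math.sin(self._a)
    def magnitude(self): return self._m
    def angle(self): return self._a
```

Either class can be handed to code that calls from_rectangular or from_polar; results agree up to floating-point rounding.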

4.5 Organization of sub- and supertables, or sub- and supertype extent tables

There are many ways to configure the "real" tables which implement the semantics of sub- and supertables, as well as the extent tables for sub- and supertypes. It should be possible for an application to operate with any of these configurations, and to even allow the underlying configurations to be changed without impacting the logic of the application (recompilation might be required, but not reprogramming).

We can illustrate some possible configurations using two example types/tables: Person(Name,Birthplace) and Student(Major,Credits). Let :pam be a person who is not a student and :sam be a student, hence also a person.
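As one illustration, two configurations that an implementation might choose are sketched below using Python's sqlite3 module. These are assumptions made for this example and are not presented as the paper's own list of configurations; the table names, view names, and sample values (birthplaces, major, credits) are likewise hypothetical. In both cases the application queries only the person and student interfaces and cannot tell which configuration is underneath.

```python
import sqlite3


def config_a(db):
    """Subtable holds only the subtype's own columns, keyed to the supertable."""
    db.executescript("""
        CREATE TABLE person_base (oid TEXT PRIMARY KEY, name TEXT, birthplace TEXT);
        CREATE TABLE student_ext (oid TEXT PRIMARY KEY, major TEXT, credits INTEGER);
        INSERT INTO person_base VALUES ('pam', 'Pam', 'Ohio');
        INSERT INTO person_base VALUES ('sam', 'Sam', 'Utah');
        INSERT INTO student_ext VALUES ('sam', 'Math', 30);
        CREATE VIEW person AS SELECT oid, name, birthplace FROM person_base;
        CREATE VIEW student AS
            SELECT p.oid, p.name, p.birthplace, s.major, s.credits
            FROM person_base p JOIN student_ext s ON s.oid = p.oid;
    """)


def config_b(db):
    """Each object is stored once, in the table of its most specific type."""
    db.executescript("""
        CREATE TABLE person_only (oid TEXT PRIMARY KEY, name TEXT, birthplace TEXT);
        CREATE TABLE student_full (oid TEXT PRIMARY KEY, name TEXT, birthplace TEXT,
                                   major TEXT, credits INTEGER);
        INSERT INTO person_only VALUES ('pam', 'Pam', 'Ohio');
        INSERT INTO student_full VALUES ('sam', 'Sam', 'Utah', 'Math', 30);
        CREATE VIEW person AS
            SELECT oid, name, birthplace FROM person_only
            UNION ALL
            SELECT oid, name, birthplace FROM student_full;
        CREATE VIEW student AS
            SELECT oid, name, birthplace, major, credits FROM student_full;
    """)


def application(db):
    """Application logic, written once against the person/student interface."""
    persons = db.execute("SELECT name FROM person ORDER BY name").fetchall()
    students = db.execute("SELECT name, major FROM student ORDER BY name").fetchall()
    return persons, students


def run(configure):
    db = sqlite3.connect(":memory:")
    configure(db)
    return application(db)
```

Calling run(config_a) and run(config_b) yields the same answer: both Pam and Sam appear as persons, and only Sam appears as a student.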

4.6 Private tables for type extents

As an extension of the previous point, it should be optionally possible to restrict access to extent tables for types so that they can only be manipulated from within the bodies of constructor, destructor, observer, and mutator routines. It would thus be possible to configure the schema so that applications do not directly access such tables. Changes in implementation would be reflected in changes within these routines, without otherwise affecting applications exploiting this option.
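The option described above can be approximated in Python as follows. The Student class and its routine names are hypothetical, and Python enforces privacy only by convention (the leading underscore), so this is a sketch of the intent rather than of an enforcement mechanism. The extent is touched only inside the constructor, destructor, observer, and mutator routines.

```python
class Student:
    # Private extent: by convention, manipulated only by the routines below.
    _extent = {}

    def __init__(self, oid, major):
        # Constructor: the only place where an object enters the extent.
        self._oid = oid
        self._major = major
        Student._extent[oid] = self

    def destroy(self):
        # Destructor: the only place where an object leaves the extent.
        Student._extent.pop(self._oid, None)

    def get_major(self):
        # Observer.
        return self._major

    def set_major(self, major):
        # Mutator.
        self._major = major

    @classmethod
    def count(cls):
        # Observer over the extent; applications never see the table itself.
        return len(cls._extent)
```

If the extent were later moved into a different structure, only these routines would change.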

4.7 Where object data is stored

An object may be mentioned in many places in many tables and/or variables, under the same or different types. It should not matter to an application which place, or how many places, actually store the data about the object. Such definition should be part of the implementation specification, which may in the future even provide for replica management.

REFERENCES

1 X3H2-93-359R/ISO DBL MUN-003, (ISO-ANSI Working Draft) Database Language SQL (SQL3), Jim Melton (ed.), August 1993.

2 X3H2-93-368R1, "Implementation Hiding", William Kent, Nov. 17 1993.

DO WE NEED ANY OF THE FOLLOWING?

3 X3H2-93-076, "SQL3 OO Underlying Assumptions", by William Kent.

4 X3H2-93-109, "Identity and Equality", by William Kent and Amelia Carlson.

5 X3H2-93-234, "POINTS TO and CONTAINS", May 1 1993, by David Beech, Boris Burshteyn and Phil Shaw.

6 X3H2-93-384R, "Extents for object ADTs", Sept 10 1993, by John Bellemore, Tim Nguyen, Gray Clossman and Phil Shaw.