William Kent
Database Technology Department
Hewlett-Packard Laboratories
Palo Alto, California
Nov 1993
SQL3 Discussion Paper [11/15/93]
ISO Number: WG3 DBL YOK-38
ANSI Number: X3H2-93-076
Title: SQL3 OO Underlying Assumptions
Author: William Kent
CONTENTS:
> 1 INTRODUCTION
> 2 DEFINITION OF "OBJECT"
> 3 SEMANTICS OF IDENTITY AND EQUALITY
> 4 SEPARATION OF INTERFACE AND IMPLEMENTATION
> 5 TYPE RELATIONSHIPS
1 INTRODUCTION
It would be useful to establish consensus regarding certain basic assumptions about the object-oriented features in SQL3. Whether X3H2 agrees or disagrees with the positions proposed below, consensus on such assumptions will provide a clearer foundation and rationale for other more specific decisions being made about SQL3, leading to a more precise, unambiguous, and consistent definition of SQL3.
2 DEFINITION OF "OBJECT"
Proposed definition:
An object is an abstract concept which is represented in a system by an identifying reference (id-ref).
I have temporarily introduced the term "id-ref" to sidestep some corollary issues about oid's. It may well be that the final version of this definition will use the term "oid", if it is appropriately defined. My intent is that the term "id-ref" is broad enough to include, for example, identification of instances of ADT's which have no oid's.
Under the assumption presented in Section 4 below, the definition of "object" should not be couched in terms of a specific data structure, since many data structures can implement the same interface for an object.
If a definition of "object" as an <id-ref,value> tuple is being considered, this can lead to difficulties if it is ever possible for two tuples to exist anywhere in the system having the same id-ref but different values. This is discussed further in Section 3.
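The difficulty with the <id-ref,value> reading can be made concrete with a small sketch. It is written in Python purely for illustration; the names IdRef, store, and salary are hypothetical and are not part of SQL3 or of any proposal. It assumes an id-ref is an opaque, comparable token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdRef:
    """Hypothetical stand-in for an id-ref: an opaque identifying reference."""
    key: str


# Reading 1: an object *is* an <id-ref, value> tuple.
# Nothing prevents two such tuples from drifting apart.
copy_a = (IdRef("sam"), {"salary": 1000})
copy_b = (IdRef("sam"), {"salary": 1200})
same_id_ref = copy_a[0] == copy_b[0]   # the id-refs agree
same_value = copy_a[1] == copy_b[1]    # the values do not

# Reading 2: the object is represented by the id-ref alone, and its
# properties are reached through an interface (here, a lookup function).
store = {IdRef("sam"): {"salary": 1000}}


def salary(ref: IdRef) -> int:
    """Return the salary of the object identified by ref."""
    return store[ref]["salary"]
```

Under the first reading it is unclear which of copy_a and copy_b "is" the object. Under the second reading the question does not arise, because every holder of the id-ref reaches the same state.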
3 SEMANTICS OF IDENTITY AND EQUALITY
It would be useful to (a) define the semantics of identity, and (b) clarify whether or not equality is intended to have such semantics.
The proposed semantics of identity are based on the notion of being the same object. I will use x==y to mean that x and y are identical, which should be true if and only if x and y refer to the same object. Certain behavioral characteristics should hold if x==y:
{x,y}=={x}=={y}.
This should hold in any context which does not admit duplicates.
Tuples which have different values in any component are clearly not identical. Also, if x and y refer to tuples of different lengths, then they clearly cannot satisfy either of these criteria (assuming that there is some operation which is sensitive to tuple lengths). Casting should not be confused with identity. If y can be substituted for x in f(x) by transforming y into something else, then y may be compatible with x in some sense, but they do not refer to the same object.
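The set criterion and the remark about casting can be sketched as follows. This is an illustration only, in Python. IdRef, identical, and cast_to_pair are hypothetical names, and Python value equality on id-refs is assumed to stand in for x==y.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdRef:
    """Hypothetical id-ref; two IdRefs with the same key refer to the same object."""
    key: str


def identical(x, y) -> bool:
    """Model of x==y: true only when x and y refer to the same object."""
    return x == y


x = IdRef("sam")
y = IdRef("sam")   # a second reference to the same object
z = IdRef("pat")   # a reference to a different object

# In a context that does not admit duplicates, identical references collapse.
collapses = {x, y} == {x} == {y}
distinct_count = len({x, z})

# Tuples of different lengths are not identical, even if a cast makes one
# usable in place of the other.
long_t = (x, 1000, "Palo Alto")
short_t = (x, 1000)


def cast_to_pair(t):
    """Hypothetical cast: keep only the first two components."""
    return t[:2]


not_identical = identical(long_t, short_t)
identical_after_cast = identical(cast_to_pair(long_t), short_t)
```

Here long_t becomes acceptable where short_t is expected only after being transformed by cast_to_pair, which is compatibility rather than identity.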
Two questions thus arise:
It may not be possible to enforce such behavior in the definition of EQUALS for an ADT, but it would be appropriate for the standard to so define the correct behavior of the operation.
4 SEPARATION OF INTERFACE AND IMPLEMENTATION
Proposition: one of the fundamental goals of object orientation is to facilitate portability and reusability of applications. One means of achieving this goal is data independence, making applications as independent as possible of specific implementations. This allows applications to be reused with different implementations, and to be unaffected by changes in implementation. (They may have to be recompiled, but not reprogrammed.)
A corollary to this proposition is that schemas should be organized in a way that clearly distinguishes portions on which applications may depend (i.e., interface specifications) from portions to which they should be indifferent (implementation specifications). Schemas should also be organized so that one or more implementations can be specified for the same interface.
A simple example is given below. It is not in itself a concrete proposal, and it is not intended to display correct SQL syntactic form. It merely illustrates the concept. The example concerns circles for which the radius and diameter are both defined. Different implementations might store one or the other, or both.
CREATE TYPE Circle
  FUNCTIONS (
    Center -> Point ASSERTABLE;
    Radius -> Number ASSERTABLE;
    Diameter -> Number ASSERTABLE;)

CREATE CLASS Circle1 FOR Circle
  FUNCTIONS (
    Center AS STORED (format);)

CREATE CLASS Circle1a SUBCLASS OF Circle1
  FUNCTIONS (
    Radius AS STORED (format);
    Diameter(x) AS 2*Radius(x),
      ASSERTION x,y: Radius(x)<-y/2;)

CREATE CLASS Circle1b SUBCLASS OF Circle1
  FUNCTIONS (
    Radius(x) AS Diameter(x)/2,
      ASSERTION x,y: Diameter(x)<-2*y;
    Diameter AS STORED (format);)
Under this schema, all instances of the type Circle use the same implementation for Center, hence all circles are instances of the class Circle1. Instances of the subclass Circle1a have their radius stored and diameter computed, while instances of Circle1b have the opposite implementation. Every instance of the type Circle must also be an instance of one of the leaf classes Circle1a or Circle1b.
Application logic is sensitive only to the specification of the type Circle, and not to the specification of any of the classes. The keyword ASSERTABLE tells the application that new values may be assigned, while remaining neutral as to whether and how the values are stored.
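The same separation can be sketched in a general-purpose language. The following Python rendering is an illustration under stated assumptions, not a proposal. The abstract class plays the role of the type Circle, the concrete classes play the roles of Circle1, Circle1a, and Circle1b, and a settable property is assumed to correspond to ASSERTABLE. The function double_size is a hypothetical application.

```python
from abc import ABC, abstractmethod


class Circle(ABC):
    """Interface only: the part applications may depend on."""

    @property
    @abstractmethod
    def center(self): ...

    @property
    @abstractmethod
    def radius(self): ...

    @radius.setter
    @abstractmethod
    def radius(self, value): ...

    @property
    @abstractmethod
    def diameter(self): ...

    @diameter.setter
    @abstractmethod
    def diameter(self, value): ...


class Circle1(Circle):
    """Shared implementation: Center is stored. Still abstract."""

    def __init__(self, center):
        self._center = center

    @property
    def center(self):
        return self._center


class Circle1a(Circle1):
    """Radius stored, diameter computed."""

    def __init__(self, center, radius):
        super().__init__(center)
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        self._radius = value

    @property
    def diameter(self):
        return 2 * self._radius

    @diameter.setter
    def diameter(self, value):
        self._radius = value / 2   # assertion: Radius(x) <- y/2


class Circle1b(Circle1):
    """Diameter stored, radius computed."""

    def __init__(self, center, diameter):
        super().__init__(center)
        self._diameter = diameter

    @property
    def radius(self):
        return self._diameter / 2

    @radius.setter
    def radius(self, value):
        self._diameter = 2 * value   # assertion: Diameter(x) <- 2*y

    @property
    def diameter(self):
        return self._diameter

    @diameter.setter
    def diameter(self, value):
        self._diameter = value


def double_size(c: Circle) -> None:
    """Hypothetical application: written against Circle only."""
    c.diameter = 2 * c.diameter


a = Circle1a((0, 0), 1.0)
b = Circle1b((0, 0), 2.0)
double_size(a)
double_size(b)
```

After the calls, a and b report the same radius and diameter, although one stores the radius and the other the diameter. double_size never names either class.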
5 TYPE RELATIONSHIPS
SQL3 currently uses the term "subtype" to refer to two distinct relationships between types: inclusion and substitutability. It would be useful to at least clarify the two concepts, and to observe that both concepts are potentially applicable to all objects. This could lead to a proposal to use distinct terminology for the two.
The inclusion relationship, which I will denote t1 SUBSET t2, means that each instance of t1 is itself an instance of t2. Thus, for example, every memo is itself a document, and every employee is a person. This concept applies to non-oid objects as well as oid-objects. Thus the even integers are a subtype of the integers, and the tuple type <Employee,Integer> is a subtype of the tuple type <Person,Integer>. An instance <@sam,1000> of the former is itself an instance of the latter. (Syntax is illustrative only. @sam denotes an oid.)
Substitutability, which I will denote t1<t2, means that an instance of t1 may occur where an instance of t2 is expected. This relationship can be described in terms of a Cast operator
Cast: t1->t2.
If an operation f which is defined to expect an instance of t2 is invoked with an instance x1 of t1, the effect is that
f(x1) = f(Cast(x1)).
Inclusion can be viewed as the limiting case of substitutability, where the casting operation is the identity function Cast(x)=x, so that
t1 SUBSET t2 => t1<t2.
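The rule f(x1) = f(Cast(x1)), with inclusion as the identity cast, can be sketched as below. This is a minimal Python illustration. The registry CASTS, the dispatcher invoke, and the functions half and greet are hypothetical names introduced only for this example.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Employee(Person):
    """Inclusion: every Employee is itself a Person."""


# Registry of Cast operators, keyed by (t1, t2).
CASTS = {
    (int, float): float,               # substitutability with a real conversion
    (Employee, Person): lambda x: x,   # inclusion: Cast(x) = x
}


def invoke(f, expected, x):
    """Evaluate f(x1) as f(Cast(x1)) when x1 is not already of the expected type."""
    if type(x) is expected:
        return f(x)
    cast = CASTS[(type(x), expected)]
    return f(cast(x))


def half(v: float) -> float:
    return v / 2


def greet(p: Person) -> str:
    return "hello " + p.name


sam = Employee("sam")
halved = invoke(half, float, 3)           # 3 is cast to 3.0 first
greeting = invoke(greet, Person, sam)     # sam passes through unchanged
same_object = CASTS[(Employee, Person)](sam) is sam
```

For the inclusion case the cast returns the very same object, whereas the int-to-float cast produces a different value of a different type. That is the distinction drawn above.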
Substitutability is currently defined in SQL3 with respect to non-oid objects. However, it might be possible to define substitutability, and casting, for objects having oid's, in such cases as votes by alternate committee members, replacement of recipe ingredients, or stand-ins for performers.
That raises one concern. SQL3's current use of the term "subtype" to mean inclusion for oid-objects and substitutability for non-oid objects will make it difficult to deal with inclusion for non-oid objects and substitutability for oid objects.
Another concern arises regarding the context of substitutability. Substitutability may not apply to all contexts, and it might even involve different casting operators in different contexts. A long tuple may be substitutable for a shorter tuple in a context which only cares about the first few elements of the tuple, but it might not be substitutable in a context which cares about the length of the tuple.
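The context dependence can be illustrated with tuples. In this Python sketch the cast truncate and the two contexts pay and arity are hypothetical. It assumes the cast from a long tuple to a short one simply drops trailing elements.

```python
def truncate(t):
    """Hypothetical cast from a longer tuple to a pair: keep the first two elements."""
    return t[:2]


def pay(t):
    """Context that looks only at the first two elements."""
    name, amount = t[0], t[1]
    return amount


def arity(t):
    """Context that is sensitive to the length of the tuple."""
    return len(t)


long_t = ("sam", 1000, "Palo Alto")

# In the first context the cast is invisible: the long tuple is substitutable.
pay_agrees = pay(long_t) == pay(truncate(long_t))

# In the second context the cast changes the answer: it is not substitutable.
arity_agrees = arity(long_t) == arity(truncate(long_t))
```

The same pair of types is therefore substitutable with respect to pay but not with respect to arity, which suggests that substitutability may need to be stated relative to a context rather than as a property of the two types alone.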