William Kent
Database Technology Department
Hewlett-Packard Laboratories
Palo Alto, California
Nov. 1993
SQL3 Discussion Paper
ANSI Number: X3H2-93-368R1
November 17, 1993
Title: Implementation Hiding
Author: William Kent
1 X3H2-93-359R/ISO DBL MUN-003, (ISO-ANSI Working Draft) Database Language SQL (SQL3), Jim Melton (ed.), August 1993.
2 X3H2-93-076, "SQL3 OO Underlying Assumptions", by William Kent.
3 X3H2-93-109, "Identity and Equality", by William Kent and Amelia Carlson.
4 X3H2-93-234, "POINTS TO and CONTAINS", May 1 1993, by David Beech, Boris Burshteyn and Phil Shaw.
5 X3H2-93-384R, "Extents for object ADTs", Sept 10 1993, by John Bellemore, Tim Nguyen, Gray Clossman and Phil Shaw.
Papers [4,5] mention "the tradition of SQL as being more of an 'end-user' language than a 'system programmer' language". Such a language depends on, among other things, the object-oriented principle of implementation hiding. The present paper seeks consensus on the extent to which this principle guides the development of SQL3. The material is presented as a proposed addition to the concepts described in [1]. Committee discussions of the draft proposed below will serve to clarify areas of consensus. Some of the concepts mentioned herein may be reflected in current language facilities, while others might constitute language opportunities.
Add the following new subclause to Clause 4 (Concepts) in [1].
To enhance portability and reusability, applications should be insulated from the effects of changing implementations as much as possible. As reflected in the definition of "abstract data type (ADT)" in Subclause 3.1.3a, object orientation promotes such insulation via "the separation of the interface of the type from its implementation". The definition of "implementation (of an ADT)" in Subclause 3.1.3u goes on to say that "Stored data together with the data structures and code that implement the behavior of an ADT is its implementation."
"Implementation" can be interpreted here as those specifications which can be altered without altering the correctness of an application. Changing such specifications might require recompilation of the application, but no reprogramming.
This proposition has the following implications for a schema and a data definition language:
While it is certainly desirable for a user to have confidence in the correctness of an implementation, it is difficult to guarantee such correctness. Any assurances of correctness should not compromise these principles.
There is an important question of how an implementation is chosen when alternative implementations are available, especially for newly-created objects. Ideally, the choice should be made outside the application, perhaps by some defaulting or context-dependent mechanism. For example, the local copy of the schema might only specify one implementation. Alternatively, the implementation might be chosen on the basis of which machine or operating system the application is running on, or it might depend on which application invoked the given application.
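The idea of choosing an implementation outside the application can be sketched as follows. This is only an illustration in Python, not SQL3 syntax: the two classes, the `IMPLEMENTATIONS` mapping (standing in for a local copy of the schema), and the `NAME_STORE_IMPL` setting are all hypothetical names introduced for the example.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


class DictNames:
    """Hypothetical implementation A: keeps names in a dictionary."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def set_name(self, oid: str, name: str) -> None:
        self._names[oid] = name

    def get_name(self, oid: str) -> str:
        return self._names[oid]


class PairListNames:
    """Hypothetical implementation B: keeps (oid, name) pairs in a list."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[str, str]] = []

    def set_name(self, oid: str, name: str) -> None:
        # Drop any earlier pair for this oid, then append the new one.
        self._pairs = [(o, n) for (o, n) in self._pairs if o != oid]
        self._pairs.append((oid, name))

    def get_name(self, oid: str) -> str:
        for o, n in self._pairs:
            if o == oid:
                return n
        raise KeyError(oid)


# Stand-in for the local schema: it alone knows which implementations exist.
IMPLEMENTATIONS = {"dict": DictNames, "pairs": PairListNames}


def new_name_store(env: Optional[Mapping[str, str]] = None):
    """Create a store; the implementation is picked from context, not by the caller."""
    env = os.environ if env is None else env
    label = env.get("NAME_STORE_IMPL", "dict")  # hypothetical setting
    return IMPLEMENTATIONS[label]()


def application(store) -> str:
    """Application logic written only against set_name and get_name."""
    store.set_name(":sam", "Samuel")
    return store.get_name(":sam")
```

Here `application` never names an implementation class, so switching the setting changes the implementation without touching the application logic.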
To the extent that an application does choose the implementation and/or it makes decisions based on the implementation, the portability and reusability of the application are compromised.
Precise interpretation of the general principle of implementation hiding can be guided by corollaries such as the following:
OID formats are a matter of implementation, and not part of the language standard, except perhaps at a low level comparable to standardizing the exact bit pattern representation of character strings. Whether any useful information happens to be embedded in the OID rather than in a separate table should be left to the implementers.
[Acceptance of this corollary should be accompanied by a change to the definition of object identifier type in Subclause 4.9, removing any reference to type information being contained in an OID.]
The essential purpose of private attributes is to describe data which is internal to the implementation, and should not be exposed to users. In particular, applications should in no way be dependent on private attributes. By extension, an application should even be insensitive to changes in the configuration of such private data. In particular, an application should not initialize private data.
Typical examples of private data (in non-relational terms) include such things as:
Internal representations of data visible to applications should be initialized in terms of the visible data. Thus if age is visible while birthday is hidden, then the application should initialize age, not birthday. One might object that age cannot really be inverted to yield a birthday; the response is that this is simply not a realistic example.
Applications don't have any need to refer to the names of attributes. They only need to use the retrieval and update operations (observers and mutators). There is no logical difference between an attribute name and its observer operation. Thus it is sufficient to specify that a mutator updates an observer, such as
Set_Age UPDATES Get_Age
Move UPDATES Locate
without specifying an attribute name. Note that naming conventions do not require the attribute name to be part of the mutator or observer operation names.
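As a rough illustration of pairing a mutator with an observer without ever naming an attribute, the following Python sketch records the UPDATES relationship in a registry. The `updates` decorator, the `UPDATES` registry, and the `Person` class are hypothetical and are not proposed syntax.

```python
from typing import Callable, Dict

# Hypothetical registry: mutator name -> name of the observer it updates.
UPDATES: Dict[str, str] = {}


def updates(observer_name: str) -> Callable:
    """Declare that the decorated mutator updates the named observer."""

    def decorate(mutator: Callable) -> Callable:
        UPDATES[mutator.__name__] = observer_name
        return mutator

    return decorate


class Person:
    def __init__(self, age: int) -> None:
        # Internal state; its name is never visible to callers.
        self.__years = age

    def get_age(self) -> int:
        return self.__years

    @updates("get_age")
    def set_age(self, age: int) -> None:
        self.__years = age
```

The registry entry relates `set_age` to `get_age` directly; the internal field could be renamed or replaced without changing that declaration.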
Suppose an interface for circular objects is defined with members for Center, Radius, and Diameter. It should be possible for different implementations to store either the radius or the diameter, or both. Updates to either can be propagated into updates to whichever ones are stored. An application should be able to function with a single interface to any of these implementations, and even survive a change of implementations with at most a recompilation.
It would be appropriate for the interface to document relevant semantic constraints, such as the fact that the diameter is twice the radius. However, implementation characteristics, such as which attributes are stored and which are virtual, should not be part of the interface specification.
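The circle example might be sketched as below, under the assumption that an interface corresponds to an abstract class. The class names and the `double_size` routine are hypothetical, and Center is omitted for brevity.

```python
from abc import ABC, abstractmethod


class Circle(ABC):
    """Interface: observers and mutators only, with no statement about storage.

    Documented constraint: diameter == 2 * radius.
    """

    @abstractmethod
    def get_radius(self) -> float: ...

    @abstractmethod
    def set_radius(self, radius: float) -> None: ...

    @abstractmethod
    def get_diameter(self) -> float: ...

    @abstractmethod
    def set_diameter(self, diameter: float) -> None: ...


class RadiusCircle(Circle):
    """Stores the radius; the diameter is virtual."""

    def __init__(self, radius: float) -> None:
        self._radius = radius

    def get_radius(self) -> float:
        return self._radius

    def set_radius(self, radius: float) -> None:
        self._radius = radius

    def get_diameter(self) -> float:
        return 2 * self._radius

    def set_diameter(self, diameter: float) -> None:
        self._radius = diameter / 2


class DiameterCircle(Circle):
    """Stores the diameter; the radius is virtual."""

    def __init__(self, radius: float) -> None:
        self._diameter = 2 * radius

    def get_radius(self) -> float:
        return self._diameter / 2

    def set_radius(self, radius: float) -> None:
        self._diameter = 2 * radius

    def get_diameter(self) -> float:
        return self._diameter

    def set_diameter(self, diameter: float) -> None:
        self._diameter = diameter


def double_size(circle: Circle) -> None:
    """Application code: uses only the interface."""
    circle.set_diameter(2 * circle.get_diameter())
```

`double_size` behaves identically with either class, and an update through either mutator is propagated to whichever value happens to be stored.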
As another example (illustrated in section 1.6 of [3]), geometric points should be defined as abstract points which might be implemented by storing either polar or rectangular coordinates. The properties of a point include x and y coordinates and also a magnitude and angle. It should be possible to designate a point using either polar or rectangular coordinates (i.e., two designators), independently of how the point is stored in any particular implementation.
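A minimal sketch of the point example follows, assuming two hypothetical classes that expose the same four observers and the same two designators (`from_rect` and `from_polar`) regardless of which coordinates they store. Angles are assumed to be in radians.

```python
import math


class RectPoint:
    """Stores rectangular coordinates; magnitude and angle are derived."""

    def __init__(self, x: float, y: float) -> None:
        self._x, self._y = x, y

    @classmethod
    def from_rect(cls, x: float, y: float) -> "RectPoint":
        return cls(x, y)

    @classmethod
    def from_polar(cls, magnitude: float, angle: float) -> "RectPoint":
        return cls(magnitude * math.cos(angle), magnitude * math.sin(angle))

    def x(self) -> float:
        return self._x

    def y(self) -> float:
        return self._y

    def magnitude(self) -> float:
        return math.hypot(self._x, self._y)

    def angle(self) -> float:
        return math.atan2(self._y, self._x)


class PolarPoint:
    """Stores polar coordinates; x and y are derived."""

    def __init__(self, magnitude: float, angle: float) -> None:
        self._m, self._a = magnitude, angle

    @classmethod
    def from_rect(cls, x: float, y: float) -> "PolarPoint":
        return cls(math.hypot(x, y), math.atan2(y, x))

    @classmethod
    def from_polar(cls, magnitude: float, angle: float) -> "PolarPoint":
        return cls(magnitude, angle)

    def x(self) -> float:
        return self._m * math.cos(self._a)

    def y(self) -> float:
        return self._m * math.sin(self._a)

    def magnitude(self) -> float:
        return self._m

    def angle(self) -> float:
        return self._a
```

Either designator can be used with either class, so the way a point is designated stays independent of the way it is stored.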
There are many ways to configure the "real" tables which implement the semantics of sub- and supertables, as well as the extent tables for sub- and supertypes. It should be possible for an application to operate with any of these configurations, and to even allow the underlying configurations to be changed without impacting the logic of the application (recompilation might be required, but not reprogramming).
We can illustrate some possible configurations using two example types/tables: Person(Name,Birthplace) and Student(Major,Credits). Let :pam be a person who is not a student and :sam be a student, hence also a person.
PERSON
--------------------------------------------------------------
| oid  | Name   | Birthplace | Student | Major     | Credits |
|============================================================|
| :pam | Pamela | Pittsburgh | N       |           |         |
| :sam | Samuel | Sacramento | Y       | Sociology | 77      |
--------------------------------------------------------------
PERSON
------------------------------
| oid  | Name   | Birthplace |
|============================|
| :pam | Pamela | Pittsburgh |
| :sam | Samuel | Sacramento |
------------------------------

STUDENT
------------------------------
| oid  | Major     | Credits |
|============================|
| :sam | Sociology | 77      |
------------------------------
PERSON
------------------------------
| oid  | Name   | Birthplace |
|============================|
| :pam | Pamela | Pittsburgh |
------------------------------

STUDENT
----------------------------------------------------
| oid  | Name   | Birthplace | Major     | Credits |
|==================================================|
| :sam | Samuel | Sacramento | Sociology | 77      |
----------------------------------------------------
PERSON
------------------------------
| oid  | Name   | Birthplace |
|============================|
| :pam | Pamela | Pittsburgh |
------------------------------

STUDENT
----------------------------------------------------
| oid  | Name   | Birthplace | Major     | Credits |
|==================================================|
| :sam | Samuel | Sacramento | Sociology | 77      |
----------------------------------------------------

STUDENT_TEMP
----------------------------------------------------
| oid  | Name   | Birthplace | Major     | Credits |
|==================================================|
| :tom | Thomas | Toledo     | Tennis    | 22      |
----------------------------------------------------
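To suggest how an application could remain indifferent to such configurations, the Python sketch below models two of them with in-memory dictionaries in place of real tables: a single PERSON table with a Student flag, and disjoint PERSON and STUDENT tables in which STUDENT repeats the inherited columns. The class names, the `oids`, `name`, and `major` routines, and the `report` function are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SingleTableConfig:
    """One PERSON table with a Student flag and nullable student columns."""

    def __init__(self) -> None:
        self._person: Dict[str, dict] = {
            ":pam": {"Name": "Pamela", "Birthplace": "Pittsburgh",
                     "Student": "N", "Major": None, "Credits": None},
            ":sam": {"Name": "Samuel", "Birthplace": "Sacramento",
                     "Student": "Y", "Major": "Sociology", "Credits": 77},
        }

    def oids(self) -> List[str]:
        return sorted(self._person)

    def name(self, oid: str) -> str:
        return self._person[oid]["Name"]

    def major(self, oid: str) -> Optional[str]:
        return self._person[oid]["Major"]


class DisjointTablesConfig:
    """Non-students in PERSON; students, with inherited columns, in STUDENT."""

    def __init__(self) -> None:
        self._person: Dict[str, dict] = {
            ":pam": {"Name": "Pamela", "Birthplace": "Pittsburgh"},
        }
        self._student: Dict[str, dict] = {
            ":sam": {"Name": "Samuel", "Birthplace": "Sacramento",
                     "Major": "Sociology", "Credits": 77},
        }

    def oids(self) -> List[str]:
        return sorted({**self._person, **self._student})

    def name(self, oid: str) -> str:
        row = self._student.get(oid) or self._person[oid]
        return row["Name"]

    def major(self, oid: str) -> Optional[str]:
        row = self._student.get(oid)
        return row["Major"] if row else None


def report(db) -> List[Tuple[str, Optional[str]]]:
    """Application logic: sees persons and majors, never the table layout."""
    return [(db.name(oid), db.major(oid)) for oid in db.oids()]
```

`report` yields the same result for both configurations, since the layout is confined to the bodies of the observer routines.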
As an extension of the previous point, it should be optionally possible to restrict access to extent tables for types so that they can only be manipulated from within the bodies of constructor, destructor, observer, and mutator routines. It would thus be possible to configure the schema so that applications do not directly access such tables. Changes in implementation would be reflected in changes within these routines, without otherwise affecting applications exploiting this option.
An object may be mentioned in many places in many tables and/or variables, under the same or different types. It should not matter to an application which place, or how many places, actually store the data about the object. Such a determination should be part of the implementation specification, which may in the future even provide for replica management.