William Kent, "An Overview of the Versioning Problem", SIGMOD, Portland Oregon, May 1989 [panel intro]. Also HPL-SAL-88-7, Hewlett-Packard Laboratories, Oct. 21, 1988. [3 pp]
> 1 INTRODUCTION
> 2 GENERAL SEMANTICS
>> 2.1 What is Versioning?
>> 2.2 What Should Be Versioned?
>> 2.3 What Are Versions?
>> 2.4 How Are Versions Used?
>> 2.5 Update
>> 2.6 Versioning and Types
> 3 DEPENDENCE ON OBJECT MODEL
>> 3.1 The Concept of State
> 4 PHYSICAL MANAGEMENT
> 5 PRACTICE
We briefly survey the main issues in version management, particularly as it relates to object-oriented databases. This is neither a tutorial nor an in-depth analysis, but rather a framework in which those familiar with the subject might organize their approach to the problem. The material can serve as the basis for a workshop or panel, as it has done already (Oregon Database Forum: Workshop on CAE Data Management, Nov. 3-4, 1987; 1988 Workshop on Versioning in CAE Data Management, April 7-8, 1988), and perhaps as a basis for future papers.
The issues are grouped under general headings, though some of the classifications are rather arbitrary.
How is versioning distinguished from time-dependent or otherwise parameterized data? How is it related to temporal databases?
A database might record:
What's the difference?
Should query semantics be any different? Could either form respond to the query "How many adders were in the design of the arithmetic board as of May 15"?
Many things have properties that vary with time, or with respect to other parameters. Does it make sense to version such things as employees, departments, projects, schedules, inventories, documents, etc.? What are the criteria?
Are versions objects? First-class? If we have three versions of a chip, how many instances of the type Chip are we talking about? (Reasonable answers include one, three, and four, depending on whether the generic chip, its versions, or both are counted as instances.)
How are versions identified/labelled? System-generated or user-defined? Is a time-stamping mechanism required? What about multi-level structures? E.g., software products often have modification levels of releases of versions.
What do version histories/graphs signify? Do forks (parallel paths) in a graph require labels? How are path labels used? If a graph can have parallel paths, can it have multiple roots? How is version deletion managed? How do versions get integrated at joins in the graph?
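To make the graph questions concrete, the following minimal sketch represents a version history as a directed acyclic graph and locates its roots, forks, and joins. The class `VersionGraph`, its methods, and the version labels are illustrative assumptions, not part of any particular system. The sketch takes no position on path labels, deletion, or how integration is performed at a join.

```python
from dataclasses import dataclass, field


@dataclass
class VersionGraph:
    """Hypothetical version history: each label maps to its parent labels."""
    parents: dict = field(default_factory=dict)

    def add(self, label, *parent_labels):
        """Record a new version derived from zero or more existing versions."""
        if label in self.parents:
            raise ValueError(f"duplicate version label: {label}")
        for p in parent_labels:
            if p not in self.parents:
                raise KeyError(f"unknown parent version: {p}")
        self.parents[label] = list(parent_labels)

    def children(self, label):
        return [v for v, ps in self.parents.items() if label in ps]

    def roots(self):
        """Versions with no predecessor; more than one means multiple roots."""
        return [v for v, ps in self.parents.items() if not ps]

    def forks(self):
        """Versions with several successors (the start of parallel paths)."""
        return [v for v in self.parents if len(self.children(v)) > 1]

    def joins(self):
        """Versions with several predecessors (integration points)."""
        return [v for v, ps in self.parents.items() if len(ps) > 1]


g = VersionGraph()
g.add("v1")
g.add("v2a", "v1")
g.add("v2b", "v1")          # fork at v1
g.add("v3", "v2a", "v2b")   # join of the two parallel paths
g.add("w1")                 # a second, independent root

print(g.roots(), g.forks(), g.joins())
```

Because a version can only name parents that already exist, the graph stays acyclic by construction. Whether multiple roots should be permitted at all is left open here, as it is in the text.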
Version management includes a defaulting mechanism whereby reference to an object is automatically interpreted as a reference to a default version. What determines this default? Is it always the latest version? Along a preferred path in the version graph? Excluding versions currently in development? Can defaults be context-dependent, e.g., different defaults for different people, departments, projects, shifts, etc.?
How do we distinguish references to a default version, a specified version, or "any" version, both in queries and as properties of other objects? E.g., how do we differentiate between updating the price of a chip for all versions, a specified version, or the default version?
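One way to picture the defaulting and reference questions is a resolver that accepts three kinds of reference: all versions, a specified version, or the default. The sketch below is only one possible answer under stated assumptions. The default policy (a per-context override, otherwise the latest version not in development), the `ALL` sentinel, and every class, method, and label name are hypothetical.

```python
from dataclasses import dataclass, field

ALL = object()  # hypothetical sentinel meaning "every version"


@dataclass
class Version:
    label: str
    price: float
    in_development: bool = False


@dataclass
class VersionedChip:
    versions: list = field(default_factory=list)          # in creation order
    context_defaults: dict = field(default_factory=dict)  # context -> label

    def by_label(self, label):
        for v in self.versions:
            if v.label == label:
                return v
        raise KeyError(label)

    def default(self, context=None):
        """Assumed policy: context override, else latest released version."""
        if context in self.context_defaults:
            return self.by_label(self.context_defaults[context])
        for v in reversed(self.versions):
            if not v.in_development:
                return v
        raise LookupError("no released version")

    def targets(self, ref=None, context=None):
        """ref is ALL, a version label, or None (meaning the default)."""
        if ref is ALL:
            return list(self.versions)
        if ref is None:
            return [self.default(context)]
        return [self.by_label(ref)]

    def set_price(self, price, ref=None, context=None):
        for v in self.targets(ref, context):
            v.price = price


chip = VersionedChip([
    Version("v1", 5.0),
    Version("v2", 6.0),
    Version("v3", 7.0, in_development=True),
])
chip.context_defaults["manufacturing"] = "v1"

chip.set_price(9.0, ref=ALL)                  # all versions
chip.set_price(6.5)                           # the default version (v2)
chip.set_price(4.0, context="manufacturing")  # that context's default (v1)
chip.set_price(7.5, ref="v3")                 # a specified version
```

The same three-way distinction would have to be expressible when a version reference is stored as a property of another object, which this sketch does not attempt.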
How is update managed? Is checkout/checkin the only mechanism? Are new versions generated automatically on update of "sensitive" properties? Do all state changes yield new versions?
Are checked-out versions still "in" the database? Updated through normal database operations? Subject to normal constraints? Accessible for retrieval, querying?
How are multiple "in-progress" versions checked in for integration at a join in the version graph?
Are all versions updatable, or are some "frozen"? Can frozen versions be defaults?
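The update questions above can be illustrated with a small checkout/checkin sketch in which frozen versions reject in-place updates. It makes several assumptions that the text deliberately leaves open: a checked-out working copy lives outside the store, checkin always creates a new version, and version state is a flat dictionary of properties. `VersionStore`, `FrozenVersionError`, and the labels are hypothetical names.

```python
class FrozenVersionError(Exception):
    """Raised when an update targets a frozen version."""


class VersionStore:
    """Hypothetical store: label -> property dict, plus derivation links."""

    def __init__(self):
        self.states = {}
        self.parents = {}
        self.frozen = set()

    def create(self, label, state, parents=()):
        if label in self.states:
            raise ValueError(f"duplicate version label: {label}")
        self.states[label] = dict(state)  # shallow copy; assumes flat values
        self.parents[label] = list(parents)

    def freeze(self, label):
        self.frozen.add(label)

    def update_in_place(self, label, **changes):
        if label in self.frozen:
            raise FrozenVersionError(label)
        self.states[label].update(changes)

    def checkout(self, label):
        """Return a private working copy; the stored version is not modified."""
        return dict(self.states[label])

    def checkin(self, new_label, working_copy, *parents):
        """Store the working copy as a new version derived from `parents`."""
        self.create(new_label, working_copy, parents)


store = VersionStore()
store.create("v1", {"gates": 100})
store.freeze("v1")

work = store.checkout("v1")      # a frozen version can still be checked out
work["gates"] = 120
store.checkin("v2", work, "v1")

try:
    store.update_in_place("v1", gates=999)
    rejected = False
except FrozenVersionError:
    rejected = True
```

Passing several parent labels to `checkin` records a join in the version graph, but how the contents of multiple in-progress versions are actually integrated remains an open question.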
Is it necessary to pre-declare an object type versioned? Are all instances of a type versioned (can we have some chips versioned and others not)? Can we start to maintain versions of any arbitrary pre-existing object?
What special problems arise with "unconventional" data, e.g., graphics, sound tracks?
The behaviors considered so far apply largely to versioned objects as "blobs", without considerations of internal structure. Other questions arise which depend on the nature of the object model within which versioning is supported.
Versioning is generally associated with a change in the state of an object. But what constitutes the "state" of an object?
Relationships. Are relationships part of the "state" of some or all of the objects being related? Suppose that the relationship between chips and boards were maintained as a "where-used" property "in" the chips. Removing a chip from a board changes the where-used property in the chip, perhaps implying a new version of the chip. Does it imply a new version of the board?
Does the object model support complex-valued properties (n-ary relationships), such as chip designers by date; chip-gates by date and speed; compiler modules by operating system? How are such parameterized properties differentiated from versioning?
Multiple views. Does the object model support different "views" of the "same" object, such as circuit and layout representations of a board? Different users "see" different properties; are they looking at the same object? When a new version of a layout is created by moving a component to another position, does the user of the circuit representation see a new version? Do properties which refer to that version of the chip have to be updated?
Can versions of the circuit and layout representations be maintained independently, even if they are the "same object"? If so, how are their version graphs correlated and kept consistent? How are object identities maintained?
Sub-objects. How do we manage versioned objects having versioned sub-objects? How do their version graphs inter-relate? Is a new version of an object required when a new version of a sub-object is created? Any problems arising from shared sub-objects?
Multiple inheritance. If an object is an instance of several types, what defines the scope of its versioned state? What if there are different versioning disciplines defined for the different types?
Can different versions of an object be instances of different types? Can one version of a chip be an instance of Product, while another version is an instance of Experimental-Device?
Does versioning provide yet another inheritance path? E.g., all versions of a chip "inherit" its part number. How is this defined? Updated? This is related to the question of version-dependent properties (below).
Scopes of properties. Can/should the object model support different scopes of properties with respect to versioning?
Are the properties relevant to versioning only those that are defined to be "in" the object? (See prior discussion of relationships.)
Version-dependence: while some properties of an object are variable from version to version, can other properties be made uniform over all versions?
Version-sensitivity: does a change in properties always imply creation of a new version? Are there some properties which can be updated within a version without forcing a new version?
How are these scopes defined? Are they uniform over a type, or variable by instance?
As examples, consider the following properties of a chip: price, delivery date, part number, name, designer, where-used, operating temperature range, test and measurement results, ...
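As a sketch of how such scopes might be declared, the example below tags each property with one of three scopes and lets the scope decide what an update does. The three-way `Scope` classification, the choice to fix scopes per type rather than per instance, and the particular scope assigned to each property are all assumptions made for illustration.

```python
from enum import Enum


class Scope(Enum):
    UNIFORM = "uniform"          # one value shared by every version
    INSENSITIVE = "insensitive"  # per-version value, updated in place
    SENSITIVE = "sensitive"      # per-version value, update forces a new version


# Hypothetical assignment of scopes to some chip properties (uniform over the type).
CHIP_SCOPES = {
    "part_number": Scope.UNIFORM,
    "price": Scope.INSENSITIVE,
    "operating_temp_range": Scope.SENSITIVE,
}


class Chip:
    def __init__(self, **initial):
        self.shared = {}      # version-independent properties
        self.versions = [{}]  # one dict of version-dependent properties per version
        for name, value in initial.items():
            uniform = CHIP_SCOPES[name] is Scope.UNIFORM
            (self.shared if uniform else self.versions[0])[name] = value

    def get(self, name, version=-1):
        if CHIP_SCOPES[name] is Scope.UNIFORM:
            return self.shared[name]
        return self.versions[version][name]

    def set(self, name, value):
        scope = CHIP_SCOPES[name]
        if scope is Scope.UNIFORM:
            self.shared[name] = value
        elif scope is Scope.INSENSITIVE:
            self.versions[-1][name] = value
        else:
            # Version-sensitive: copy the latest version and change the copy.
            self.versions.append({**self.versions[-1], name: value})


chip = Chip(part_number="P-100", price=5.0, operating_temp_range=(0, 70))
chip.set("price", 4.5)                        # updated within the current version
chip.set("operating_temp_range", (-40, 85))   # creates a second version
```

Here a uniform property such as the part number is stored once and read through every version, which is one way to realize the "inheritance" of properties by versions mentioned earlier.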
Query semantics. How do query semantics relate to versioning? Can we ask "how many adders were in the design of the arithmetic board as of May 15"?
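If versions carry creation dates and form a single linear history, an "as of" query can be answered by selecting the latest version created on or before the given date. The sketch below assumes exactly that, which is itself one of the open questions (time-stamping may not be required, and histories may fork). The dates, the year, and the adder counts are invented for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history of the arithmetic board: (creation date, adder count),
# sorted by creation date.
board_versions = [
    (date(1988, 3, 1), 4),
    (date(1988, 5, 10), 6),
    (date(1988, 6, 2), 8),
]


def adders_as_of(versions, when):
    """Adder count of the latest version created on or before `when`."""
    dates = [created for created, _ in versions]
    i = bisect_right(dates, when)
    if i == 0:
        raise LookupError("no version existed on that date")
    return versions[i - 1][1]


print(adders_as_of(board_versions, date(1988, 5, 15)))  # prints 6
```

With forks in the history, "as of May 15" is ambiguous unless a path or a default is also specified.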
Some version management systems only maintain version-related information in the database (such as version graphs), with the actual versions being externally maintained, perhaps in a file system. Others integrate management of the versioned objects themselves. What are the implications, advantages, and disadvantages for various applications?
What are the merits of various techniques for optimizing physical storage and performance (e.g., deltas)?
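As a small illustration of the delta technique, the sketch below stores a full base state plus a chain of forward deltas and rebuilds any version by replaying them. It assumes version state is a flat dictionary of properties. The names `DeltaStore`, `diff`, and `apply_delta` are hypothetical, and real systems may use reverse deltas or periodic full copies instead.

```python
_DELETED = object()  # marks a property removed in the newer version


def diff(old, new):
    """Forward delta: properties that changed, were added, or were removed."""
    delta = {k: v for k, v in new.items() if k not in old or old[k] != v}
    delta.update({k: _DELETED for k in old if k not in new})
    return delta


def apply_delta(state, delta):
    result = dict(state)
    for key, value in delta.items():
        if value is _DELETED:
            result.pop(key, None)
        else:
            result[key] = value
    return result


class DeltaStore:
    """Version 0 is stored in full; version n is the base plus n deltas."""

    def __init__(self, base):
        self.base = dict(base)
        self.deltas = []

    def commit(self, new_state):
        latest = self.materialize(len(self.deltas))
        self.deltas.append(diff(latest, new_state))

    def materialize(self, n):
        state = dict(self.base)
        for delta in self.deltas[:n]:
            state = apply_delta(state, delta)
        return state


store = DeltaStore({"gates": 100, "pins": 16})
store.commit({"gates": 120, "pins": 16})   # delta records only the gate count
store.commit({"gates": 120})               # delta records removal of "pins"
```

The trade-off is visible even at this scale: each delta stores only what changed, but the cost of reconstructing a version grows with the length of the chain.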
What are the implications of versioning for locking and transaction protocols?
What are the application areas that require versioning support? Electronic design, mechanical design, software engineering, document management, and what else?
How similar/different are the requirements in these areas?
Of the various considerations outlined above, what practical subset can/should be implemented to support one of these areas? To provide generalized support across many/all areas?