William Kent, "The Semantics of Aggregate Objects", HPL-DTD-88-4, Hewlett-Packard Laboratories, Oct. 28, 1988. [14 pp]

The Semantics of Aggregate Objects

> 1 INTRODUCTION . . . 2
> 2 GENERAL DEFINITIONS . . . 3
>> 2.1 Membership and Cardinality . . . 3
>> 2.2 Bags and Sets . . . 3
>> 2.3 Lists . . . 4
> 3 ACCESSING AGGREGATES . . . 4
>> 3.1 Cursor Objects and Fetch Functions . . . 4
>> 3.2 Iterate, Sift, and Select . . . 4
> 4 CONSTRUCTING AGGREGATES . . . 4
>> 4.1 Constructs . . . 5
>> 4.2 Constructing Lists . . . 5
>> 4.3 Constructing Bags . . . 5
>> 4.4 Other Aggregate Generators . . . 5
> 5 IDENTITY: CONSTRUCTS AND COLLECTORS . . . 6
>> 5.1 Background and Motivation . . . 6
>>> 5.1.1 Set Theory . . . 6
>>> 5.1.2 Ambiguity of `Tuple' . . . 6
>>> 5.1.3 Long Literals . . . 6
>>> 5.1.4 Terminology . . . 7
>> 5.2 Aggregate Identity is Based on Form and Content . . . 7
>> 5.3 Collector Objects: Updatable Aggregates . . . 8
> 6 TYPES AND CONSTRAINTS . . . 9
>> 6.1 Subtypes of Aggregates . . . 9
>> 6.2 Type Checking . . . 10
>> 6.3 Constraint Expressions . . . 11
>> 6.4 Checking Function Constraints . . . 14
> 7 CONCLUSION . . . 14
> 8 APPENDIX 1: ALTERNATIVES FOR ACCESSING LISTS . . . 14
>> 8.1 Ordinal Access, List As Argument . . . 15
>> 8.2 Ordinal Access, List As Function . . . 15
>> 8.3 Using Accessors, With List As Function . . . 15
>> 8.4 Using Accessors, With List As Argument . . . 16
>> 8.5 Accessors as Variables . . . 16
>> 8.6 Examples . . . 17
>> 8.7 A Comparison . . . 17
> 9 APPENDIX 2: PRIMITIVE LIST AND BAG CONSTRUCTORS . . . 17
> 10 APPENDIX 3: ARRAYS . . . 18
> 11 REFERENCES . . . 19

1 INTRODUCTION

We develop a consistent semantics for aggregate objects, proposed for inclusion in the Iris function/object model of information processing. Specific proposals for syntax at the OM interface and OSQL, and for implementation, remain to be developed.

The classic concept of a set is extended to allow duplicates and ordering. The core concept of identity is also an extension of the set-theoretic view: the identity of an aggregate is determined by its content and its form. Thus aggregates having the same content and form are the same aggregate. In this sense aggregates behave much like literals, in that they are immutable; "update" is interpreted as the replacement of one object by another.

To accommodate the need for aggregate objects whose identity is preserved while their contents change, we introduce "collector" objects. These are really non-literal atomic objects whose "contents" are aggregates. A collector can be updated by replacing its contents with a different aggregate. Operationally, the inconvenience of the distinction is minimized by allowing most functions that apply to aggregates to also apply to the collectors that contain aggregates. The main distinction arises with regard to equality, types, and assignment.

We regard tuples as being the same things as lists. The differences between them arise more as constraints on functions then as a classification of objects.

A type graph incorporating aggregate objects is shown in Figure 1. In this paper I use "aggregate" for the general class, and "bag" and "list" respectively for the unordered and ordered subclasses. Alternative terminology is possible, as indicated in brackets: "bag" for the general concept, "list" or "ordered bag" for one subclass, and "unordered bag" for the other.

2 GENERAL DEFINITIONS

2.1 Membership and Cardinality

The defining characteristic of an aggregate is that it has members, possibly in multiple occurrences. These members may be arbitrary objects, including other aggregates. The infix Membership function x IN g returns the number of occurrences of x in the aggregate g, possibly infinity .

The Card (cardinality) function Card(g) returns the number of elements in g, possibly infinity . We have

Card(g) = SUM[x]x IN g.

An empty aggregate g has Card(g)=0. The null object can occur a definite number of times in an aggregate, being zero times in an empty aggregate.

Object | --------------------------------------- | | Surrogate (Non-Literal) Construct | | ----------------- ------------ | | | | | | | Collector Literal Aggregate[Bag] | | | | | --------------- ------- | | | | BagCollector ListCollector [Unordered]Bag List[Ordered Bag] Figure 1. The type graph.

2.2 Bags and Sets

A bag is an aggregate with no ordering.

A set is a bag b subject to the constraint

x IN b < 2 for all x.

It's not clear that we need to treat sets as being a distinct type.

2.3 Lists

A list is an aggregate ordered in one dimension. It can be accessed by an expression of the form g[i], with i a positive integer, whose value is the object which is the i-th element of the list. g[i] is the null object when i>Card(g); it may be null for other values of i.

The syntactic form g[i] denotes an accessor function, which can be explained in several ways (see Appendix 1 [Section 8]). We trust that this usage of square brackets will not conflict with the bag constructor to be introduced shortly.

3 ACCESSING AGGREGATES

In addition to the list accessor just mentioned, we have some access functions applicable to all aggregates.

3.1 Cursor Objects and Fetch Functions

[If we agree on the behaviors, then a cursor might be a scan, NewCursor might be OpenScan, and Fetch might be Next. Details of this section are negotiable.]

Cursors and fetch functions iterate through the members of an aggregate. We describe their semantic behavior, with no assumptions about implementation.

Cursor is an object type. A cursor object has an associated list and non-negative integer "position". The NewCursor function takes an aggregate g as argument, and returns a cursor whose position value is 0. The list associated with the new cursor has the same cardinality and membership as g. The ordering of the list depends on the nature of g. If g is a bag, the list has an arbitrary system-determined ordering, not necessarily repeatable. If g is a list, its order is preserved.

The Fetch1 function takes a cursor as argument, returns the next element of the list relative to the current position, appropriately adjusting the current position. [Details are negotiable, including what to do at the end of the list.]

The FetchN function takes a cursor and an integer n as arguments, and returns a list containing the next n elements, again making appropriate adjustments to the current position. [Details again negotiable.] Note that FetchN(c,1) differs from Fetch1(c) in that the former returns a list with one element, while the latter returns the element itself.

3.2 Iterate, Sift, and Select

Iterate, Sift, and Select functions are described in a separate paper [K1].

4 CONSTRUCTING AGGREGATES

Aggregates are "created" by constructor functions. They are also returned by other functions.

4.1 Constructs

Strictly speaking, constructors don't create new objects, in the same sense that addition doesn't create new numbers and concatenation doesn't create new character strings. Constructors can yield constructs that were previously constructed, without having created a new object.

In contrast, each act of creating a non-literal creates a new object.

Aggregates, like literals, have an eternal existence. All the numbers exist, all the character strings exist. We only construct references to them. In effect, all aggregates of existing objects always exist; they cease to exist when any of their elements vanish.

In recognition of this common behavior, we introduce Construct as a type, of which Literal and Aggregate are subtypes.

4.2 Constructing Lists

The MakeList function is rarely mentioned by name. The notation <a,b,c> signifies MakeList(a,b,c).

Assembling the list of arguments to MakeList appears to involve infinitely recursive invocation of MakeList. A solution to that problem is described in Appendix 2 [Section 9].

<> denotes the empty list.

4.3 Constructing Bags

In much the same way, we use the notation [a,b,c] to signify MakeBag(a,b,c).

[ ] denotes the empty bag.

4.4 Other Aggregate Generators

Many other functions also return aggregates.

We can define concatenation functions for combining two bags or two lists in an obvious way.

We can define "replacement" functions which return the aggregate obtained by replacing a specified element with a specified object. For example, if l1 is a list, then

Replace(l1[3], c),

which might inaccurately be portrayed as an "assignment"

l1[3] <- c,

returns a list l2 whose elements match those in l1 except possibly for the third element. If l1[3] was already c, then the result equals l1, otherwise it does not.

And so on. Many of these can be thought of as generalizing analogous operations on strings.

An important class of aggregate generators are the aggregate queries, described in [K1]. These can serve to specify the content of complex objects.

5 IDENTITY: CONSTRUCTS AND COLLECTORS

We differentiate between aggregates, which behave much like literals, and "collectors", which are non-literal objects having aggregates as their values. While they share many behaviors, there are important distinctions with regard to identity and update.

5.1 Background and Motivation

5.1.1 Set Theory

With regard to identity and existence, our approach follows set-theoretic principles.

The identity of a set is determined by its contents. Sets are not distinguished by any identity other than their content. It is meaningless to speak of two sets having the same elements; they are one and the same set.

Similarly, all sets of existing things exist. Sets are not created, nor are their contents "updated". One "constructs" sets by operations such as union and intersection, which map existing sets into existing sets.

In some sense, a collector corresponds to an intensional set. It behaves as though there were some unstated reason or criterion for collecting things, where the set of things which satisfy the criterion might vary with time.

5.1.2 Ambiguity of `Tuple'

We sometimes say that a tuple occurs several times in a relation, or that the same tuple occurs in several relations. We also say that a tuple is identified by a "tuple identifier" ("tid"), or that we can update the contents of a tuple.

Let's paraphrase the first notion: "several tuples in a relation might contain the same tuple". It's clear that these usages of "tuple" can't have the same meaning.

We have a "tuple1" that is a given sequence of elements, and which might occur several times in a given relation, or in different relations. It is an aggregate in our sense, behaving much like a literal.

Then we have a "tuple2" that is a row in a relation, identified by a tid, and which might contain different tuple1's at different times. This is a collector object.

A tid identifies a tuple2, which can be updated by replacing its current tuple1 value with a different tuple1 value.

5.1.3 Long Literals

When literals are reasonably short, we tend to assume that they have "copy semantics" rather than "reference semantics", corresponding to "pass by value" vs. "pass by name". If we say that two people are named "John", we don't assume pointers to the same copy of the string "John". Altering the name of one person has no effect on the name of the other. Assigning "John" as someone's name effectively creates a new copy of the string "John".

When literals get longer, we become storage conscious. If two contracts have the same text, we hate to store duplicate copies of that text. We are inclined to optimize by having equal values point, or refer, to a shared copy. But, as we know, updates will now behave differently if we're not careful; altering the text of one contract might unintentionally alter the text of the other.

Aggregates, as we've defined them, should behave like literals, but they tend to be large. The implementation of aggregates has much the same problems as the implementation of long literals, e.g.,

Deciding whether and how to let the user choose between copy semantics and reference semantics.
Deciding when to make separate copies and when to point to shared copies.
If copies are being shared, then introduction of a value involves looking to see if a sharable copy exists and, if not, allocating space for a new one.
If redundant copies are not always avoided, then it becomes much more expensive to determine equality. References to two different locations might refer to the same value.
Garbage collection when no one is using a shared copy.
Deciding whether to optimize by using substrings of existing strings instead of making whole new copies, i.e., taking advantage of the fact that a 400k string is the same as the middle of a 500k string.

Solutions to these problems for long literals and for aggregates should be consistent.

5.1.4 Terminology

"Collector" may not be the most satisfying term for the concept we will define below. We have considered others.

"Container" might be appropriate, in the sense that it contains an aggregate, but we will probably use that term in a slightly different way for containment semantics of complex objects.

A collector is in some sense a "handle" for an aggregate, but that term has another meaning for us, being the internal representative for any object whatsoever.

Collectors also behave much like variables, having values that change with time. One important distinction is that variables can be unbound, reverting to prior values, requiring some memory of prior values.

We can also think of a collector as a "label" for an aggregate, but we probably want to reserve that term for procedures.

"Frame" might also be a useful term, but it has too many other connotations.

"Holder" and "collector" seem to be neutral terms that we could appropriate for this purpose, and for now we will go with the latter.

Any other suggestions?

5.2 Aggregate Identity is Based on Form and Content

Two aggregates are equal (the same object) if and only if they have the same content and form. They must contain the same number of occurrences of the same elements. If one is a list then the other must also be, with the same elements in the same places. I.e., g1=g2 if and only if

x IN g1 = x IN g2 for all x,

and, if g1 is a list, then

g2 is a list,

Card(g1)=Card(g2),

g1[i] = g2[i] for all i.

Note that two lists might have the same elements in the same places and yet have different cardinalities, when some of the elements are null.

The empty bag is distinct from the empty list. Furthermore, these are both distinct from the null object, and they do not contain the null object.

5.3 Collector Objects: Updatable Aggregates

We introduce Collector as a kind of persistent, updatable aggregate. It is not really an aggregate in itself, but its "content" is an aggregate:

Contents: Collector-> Aggregate.

The contents of collectors can be updated:

Contents(c) <- aggregate-valued-expression.

More precisely, collectors are partitioned into subtypes according to the kind of aggregates they may contain. Thus

Contents: BagCollector-> Bag

Contents: ListCollector-> List

It might be convenient to allow Contents(g)=g for aggregates.

Collectors are explicitly created as non-literal (surrogate) objects. Equality of collectors is based on their object identity, rather than on their content. Distinct collectors may have the same content. The expression to test equality based on content is

Contents(c1) = Contents(c2).

We minimize the inconvenience of the distinction by saying that many functions applicable to aggregates are applicable to collectors that contain them. Thus, for such functions, we will say that they can be applied to a collector c under the interpretation

f(c) ::= f(Contents(c)).

Thus, for example,

Card(c) ::= Card(Contents(c)).

If a collector has no contents, it should behave as though it contained an empty bag or list, according to its type.

We will have to identify the functions having this behavior. In general, accessor and size-related functions will behave so.

The most important exception is equality. This predicate does not cascade from collectors to their contents. For collectors c1 and c2

c1=c2 => Contents(c1)=Contents(c2),

but not conversely. Thus their contents may be equal without the collectors being equal.

An equally important exception is the Type function. Type(c) is Collector, while Type(Contents(c)) is Aggregate.

Constructors are another important exception. <c> is not the same as <Contents(c)>.

Note that this is a one-way "inheritance". We didn't say that functions defined for a collector are applicable to their contents. In particular, this excludes assignments, i.e., updates. The following are very different for a collector c:

f(x) <- c,

f(x) <- Contents(c).

We could elevate Collector to be on a parallel with Construct, allowing collectors to have literals as values. That might be especially useful with long literals.

Collectors might also be instances of user types. E.g., a book or a folder or a file could be a collector, with its contents being a bag or list.

Note that such programming language constructs as arrays and stacks correspond to collectors, since their contents get updated without changing identity. Or could we describe their update as replacement? Is the question of identity really relevant?

6 TYPES AND CONSTRAINTS

6.1 Subtypes of Aggregates

There are problems associated with further refinements of the aggregate type structure.

If an atomic object c has multiple types, then [c] has multiple types. For example, [Sam] might be a "Bag of Employee" and "Bag of Person" and "Bag of Parent" and ... Furthermore, the type of [c] changes as the types of c change.

And we have more complexity when the aggregate has multiple elements. A bag is not a "Bag of T" if some members are not T's, but it becomes one when the last holdout becomes a T.

When we have aggregates of aggregates, those problems get messier.

Also, a list could be described as "List of Integer, Person, Integer". With such a convention, can we differentiate between the types "List of Integer, Integer, Integer" and "List of Integer"?

We could incorporate types into the tuple identity concept. Then <person:Sam>, <employee:Sam>, and <Sam> would be distinct objects. We could go a step further and incorporate functional relationships into identity, making the following distinct:

<employee:Sam, dept:Toys | employee=mgr(dept)>

<employee:Sam, dept:Toys | employee=auditor(dept)>.

Even these would be distinct:

<employee:Sam | employee=mgr(Toys)>

<employee:Sam | employee=auditor(Toys)>.

I don't know how far we might want to go with such complexity, or why. Under what conditions are we willing to say that f(x)=g(x), if both yield an ordered pair containing Sam and Toys?

Regarding implementation... We currently maintain the types of literals in their handles, since each has one obvious "primary" type, i.e., its single immediate type. The types of non-literals are maintained in tables, as oid's in the key columns of typing tables and as oid/type pairs in the object directory (will we keep that?). I don't see how to implement aggregates as literals that might have multiple types. What we could do is treat them similarly to literals, being simply of type Bag or List in their handles, and not try to subtype them further.

For now, I won't introduce such subtyping of objects, in the sense of defining "Bag of Integer" as a distinct type object. I don't even know if we want to support it optionally, given the problems outlined above.

6.2 Type Checking

Why do we do type checking? In order to determine, prior to executing (or updating) a function, whether the arguments (and results) satisfy constraints on the function. We'll discuss this in terms of argument checking for a query, but it all applies equally to result constraints for updates.

What is this checking based on? Very often the only knowledge we have about the argument is that it will be the binding of a specified variable, or the result of a nested function. That is, function invocations come in the three forms

f(c)

f(x)

f(g(..))

We speculate that constants, when they do occur, will rarely be aggregates. Most of the time we are trying to determine whether the declared type of a variable, or the result type of a nested function, satisfies a given argument constraint. The specifications of such variables and function results tend to be simpler than the multiple types of actual objects.

Therefore we will explore an approach to type checking that emphasizes the role of variable declarations and function result specifications. Literal arguments will be accommodated, but perhaps not optimally.

Furthermore, our current approach to type checking is simplistic, based essentially on table lookup of type assertions. Typically, we achieve type checking by binding the terms of a function to columns of a table, thereby only admitting the things which occur in those columns.

This will need to be extended to allow more procedural type checking, to accommodate intensional (derived) types as well as more complex constraint expressions on functions. Once we allow a type to be derived, e.g., as

T1(x) ::= predicate-expr(x)

then we might as well allow predicate expressions directly as function constraints. I.e., the function definition

f: T1 -> T2

is the same as

f: predicate-expr(x) -> T2.

The implementation and execution are much the same for the two cases. The difference between specifying "T1" and specifying "predicate-expr" is analogous to the difference between a derived function and a nested query.

6.3 Constraint Expressions

Where does that lead us? Consider the expression

Bag(x) AND y IN x => Integer(y).

This expression is true if x is a bag and every element of x is an integer, i.e., if x is a bag of integers. We can do several things with such an expression.

We can allow it to occur as a function constraint:

f: Bag(x) AND y IN x => Integer(y) -> T2

(perhaps with additional syntax to set off the constraint expression clearly).

We can use the expression to define a derived type:

Create Type Bag_Of_Integer(x) as Bag(x) AND y IN x => Integer(y).

Then the function f could be defined as

f: Bag_Of_Integer -> T2

Or we can introduce special syntax defined to have this meaning. Thus the syntax

f: Bag Of Integer -> T2

would be defined to mean

f: Bag(x) AND y IN x => Integer(y) -> T2

Observe that the argument object doesn't have to have "Bag of Integer" as an explicit type; it only has to satisfy the constraint expression. To put it another way: while an object c might satisfy the constraint "Bag of Integer", that constraint does not exist as a type object, hence is not a value of Type(c).

The term "tuple" can be relevant here to express a particular form of constraint expression, namely a list having a specified length whose elements satisfy a specified sequence of constraints. Thus "Tuple of Integer, Integer" requires a list of two elements, both of which are integers. In contrast, the constraint "List of Integer" will accept a list of any length, so long as each element is an integer.

The construct <2,3> illustrates the distinction between treating "tuple" as a type and as a constraint. This construct satisfies both of the constraints just mentioned. There is no point in forcing a distinction to be made as to whether this construct is a tuple or a list.

To reiterate: we take the tuple concept as being relevant to function constraints, but not to object typing. Anything we might consider to be a tuple is also a list. The notion of being a tuple seems intrinsically associated with the satisfaction of a constraint, rather then being inherent in the nature of an object itself.

We can introduce a system of function constraint specifications in which a constraint C is recursively allowed to be one of the following:

A type T. Note that Bag and List can occur without further constraints. The expression implied by this constraint is

T(x).
"List of C", constraining an object to be a list of any length, each of whose elements satisfies constraint C. This could be denoted by <C..>. (A typographical dilemma: the ".." is not ellipsis here. I mean that a list of integers could be literally specified as "<Integer..>".) The expression implied by this notation is

List(x) AND 0<i=<Card(x)=>C(x[i]).
"Tuple of C1,..,Cn", constraining an object to be a list of cardinality n, whose i-th element satisfies constraint Ci. This could be denoted by <C1,..,Cn>. (Here the ".." is ellipsis. It is to be replaced by a sequence of constraints in a real specification.) The expression implied by this notation is

List(x) AND Card(x)=n AND C1(x[1]) AND .. AND Cn(x[n]).
"Bag of C", constraining an object to be a bag, each of whose members satisfies constraint C. This could be denoted by [C]. The expression implied by this notation is

Bag(x) AND y IN x => C(y).
"Set of C", constraining an object to be a set (bag with no duplicates), each of whose members satisfies constraint C. This could be denoted by {C}. The expression implied by this notation is

Bag(x) AND y IN x<2 AND y IN x => C(y).
An arbitrary logical expression.

The arbitrary logical expression could be used to admit null values, e.g.,

f: T1 -> <T2, T3|Null>,

indicating that f might return a list having null in the second element. The same form could also be used for unions of types. An expression could also, for example, constrain a function to be applicable to people over 21:

f: Person(x) AND Age(x)>21 -> T.

External syntaxes could also support such conventions as:

"T1,..,Tn" to mean "Tuple of T1,..,Tn", i.e., "<T1,..,Tn>". This risks an ambiguity: should a single type T be interpreted as <T>?
For result specifications, "C many" could denote "Bag of C".

Note that we are assuming one argument constraint and one result constraint per function, i.e., arguments and results are single objects. The "x" in the examples above refers to a single argument or result object. We will have other conventions that allow variables to be bound to elements of a list argument or result.

6.4 Checking Function Constraints

The general problem: based on what we know about the argument (or result), can we conclude that the function constraint is satisfied? Thus, in its general form, it is an inferencing problem. Type checking is a simple special case of such constraint checking, since subtyping is connected with a membership implication.

What we know about the argument is one of the following:

Declaration of a variable.
Result constraint of a nested function.
Types (and other properties) of an actual parameter.

Declarations of variables could have the same form as function constraints, denoting constraints on the kinds of objects to which they may be bound.

Constraint checking is more difficult with function overloading. If a nested function is overloaded, its result constraint may in turn depend on what is known about its arguments. We have alternative chains of inference, possibly unresolvable under early binding. That problem can be avoided under a restricted form of overloading, requiring all overloads of the same function to have the same result constraints.

7 CONCLUSION

We have described a treatment of lists and bags suitable for a functional approach to object management.

8 APPENDIX 1: ALTERNATIVES FOR ACCESSING LISTS

The various usages of lists as arrays and as tuples suggest several possible accessing mechanisms, i.e., several possible explanations of the notation t[i].

We explore five paradigms for accessing the elements of ordered lists in a functional language. Four of the approaches fall into a two-by-two matrix. Elements are accessed either by ordinal position or by "accessors"; the list itself can serve either as a function or as an argument in the accessing technique.

In the fifth paradigm, accessors are variables.

Let t be the list

<Sam, Sales, 33>

In the following examples, t might be a list-valued variable or a list-valued function call, including the list constructor function.

8.1 Ordinal Access, List As Argument

We could introduce an Nth function such that Nth(t,i) returns the i-th element of list t, i.e.,

Nth(t,2) = Sales.

Here,

t[i] ::= Nth(t,i).

8.2 Ordinal Access, List As Function

We could say that the list itself is a function, taking a positive integer as argument and returning the corresponding element, so that t(i) returns the i-th element of list t, i.e.,

t(2) = Sales.

Here,

t[i] ::= t(i).

8.3 Using Accessors, With List As Function

Assume a context in which the list t is described by

<Person a, Dept b, Age c>,

where a, b, and c are "accessors".

We could interpret the list t as a function, taking accessors as arguments, such that t(b) returns the second element of t, i.e.,

t(b) = Sales.

Here,

t[b] ::= t(b).

If the accessors a, b, and c could be "defined" as constants (or variables) with integer values, then we could merge two approaches, both using the list as a function:

t(b) = t(2) = Sales.

Otherwise it is not clear what sort of objects the accessors would be under this approach.

8.4 Using Accessors, With List As Argument

Assuming the same context, the accessors could be interpreted as functions, such that b(t) returns the second element of t, i.e.,

b(t) = Sales.

Here,

t[b] ::= b(t).

8.5 Accessors as Variables

Still assuming the same context, we could say that b itself has the value Sam:

b = Sam.

Here the accessor b is behaving like a variable bound to the second element of the current list. Thus

t[b] ::= b.

The same syntax could also support an earlier paradigm, treating the accessor b as a function, under the convention that b itself is an abbreviation for the function call b(t).

8.6 Examples

Create function PairSum: <Real a, Real b> t -> Real...

...as Nth(t,1)+Nth(t,2);

...as t(1)+t(2);

...as t(a)+t(b);

...as a(t)+b(t);

...as a+b;

8.7 A Comparison

The form b(t) has the advantage of looking like a function applied to a list to extract an element, but that may also be a disadvantage since we then have to treat accessors as functions, which can get complicated. We have to be clear about their argument types, the scope of their definitions (overloading?), and their lifetimes.

The form t(b) is simpler in the sense that b only needs to be treated as an integer-valued constant or variable, and this approach can be unified with the familiar array-style reference t(i).

The form using b itself as a variable is also attractive in its simplicity.

It is not yet obvious whether any of these is clearly preferable. We might speculate that

t[i] ::= Nth(t,i)

constitutes the simplest explanation.

9 APPENDIX 2: PRIMITIVE LIST AND BAG CONSTRUCTORS

We'd like to model the MakeList function in a way that doesn't recursively require a list constructor to assemble its arguments.

There's probably no harm in specifying one primitive function that takes multiple arguments, like Cons in Lisp. But there is an alternative: self-extending chains. This is just a formal nicety; this section can be skipped if you are happy to accept <a,b,c> as a list with no further explanation.

A chain is a function which, applied to any argument, yields another chain with that argument appended. There is an "empty chain" object which we can call NewChain.

Tentatively, we say that the constructor <c> is an abbreviation for NewChain(c). The constructor <c,d> is an abbreviation for NewChain(c)(d), i.e., the chain we got by applying NewChain to c is applied to d to get another chain.

A small ambiguity would arise if we choose to treat a list t as a function such that t(i) returns the i-th element of the list. We have to explicitly convert chains to lists, such that <c,d> means MakeList(NewChain(c)(d)). The reason is that if c is a chain, then c(i) appends i to the end of the chain rather than returning the i-th element. Again, this point is relatively obscure and not terribly important.

Cardinalities are defined by

Card(NewChain) = 0,

Card(c(d)) = Card(c)+1 for any chain c.

Also, the i-th element of c(d) is d for i=Card(c)+1.

Bag construction is straightforward. We can say that MakeBag really takes a list as argument, so that

[a,b,c] ::= MakeBag(a,b,c) ::= MakeBag(<a,b,c>).

The net effect is to remove the ordering from <a,b,c>.

It's useful to let list construction be the primitive on which bag construction depends, rather than vice versa, since it would otherwise be difficult to re-introduce the ordering naturally introduced by constructing lists element-by-element.

10 APPENDIX 3: ARRAYS

An array is an aggregate ordered in one or more dimensions.

We could choose to model multi-dimensional arrays as lists of lists. Thus, for example, a two-dimensional array could be modelled as a list of lists, and reference to the [i,j]-th member of g could be modelled as g[i][j]. Some consequences of this approach:

There is an asymmetry to the dimensions, with a clear sense of "outer" and "inner".
Cardinality and membership concepts do not apply to members in all dimensions of the array.

The whole notion of multi-dimensional arrays could be deferred. We could initially only be concerned with one-dimensional arrays, i.e., lists. However, for the sake of completeness, we include a description of multi-dimensional arrays, in a way that allows lists to be considered as one-dimensional arrays.

Object | --------------------------------- | | Surrogate (Non-Literal) Construct | | ----------------- ------------ | | | | | | | Collector Literal Aggregate[Bag] : | | | | : --------- (similar subclasses) | | [Unordered]Bag Array | ---------- | | [Ordered Bag]List Multi-Dim Array

The Dimension function maps an aggregate into a non-negative integer. Its value is n>0 for an n-dimensional array, 1 for a list, and 0 for a bag. We could also allow it to be 0 for a non-aggregate object.

The Size function applied to an n-dimensional aggregate g returns a list of n integers. If we want infinite aggregates, then we can allow \(if, the infinity object, to occur in a size. The size is established when the aggregate is constructed.

If Size(g)=<i1,..,in>, n>0, then Card(g)=i1+...+in.

Members of an n-dimensional array are accessed by an expression of the form g[i1,..,in], ij>0. If any ij in i1,..,in is larger than the j-th element in Size(g), then g[i1,..,in] is the null object. Such subscripts refer "outside" g. The null object may also occur "inside" g.

Cursor and fetch operations are applicable to arrays. NewCursor generates a list of the elements of the array, in a manner that would have to be defined.

Equality of arrays requires them to have the same dimension and size, as well as having the same elements in the same places.

11 REFERENCES

[K1] Kent, W., "A Generalized Select Function", HPL-DTD-88-5, October 28, 1988.