Fundamental Concepts

William Kent
Database Technology Department
Hewlett-Packard Laboratories
Palo Alto, California

April 1989

> 1 INTRODUCTION
>> 1.1 Context
>> 1.2 Abstraction
>> 1.3 Frames of Reference
>> 1.4 How It Looks
>> 1.5 Stimuli: Functions or Messages
>> 1.6 The Information Processing System
>>> 1.6.1 Functions in a Graph
>>> 1.6.2 Linear Symbol Processing Systems
>> 1.7 Representing Objects
> 2 KEYNOTES
> 3 NOTES
>> 3.1 Reference
>> 3.2 Singularity
>> 3.3 Crossing Levels
>> 3.4 Tokens
>> 3.5 Function Invocations
>> 3.6 Types
>> 3.7 Rock Bottom
>> 3.8 Rules
>> 3.9 Symbol:Token = Set:Bag
>> 3.10 Miscellany
>> 3.11 Frames of Reference
>> 3.12 Function Bodies
>> 3.13 Schemas

1 INTRODUCTION

What is an object?

We won't answer that right away, but talk a bit about how we will answer.

1.1 Context

To begin with, our context is limited. Otherwise, the problem might simply boil down to choosing among the definitions provided in any good dictionary. What a field day we could have with the recently published second edition of the Oxford English Dictionary! We could argue endlessly about the relative merits of the definitions, and about how to interpret them: is the sky an object? are clouds? is green? is two? is the Hewlett-Packard Corporation? or the United States of America? am I as an employee and I as a parent the same object? am I the man I used to be? how many objects will fit on the head of a pin?

Establishing a goal puts some bounds on this sort of philosophical wrangling. Our question arises in the limited context of information processing systems. We seek a useful metaphor for the content and behavior of such systems, at some level of abstraction which lets us use these systems intelligently without bogging down in their internal details. Whatever we say about objects is necessarily conditioned by what we assume about the information system, including the level of abstraction we choose to take.

Information systems contain information about things. The subjects of such information might be outside the system (employees, departments, projects, bridges, engines, circuit boards), or they might be inside (files, directories, programs, sessions, transactions, queues, arrays, records). (Sometimes even this distinction is hard to make. Is the workstation in the information system, or vice versa? How about the network? Is a contract, or memo, or book in the system? It may turn out that this distinction doesn't matter either.)

Thus it seems that what we have in an information system is sometimes the object itself, sometimes just a representative (surrogate) for the object, and sometimes both. If we choose to think of the surrogates themselves as objects (and why not?), then we can be uniformly talking about objects all the time; it is the role of some objects to serve as surrogates for others.

Thus we have two general sources for semantic concepts. Since whatever we do ultimately has to be done with objects in the system, we are somewhat constrained by the properties of such objects. But we also need to be influenced by the behaviors of external objects that we wish to model. This duality is reflected sometimes in where we draw the line between abstract semantics and implementation. If a behavior of an external object does not correspond to the behavior of computer constructs, then we say the external behavior is abstract semantics which can be realized by a variety of computer constructs; choosing among them is implementation. The interesting question is where we draw the line.

We don't really know what's going on inside such systems, but we adopt metaphors as a basis for explanation. Such metaphors also abstract away differences between the details of the systems. Our metaphors will be largely behavioral; we don't know what's inside, but only what comes out in response to stimuli we put in. Thus we will not speak so much of the structural form of objects as what responses we get to messages (functions). (We will have a structural metaphor for the information content of the system, but not for the structure of an object itself.)

1.2 Abstraction

Our level of abstraction will vary, according to our needs of the moment. Sometimes distinctions are important, sometimes we need to avoid getting bogged down in subtleties. Human beings adjust their needs all the time as they communicate.

For instance, we tend to treat an information system as though it provided a faithful replica of some portion of the real world. Once in a while we need to realize that maintaining information is a slightly different matter. It's a bit closer to what goes on in your mind than to the state of reality outside. Trivially, the information might be incorrect, such as having the wrong birthdate for someone. Often, the information might be about something that doesn't "exist", because it is in the past, or fictional, or speculative. Shakespeare, Hamlet, and the next winner of the Pulitzer Prize can all exist comfortably in the information system. And we tolerate certain omissions or other discrepancies between "reality" and the information. When talking about the real world, we believe that all real people have birthdays; a rule for an information system might forbid anyone to have more than one birthday, but allow a real person to have no (known) birthday.

How we talk about objects also gets adjusted to different levels of abstraction in different situations.

If you ask a librarian for "Gone With The Wind", you don't expect the librarian to ask which copy you want. But if you ask how much the library paid for "Gone With The Wind", the librarian may answer in terms of different prices paid at different times for different copies, perhaps for different editions and forms. Now go ask MGM (or whoever did it) how much they paid for "Gone With The Wind", and you will get a figure that nobody in his right mind would pay for a copy of a book in a bookstore. If you ask someone else whether they've seen "Gone With The Wind", the context may or may not establish that you are looking for a missing copy of the book. Are we talking about the same thing when I ask who wrote "Gone With The Wind" and how much the book weighs?

These conversations involve the book "Gone With The Wind", yet in many of these contexts we are talking about different things. Our minds adjust automatically, and it is a source of frustration (or comedy) when someone perversely corrupts the adjustment. How would you react if the librarian did ask you which copy you wanted? And would you tell the MGM executive they were crazy to spend that kind of money when you could buy "Gone With The Wind" at any bookstore for under twenty dollars?

Sometimes we cross levels of abstraction intentionally, though perhaps unconsciously. We speak of "signing a contract", when in fact it is a copy of the contract which gets signed. When someone asks "is this your signature", it might be appropriate or annoying to reply "no, it's a copy of my signature".

Similar complexities arise in other aspects. We sometimes want to speak of an operation in a gross sense, such as getting a book. Sometimes we care about more detail, such as purchasing or borrowing, or what to do if the book is already loaned out, or what to do if two people physically grab the last copy on the shelf. Sometimes it matters, sometimes it doesn't.

Or, it may have been presumptuous of me to assume that the librarian had to choose from a set of copies on the shelf; perhaps the request is fulfilled by running off another copy on demand (not unreasonable for smaller documents, such as tax forms).

There is also the matter of the identity of an object. It's simple to say that a book is a body of text, and we sometimes get annoyed with more arcane subtleties. But the body of text often evolves, yet it is the same book, hence the book is not the same as the text. Or we might have two contracts which happen to have the same text, yet we insist they are different contracts. Or the text might not even exist: people are commissioned to write books, or draft contracts, which necessarily "exist" (so we can talk about them) even without their text. So, the book or contract may have to be a distinct object, such that the text is just one of its properties that may or may not exist.

For a change of scenery, let's think about chess. Where is my king? As a chess piece, it may be on square K1; as a physical object, it's on the table. (Or is it on the chess board which in turn is on the table?) If I move the chess board, have I moved the king? If you and I are playing chess by mail, with each of us tracking the game on our own boards, how many kings do I have? What does "my king" mean?

1.3 Frames of Reference

Do you feel annoyed by such questions? Why?

All of these are analogous to real problems in our computer systems, such as virtual address spaces and copies of objects. If I remap a mapped file (or segment of virtual memory) to a different physical address, are my objects still at the same addresses?

Many of these problems arise from differences in our frames of reference, or the set of assumptions we are making. Yours may be different from mine; either one of ours may be changing from moment to moment.

We need a common frame of reference in order to communicate, but if we kept re-checking our assumptions we'd never talk about anything else. Maybe that's what annoys us sometimes; we suddenly realize we're out of sync, and don't know how much of the previous dialog has to be undone, and it's going to take some unexpected work to get back in sync.

We are normally oblivious to the frame of reference until we suspect a communication failure, and then re-sync our frames of reference. It happens naturally in human dialog, but not that often. It seems to be more necessary in this environment, for some reason. That gives me a problem. If I recited all the underlying assumptions at the outset, you wouldn't know why and you'd quickly lose interest. So I'm trying to strike a balance between being very precise at the outset and waiting until the need arises before making subtle distinctions.

We are dealing with information systems which are not yet smart enough to make these context adjustments as casually as we do. Maybe we should try to model such automatic context shifts, but it's hard. By and large, we are trying to develop a single, static object concept which serves multiple contexts simultaneously. As people, we tend to approach the problem thinking of one context or another, within which things are relatively simple, and we get frustrated when the merging of these contexts into a single system unveils levels of complexity we hadn't expected and don't want to deal with.

We need an approach which is capable of making all such necessary distinctions, but also hiding them whenever we can. That's a challenge. Most of the time our point of view here will be that of the omniscient expert, who has to be aware of the needs of multiple contexts simultaneously. It is a separate challenge to be friendly to users in any one particular context.

The same thing applies to this exposition. On the one hand, it is founded on some simple and elegant concepts, such as the application of functions to objects. Yet to be useful, we have to account for a myriad of technical details in how the functions and objects are designated. We will pass different people's comfort levels at different points as we get increasingly precise and detailed about these notions.

1.4 How It Looks

Knowledge of objects involves both form and behavior. The form, or state, of an object is often perceived as a discrete chunk of storable data that belongs to the object, organized as variables whose values are properties of the object. Thus the "state" of Dick might include the facts that his name is "Dick", his wife is Jane, and he was born in 1960. Such properties correspond to facts which can only be known by assertion, i.e., because someone says so, and can't be deduced from other information. Other properties might be defined by algorithms, such as the derivation of a person's age from his birthdate.

In a sense, this view mixes semantics and implementation. For efficiency reasons, we might wish to store a person's age instead of having to compute it every time (we remember people's ages), so long as we take care to update it once a year. A semantic specification might say that one or both of Birthdate and Age are assertable properties, and describe how they are related. It would be up to the implementation to decide whether to store one or the other, or both, and how to manage changes in values. Furthermore, these implementation decisions could be changed without affecting the semantics of the object.
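A minimal sketch of this separation, in Python. The class and method names, and the June 15 birthday, are our own illustrative assumptions; the text says only that Dick was born in 1960. Here Birthdate is the stored, assertable property and Age is derived, but a caller cannot tell which is which.

```python
from datetime import date


class Person:
    """Birthdate is asserted (stored); age is derived on demand."""

    def __init__(self, name: str, birthdate: date):
        self.name = name
        self._birthdate = birthdate  # the only stored fact in this sketch

    def birthdate(self) -> date:
        return self._birthdate

    def age(self, today: date) -> int:
        # Derived property: whole years elapsed since the birthdate.
        years = today.year - self._birthdate.year
        had_birthday = (today.month, today.day) >= (
            self._birthdate.month,
            self._birthdate.day,
        )
        return years if had_birthday else years - 1


# Hypothetical day and month; only the year 1960 comes from the text.
dick = Person("Dick", date(1960, 6, 15))
print(dick.age(date(1989, 4, 1)))  # 28 (birthday not yet reached in 1989)
print(dick.age(date(1989, 7, 1)))  # 29 (birthday already passed in 1989)
```

An implementation that stored the age and refreshed it once a year could replace the body of age without changing anything its callers see.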

The notion that state belongs to an object constrains the treatment of shared information, i.e., relationships. The fact that Dick and Jane are married might be expressed as a Wife variable in Dick's object, a Husband variable in Jane's object, or both. If I want to know if Dick and Jane are married, I have to think about whether to ask Dick or Jane; I can't just ask. If they get divorced, I have to decide whether to change Dick's object or Jane's object, or both; I can't just announce the divorce.

Alternatively, we could allow state to be shared among objects, again leaving the exact mechanism to be chosen in implementation. A more abstract model might specify families of propositions which are simultaneously true or false, such as

Married(x,y)
Husband(x)=y
Wife(y)=x.

It would again be up to the implementation to decide which and how many of these to maintain as stored data, how to derive one from another if necessary, and how to keep stored values synchronized.
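One way such a family of propositions might be realized is sketched below in Python. The class name MarriageFacts and its method names are hypothetical, and the choice to store a single set of pairs is just one of the implementation options mentioned above. We also assume, for illustration only, that Married is symmetric in its arguments.

```python
class MarriageFacts:
    """One stored relation; Married, Husband, and Wife are all derived from it."""

    def __init__(self):
        # Stored pairs (x, y) meaning Husband(x) = y and Wife(y) = x.
        # Keeping a single set of pairs is an implementation choice.
        self._pairs = set()

    def announce_marriage(self, x, y):
        self._pairs.add((x, y))

    def announce_divorce(self, x, y):
        self._pairs.discard((x, y))

    def married(self, x, y):
        # Assumption: Married is symmetric, so either argument order works.
        return (x, y) in self._pairs or (y, x) in self._pairs

    def husband(self, x):
        # Husband(x) = y, or None when no husband is known for x.
        return next((b for a, b in self._pairs if a == x), None)

    def wife(self, y):
        # Wife(y) = x, or None when no wife is known for y.
        return next((a for a, b in self._pairs if b == y), None)


facts = MarriageFacts()
facts.announce_marriage("Jane", "Dick")
print(facts.married("Dick", "Jane"))  # True
print(facts.husband("Jane"))          # Dick
print(facts.wife("Dick"))             # Jane
facts.announce_divorce("Jane", "Dick")
print(facts.married("Dick", "Jane"))  # False
```

With this arrangement we can just announce the divorce; there is no need to decide whether to change Dick's object or Jane's.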

The "state" of an object clearly requires some memory of the current values of assertable properties; the structure of such memory need not be part of the semantics of the object, but left to implementation.

It is likely that the states of many objects will be interlocked with each other by networks of relationships. It might be useful to turn our mental model of state inside out. Instead of state being in an object, we can say that the information system as a whole has a state in which objects are embedded. The state of any object is some subset of the whole state, but the states of individual objects often overlap. The image starts to look more like overlapping envelopes in a graph.

1.5 Stimuli: Functions or Messages

Are we ready to talk about objects? Well, not directly, not just yet. We might never get around to saying what an object "looks like". We sometimes see images on screens and printouts, but these are not the things that exist in the system, though it is sometimes convenient to think so. As a rule, we often get different sorts of images as manifestations of the same object.

We might not even say what an object "contains", though there are ways to make objects behave as though they contained things, and that's also a convenient way to think about objects, sometimes.

We will talk in terms of a behavioral metaphor of what goes on inside the information processing system, based on the application of functions to objects. The most definitive thing we are prepared to say at the outset is that objects are things to which functions can be applied. (This may eventually lead to a circular definition of function, and perhaps recognition that "function application" might be a more fundamental concept.) Some people prefer to describe this as sending messages to objects, but the function metaphor is more general in that it allows multiple arguments to be involved. If we want to know Dick and Jane's children, we don't have to decide whether to ask Dick or Jane; we just ask.

What little we've said so far about objects would seem applicable to things like forty-nine and "XYZ". Do we intend those things to be objects? That's certainly debatable; whatever the outcome, we will later develop means of differentiating them from other kinds of things, yet treating them all uniformly when appropriate.

We are saying that functions and objects exist in the system, and things happen by applying functions to objects. Such an application can return resulting objects, which is how we obtain information about things. Such an application can also change the internal state of the system, which is how we manipulate information.

1.6 The Information Processing System

This too comes in varying levels of abstraction. What we say about objects depends in part on what level of abstraction we are assuming for the information processing system.

Our highest level of abstraction will be in terms of a graph metaphor.

1.6.1 Functions in a Graph

The static content of an information system can be visualized as a labelled directed graph, in which each object occupies exactly one node. The edges are labelled with functions. An edge from object x to object y labelled with the function f means that the function f applied to the argument x returns the result y. (This metaphor gets clumsy when we try to deal with functions having zero or multiple arguments or results.)

Functions play a pivotal role. They label edges, but they are also objects in their own right, hence occupy nodes in the graph. In terms of our graph metaphor, it is almost as though there were little threads (dotted lines) from a function node to the edges labelled with that function; these threads are not edges in the graph. (If that's confusing, then maybe the graph metaphor is getting in the way.)

Some functions are "assertable"; there is a mechanism for assigning and reassigning values for such functions. Execution of such functions means returning the currently assigned results for the given arguments.
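The graph metaphor, restricted to assertable functions of one argument and one result, can be sketched in Python as follows. The class name InformationGraph, the methods assign and apply, and the function labels Birthyear and Kind are our own hypothetical choices. The sketch does not attempt computed functions, or functions with zero or multiple arguments or results, which is exactly where the metaphor gets clumsy.

```python
class InformationGraph:
    """Edges labelled with functions: (function, argument) -> result."""

    def __init__(self):
        self._edges = {}

    def assign(self, function, argument, result):
        # Assert (or reassert) the value of an assertable function.
        self._edges[(function, argument)] = result

    def apply(self, function, argument):
        # Follow the edge labelled `function` leaving `argument`;
        # None means no such edge is currently known.
        return self._edges.get((function, argument))


graph = InformationGraph()
graph.assign("Wife", "Dick", "Jane")
graph.assign("Birthyear", "Dick", 1960)
# A function is itself an object, so it can occupy a node and be an argument.
graph.assign("Kind", "Wife", "assertable")

print(graph.apply("Wife", "Dick"))  # Jane
print(graph.apply("Kind", "Wife"))  # assertable
```

Note that "Wife" appears both as an edge label and as a node, which is the double role of functions described above.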

The graph is an extended notion of the state of the system. It is an ideal graph, containing connections between arguments and results for all functions to the extent that they are "known" (computable?) by the system, including the current values of assertable functions as well as the outcomes of (possibly future) computations.

(At a lower level of abstraction, some subset of the graph must be realized in real storage, to provide a memory of the current values of assertable functions. This subset might be considered to be "state". There is not necessarily a direct correspondence between assertable functions and stored data.)

The fundamental semantics of the system are defined in terms of the application of functions to objects, and the consequences thereof. Functions are executed to traverse and/or modify the graph. There is an execution capability in the system, which interprets things we will call "bodies". (These are also objects, i.e., nodes in the graph.)

Every function has an associated body, which is executed when the function is applied to arguments. Sometimes the body simply describes how to get from arguments to results. The edges in the graph aren't necessarily available to be traversed freely; it may require a computation to figure out what the result should be.

The execution of a body may cause changes in the graph, by adding or removing nodes or edges. This is information update, also called "side effects". (May want to add a notion of sending signals outside the system as another side effect. May want to dwell more on "apply" vs. "assign".)

In terms of the overall metaphor, specification of these function bodies is as important as the graph. The graph represents the state, or content, of the information system; the bodies represent its capability for action.

Later we will get more precise in these concepts, elaborating the consequences when a reference to a function is applied to references to arguments.

1.6.2 Linear Symbol Processing Systems

Some concepts have to be introduced because our information systems do not directly support the graph construct. Our next lower level of abstraction recognizes that we are dealing with a linear symbol processing system. While we want to have many sorts of objects at the nodes, we only have symbols in the system, and these are what have to serve as nodes. Furthermore, our systems are flattened into linear spaces. Instead of having magical edges which can all converge at a single node, we have to flatten these using references to the node.

What the system contains is symbols, which we will take to be finite linear strings over some alphabet. (Which alphabet doesn't really matter; for our purposes, we can also define "symbol" as the sort of object which can pass in and out of the information system.) To be even more precise: what the system contains is occurrences of symbols, which we will call tokens. Thus "10" and "10" are distinct tokens scribing the same symbol. (Later we can introduce a particular kind of equality predicate which is true if and only if its two argument tokens scribe the same symbol. It's a simple string comparison.)

For our purposes, we can say that all these symbols and tokens are themselves objects, in some abstract sense. Every time I write "10", I am producing a distinct token. Each of these tokens scribes the same symbol, which doesn't have any tangible existence, but it can be thought of as a distinct abstract object in its own right. That is, we can speak of the symbol which is scribed by the token "10" and also by the token "10". (The tokens are distinct objects, because I can say that there are two in the previous sentence, one occurring at the end and one occurring elsewhere. The corresponding symbol is itself a distinct object; later we will say that whatever this symbol is denoting is yet another object.)
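The distinction between tokens and symbols can be made concrete with a small Python sketch. The class name Token and the predicate name scribes_same_symbol are hypothetical. We wrap each occurrence in its own instance so that two occurrences of "10" are distinct objects, while the predicate is the simple string comparison mentioned earlier.

```python
class Token:
    """One occurrence of a symbol; every Token instance is a distinct object."""

    def __init__(self, text: str):
        self.text = text  # the symbol this token scribes


def scribes_same_symbol(a: Token, b: Token) -> bool:
    # The equality predicate: a plain string comparison of the two symbols.
    return a.text == b.text


first = Token("10")
second = Token("10")
print(first is second)                     # False: two distinct tokens
print(scribes_same_symbol(first, second))  # True: one symbol
```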

Being most literal, we might say that the only objects that "really" exist in our system are tokens. Going up one level of abstraction, we can say that the symbols "scribed" by these tokens are also objects that "really exist". What shall we do about all the other sorts of objects we want to represent? We have no choice but to use symbols and tokens to represent them.

One consequence of all this is that we have to use symbols to represent objects.

1.7 Representing Objects

How do tokens and symbols represent objects? The problem is, there are too many such objects, by definition. If a symbol is itself an object, how do we know when the symbol is representing itself or some other object? To put it another way, suppose you want to use an arbitrary bit string as an object identifier. Will we have a conflict if we want to encode an integer or a character string or an image raster that happen to have the same bit string image?

We deal with this via the "data type" mechanism which, taken most broadly, means some convention by which a given symbol is understood to represent different things in different contexts. (Go on to implicit vs. explicit mechanisms, categories, prefixes, etc.)
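As a small illustration of such a convention, here is a Python sketch in which an explicit category tells us how to read a bit string. The function name interpret and the category names "integer" and "text" are assumptions made for this example, not part of any scheme defined here.

```python
def interpret(category: str, bits: bytes):
    """Read the same bit string differently depending on its category."""
    if category == "integer":
        return int.from_bytes(bits, byteorder="big")
    if category == "text":
        return bits.decode("ascii")
    raise ValueError(f"unknown category: {category}")


bits = b"\x41"  # one and the same symbol
print(interpret("integer", bits))  # 65
print(interpret("text", bits))     # A
```

The bit string alone does not settle what is denoted; the pair of category and symbol does.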

For a different example: "Kennedy" and "Kennedy" are distinct tokens, scribing the same symbol. Without additional guidance, we can't tell whether the symbol is intended to denote a president, an airport, a battleship, or something else.

To be more precise (ho hum), we will say that a symbol can denote an object, while a token can represent a reference to (occurrence of) an object.

2 KEYNOTES

The essential concepts...

Behavior: an object can occur as the argument or result of a function.

Singularity: there is exactly one of an object.

Reference: we rarely actually deal directly with an object, though it is often convenient to think so. Once in a while, we need to remember that we are dealing with a token which scribes a symbol which denotes the object. The object itself may or may not actually "exist" in the system.

State: at minimum, the system needs a memory of the current values of assertable functions. This may or may not correspond directly to stored data; such stored data may or may not be partitionable into disjoint units belonging to distinct objects.

3 NOTES

To be incorporated into the main stream.

3.1 Reference

While we often say we are doing something to an object, or inquiring about one, we don't usually have the object itself "in hand" (whatever that means). Sometimes we have to acknowledge that we have only a reference to the object; the object itself may or may not exist in the information system.

Certain questions of identity (equality?) only make sense in this context. If you and I had our hands on the same book, does it make sense to ask "Is this the same book?"? The question only makes sense if we are pointing: are we pointing to the same book?

The notion of reference might have to be dragged through all these steps:

Evaluation of an expression yields a token (possibly involving resolution of versions and other things).
The intent of the token has to be understood, i.e., whether it is denoting a symbol or serving as a handle.
If a handle, then the prefix and suffix determine the object being denoted.

Functions are applied to references to objects. (More precisely, references to functions are applied to references to objects.)

Is it significant that the word "reference" occurs in the phrase "frame of reference"?

3.2 Singularity

One thing I want to believe about an object is that it is an object, i.e., there is one of it. If there are versions, or copies, then each is an object, and distinct from the abstraction (generic notion) of which they are versions or copies.

How many there are may depend on our frame of reference, our level of abstraction among addressing schemes.

3.3 Crossing Levels

We often get into trouble because we want to keep things simple but also cross certain levels of abstraction.

If we want to think of a document as a single thing, then we can't talk about managing copies of it. When it shows up on your desk, it is no longer on mine; if it's on your desk, then it's not in a cache, or a buffer, or a database. If we want to talk about those things, we are not talking about the same object.

Sometimes we want to say that a function returns an object, without regard for details of representation. But sometimes we recognize that it might return one identifier or another; sometimes we recognize that it returns an indirect reference to an object. We need to be consistent.

3.4 Tokens

We haven't said much about the nature of tokens, other than to suggest a prefixing mechanism. We haven't said anything about their size, or regularity.

Sometimes a token is itself really the object of interest. A chunk of text is a token, and so is a bitstring representing an image.

For aggregate objects (more precisely, intensional aggregates), we are deliberately (intentionally? pun!) vague about their form. A token used to represent a set, for instance, has to be such that the members of the set can be determined. Furthermore, two tokens denoting sets having the same members must be considered to denote the same set. This behavior could be realized by constructing the token for the set as a concatenation of tokens of the members, in some canonical sequence, but that is just one possible implementation.
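The canonical-concatenation idea can be sketched in Python. The function names set_token and members are hypothetical, and we assume, purely for illustration, that member tokens are non-empty strings containing no commas, so that a comma can serve as the separator.

```python
def set_token(member_tokens):
    """Build a token for a set: members deduplicated, sorted, then joined."""
    # Sorting gives the canonical sequence, so the order in which the
    # members were supplied does not affect the resulting token.
    return ",".join(sorted(set(member_tokens)))


def members(token):
    """Recover the members of the set from its token."""
    return set(token.split(",")) if token else set()


a = set_token(["b", "a", "c"])
b = set_token(["c", "a", "b", "a"])
print(a)           # a,b,c
print(a == b)      # True: same members, same symbol
print(members(a) == {"a", "b", "c"})  # True
```

Under this scheme the string-match predicate on tokens coincides with equality of the sets they denote, which is the behavior required above.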

Functions might choose to interpret an argument token in several ways: as a token in its own right, as the symbol scribed by that token, or as a "handle" denoting some other object (we expect this to be by far the most common case). The choice is inherent within the function, not encodable in the token. (But how is the function declared so that the system knows the difference? Sometimes the system may need to know.)

If a function uses a token as a handle, the function is depending on the system to ensure that it is a valid handle, i.e., its category is determinable and the suffix is valid according to the rules for the category. There may also need to be a mechanism for extracting the suffix as a distinct token.

It is necessary to know the extent (size, content) of a token. It is generally necessary to be able to determine whether two tokens scribe the same symbol (string match). It is generally necessary to know (implicitly) whether a token is to be interpreted as a handle. It is necessary to know the category of a handle. It is necessary to know (implicitly) whether handles are singular, i.e., if two handles do not scribe the same symbol, might they still denote the same object? It is necessary to know whether two tokens denote the same object.

Tokens are never modified, but always replaced. Even if an operation appears to alter, say, the first bit of a token, our metaphor requires us to think that the whole token has been replaced by a different one.

We may or may not have to postulate that tokens are disjoint, at least in the sense that a change in one never implies an automatic change in another. (Will that work for aggregates?)

We might need to postulate at some level of detail that different functions (perhaps even different function invocations?) always return different tokens. Thus, if f(x) and g(y) both have the value c, those results are embodied in distinct tokens both denoting c.

There is absolutely never any question about the singularity of a token. There is only one of it. What we might have are copies of symbols, but each copy is a distinct token. (Well, to be more precise, that depends on the notion of address space, which again involves different levels of abstraction. A token in a virtual memory space may correspond to multiple tokens in physical memory, in core and on disk. Let's say that a function operates with respect to a fixed address space.)

3.5 Function Invocations

Possible outcomes:

Non-termination.
Termination with:
- no results.
- one or more result objects, possibly null, possibly aggregates (which might be empty).
- error indication.
Side effects:
- None.
- Addition or deletion of nodes or edges in the graph.
- Signals going outside the system (distinguish from returned results).
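The terminating outcomes listed above could be recorded along the following lines in Python. The names Outcome and invoke are hypothetical, and we assume for the sketch that a body always returns a tuple of results. Non-termination and side effects are deliberately not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Outcome:
    results: Tuple[object, ...] = ()  # zero or more result objects
    error: Optional[str] = None       # set when the invocation failed


def invoke(body: Callable[..., tuple], *args: object) -> Outcome:
    """Run a body and classify how it terminated."""
    try:
        return Outcome(results=tuple(body(*args)))
    except Exception as exc:
        return Outcome(error=str(exc))


print(invoke(lambda: ()).results)                 # (): no results
print(invoke(lambda: (None,)).results)            # (None,): one null result
print(invoke(lambda: (frozenset(),)).results)     # one empty aggregate
print(invoke(lambda: (1 / 0,)).error is not None) # True: error indication
```

Keeping the results in a tuple lets us tell "no results" apart from "one result which is null" and from "one result which is an empty aggregate".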

3.6 Types

It's remarkable that we've managed to get this far without the notion of "type". In some respects it appears to be quite fundamental, yet it also seems ancillary to, and expressible in terms of, objects and functions.

There are several related notions, which sometimes get different names in different systems. One aspect has to do with classification, in the sense that objects are classified; objects "belong to", or "are instances of", various classifications. (Sometimes terminology also distinguishes between a thing which is a classification and a thing which contains the current population of that classification.)

The other aspect constrains the sorts of objects to which variables may be bound, including variables occurring as arguments and results of functions. From the declarations of such constraints, much can be inferred about the correctness of expressions in which such functions and variables occur.

The two aspects are bridged by expressing such constraints in terms of the classifications of objects, and by an assumption that variables will actually be bound only to objects having acceptable classifications.

Types play a more crucial role when attention is focused on preparation of applications for the system, in contrast to their actual use. Descriptions of what sorts of behaviors will be required and permitted in the system are naturally abstracted in terms of types rather than individual objects.

3.7 Rock Bottom

We should be able to find some sort of stable basis in the following.

There's a certain kind of stability which can be described in terms of passive functions, i.e., those having no side effects. This should account for many things, independent of certain levels of abstraction and other problems.

At any point in time a passive function has a fixed extension, consisting of a pairwise n:1 or 1:1 mapping from argument tokens to result tokens. Whenever the function is invoked with an argument token matching the left element of an entry in the mapping, the function will return a token matching the right element in that entry.
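A passive function with a fixed extension might be sketched in Python like this. The class name PassiveFunction, the method updated, and Jane's birth years are assumptions made for the example; only Dick's 1960 comes from the text.

```python
class PassiveFunction:
    """A function with no side effects, given by its extension at one moment."""

    def __init__(self, extension):
        # Copy the mapping so later changes to the caller's dict cannot leak in.
        self._extension = dict(extension)

    def __call__(self, argument_token):
        # Return the result token paired with a matching argument token.
        return self._extension[argument_token]

    def updated(self, argument_token, result_token):
        # An update yields a new extension; this one stays fixed.
        changed = dict(self._extension)
        changed[argument_token] = result_token
        return PassiveFunction(changed)


# An n:1 mapping: two argument tokens share one result token.
birthyear = PassiveFunction({"Dick": "1960", "Jane": "1960"})
print(birthyear("Dick") == birthyear("Dick"))  # True
later = birthyear.updated("Jane", "1961")
print(birthyear("Jane"), later("Jane"))        # 1960 1961
```

Invoking the same extension with the same argument always gives a matching result; a different answer can only come from a different extension.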

If an object appears to exhibit different behaviors in different circumstances, then the differences must be explainable in one or more of the following terms:

Different symbols denoting the object.
Different passive functions being applied to the object.
Updates to the passive functions.

We should assume there are no implicit parameters to functions. If an object is one of several arguments to a function, then different behaviors can also be explained in terms of differences in the other arguments.

One possible exception: the current frame of reference (context) might act as an implicit parameter.
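The explanations listed in this section can be sketched in a few lines of Python. All of the symbols, objects, and functions below are hypothetical, and each passive function is represented simply by its current extension, keyed by tuples of argument tokens.

```python
# Two hypothetical symbols that denote the same object.
denotes = {"Bill": "object-1", "W. Kent": "object-1"}

def invoke(extension, *argument_tokens):
    """Return the result paired with the given argument tokens."""
    return extension[argument_tokens]

# Hypothetical passive functions, given by their current extensions.
salutation = {("Bill",): "Hi Bill", ("W. Kent",): "Dear Mr. Kent"}
initial = {("Bill",): "B", ("W. Kent",): "W"}
greeting = {("Bill", "morning"): "Good morning",
            ("Bill", "evening"): "Good evening"}

# 1. Different symbols denoting the object.
same_object = denotes["Bill"] == denotes["W. Kent"]
by_symbol = (invoke(salutation, "Bill"), invoke(salutation, "W. Kent"))

# 2. Different passive functions applied to the object.
by_function = (invoke(salutation, "Bill"), invoke(initial, "Bill"))

# 3. An update to a passive function between two invocations.
before = invoke(initial, "Bill")
initial[("Bill",)] = "W"  # the update is a separate act, not a side effect
after = invoke(initial, "Bill")

# 4. Differences in the other arguments of a multi-argument function.
by_other_argument = (invoke(greeting, "Bill", "morning"),
                     invoke(greeting, "Bill", "evening"))
```

In each case the apparent change in behavior is traced to something explicit: the token used, the function chosen, an update, or another argument. An implicit frame of reference would be exactly the kind of thing this sketch has no place for.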

3.8 Rules

The graph as a notion of state or content of the information system is more relevant to "facts", less so to "rules", but the distinction is not sharp.

To the extent that function bodies are expressed in a form "known to the system" (whatever that means), they have some of the sense of being rules, i.e., meta-knowledge, i.e., knowledge about generic behavior as opposed to individual behavior. We are more likely to think of these as rules if they are expressed in a declarative form (whatever that means).

And yet, not all rules need be expressed in this way. Some are expressible in the graph form, showing correspondences among types and functions, in the spirit of entity-relationship models.

3.9 Symbol:Token = Set:Bag

There's more to the distinction between sets and bags than the question of duplicates. The whole concept that makes the notion of duplicates meaningful in the first place is the distinction between things and references to things. Sets collect things; no matter how many ways I refer to a thing in the set, there is only one of it in the set. Bags collect references to things (same notion as occurrences of things), with each reference having its own presence in the bag.

This is closely paralleled by the distinction between tokens and symbols. A collection of tokens is like a bag of symbols; each token represents an occurrence of a symbol.
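The parallel can be sketched with Python's built-in set and the standard library's collections.Counter, which behaves like a bag. The token stream below is hypothetical.

```python
from collections import Counter

# Hypothetical token stream: each element is one occurrence of a symbol.
tokens = ["a", "b", "a", "c", "a"]

# A set collects the symbols themselves: however many times a symbol
# occurs, there is only one of it in the set.
symbols = set(tokens)

# A bag collects occurrences: each token has its own presence.
occurrences = Counter(tokens)

print(len(symbols))               # prints: 3
print(sum(occurrences.values()))  # prints: 5
print(occurrences["a"])           # prints: 3
```

Five tokens, three symbols: the bag remembers every occurrence, while the set remembers only which symbols occurred.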

3.10 Miscellany

Much of this goes back to old notions of ambiguity, and the automatic means we have for resolving much of it in context. D&R?

In trying to distinguish between symbols and tokens, note that the distinction depends on the level of abstraction: I may have one copy of a symbol in virtual memory but several copies in physical memory.

3.11 Frames of Reference

It may be useful to try to formalize the notion of frame of reference (context) in our model. What goes into one? How is it established? Is there some span over which we must assume a constant frame of reference? Do we need a meta-frame for discussing frames of reference?

3.12 Function Bodies

The bodies of algorithmic (non-assertable) functions might be specified in several ways. One is to say they are externally defined, outside our sphere of interest (whatever that means), and all we know is how to initiate their execution and pass parameters back and forth. Another way is to include some language notions in our sphere of interest, exercising some control over how the bodies are specified and interpreted.

In the latter case, the whole territory of formal languages becomes relevant. The languages might be procedural or declarative; we might have many other kinds of options.

In general, the specification of procedure bodies becomes part of the knowledge in the information system.

And we get involved in all the processes of validating the bodies and preparing them for execution, i.e., compilation. Among other things, this is where the whole business of type checking in compilers comes in.

3.13 Schemas

Mostly we've been exploring how objects behave. Most systems need a preparation phase before we can start doing things. We've touched on that in connection with the definition and compilation of functions.

The preparation phase generally consists of specifying the types of objects the system should be prepared to handle, specifying the functions that should be executable, specifying constraints, and making some implementation decisions. In this phase, types are more relevant than individual objects. Types play a pivotal role in defining the expected kinds of objects and in constraining the arguments and results of functions.

This preparation phase is, in itself, an information processing activity. We are doing things, modifying the information content of the system. Types and functions are themselves objects, and the connections among them are expressible via functions. In many cases preparing and doing are interleaved, or even indistinguishable.
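A small sketch may show why preparing and doing can be hard to tell apart. It assumes, purely for illustration, that every connection in the system is a function stored as a Python dictionary; the names instance_of, argument_type, result_type, employer, and conforms are all hypothetical.

```python
# Hypothetical system content: every connection is itself a function,
# stored as an extension (a dict from argument to result).
instance_of = {"Person": "Type", "Company": "Type"}
argument_type = {}
result_type = {}
employer = {}

# Preparing: declare a function by updating the functions that describe it.
instance_of["employer"] = "Function"
argument_type["employer"] = "Person"
result_type["employer"] = "Company"

# Doing: create objects and assert facts by the same kind of update.
instance_of["alice"] = "Person"
instance_of["hp"] = "Company"
employer["alice"] = "hp"

def conforms(function_name, extension):
    """Check every entry of an extension against the declared constraints."""
    return all(
        instance_of.get(arg) == argument_type[function_name]
        and instance_of.get(res) == result_type[function_name]
        for arg, res in extension.items()
    )

print(conforms("employer", employer))  # prints: True
```

Declaring the type of employer and recording alice's employer are the same kind of operation here: both modify the information content of the system, and nothing in the mechanism itself separates the two phases.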