William Kent

Database Technology Department

Hewlett-Packard Laboratories

Palo Alto, California

1990

> 1 THE CORE OF THE PROBLEM . . . 1

> 2 NULL AND VOID . . . 2

> 3 THE NULL OBJECT . . . 2

> 4 A COHERENT MODEL . . . 3

> 5 THE ANOMALY . . . 3

> 6 IS NULL NULL? . . . 4

> 7 IS THE EMPTY SET NULL? IS IT A SET? . . . 5

> 8 CONCLUSIONS . . . 5

One of the surprisingly intractable issues in database models has been the treatment of the concept of null.

The core of the problem arises from the paradox of trying to make nothing out of something. There truly is nothing between the pair of asterisks

**

But what's between the following pair?

*nothing*

It may be "nothing", but it's something, not nothing. It's like trying to say that the word "nothing" is not anything, does not exist. Trying to have something be nothing is bound to lead to a contradiction somewhere. If we put a band-aid on it in one place, it just pops up in another.

We can only try to minimize the problem, by minimizing the number and severity of anomalies that do arise.

To have a point of reference, let's describe nothing. Rather, let's describe something which is the condition of nothingness, which we will call "void".

Void is not an object or value. It does not make sense to write the comparison

f(x)=void.

There simply isn't any object or value to compare with. The best we can do is to describe voidness as a condition Void(e). That predicate applied to an expression (which might just be a variable) returns True if evaluating the expression would return nothing.

If Void(x) is true, then the set {x} truly contains nothing, i.e.,

{x} = {}.

A list constructed as (x,c) has c as its first and only element.

Even tuples are affected. A tuple constructed as <x,c> also has c as its first and only element. And there's the rub...

Yes, there's the rub. The relational model necessarily deals in homogeneous sets of tuples, with each tuple in a relation (including the results of a query) having the same number of elements. What can we do for a tuple of the form

<person, birthday>

if we don't know someone's birthday? We need something to stand for nothing, and that's why the null object (value) was invented. If we don't know Sam's birthday, then his tuple has the form

<Sam, null>.

How many things are in that tuple? I think two. What do you think? We could, I suppose, say that there is only one thing in that tuple, though its length is two. Is that sort of double-think helpful? Is it useful to say that the length of a tuple is not the number of things it contains, that null is no thing?

Whatever you may think about that, there is one coherent treatment of null which treats it as a proper existing first-class object.

In an object model, the primary meaning of equality is "same object". Thus the expression

f(x)=f(y)

is true if and only if f(x) and f(y) return the same object... even if it is the null object. (If you have an objection, hold it until the next section.)

Note that if f(x) and f(y) were void, the comparison wouldn't work; there are no results to compare. We might, however, be motivated to define the = operator so that it returned False rather than an error if an operand was void.

What does null mean? That's an age-old religious question, with many mystical and ritual answers [references?]. The simplest one, and the one on which we can build a coherent semantic model, is "absence of information". Unfortunately, even this simple idea gets overblown when carried beyond the relational model.

The idea has a simple meaning in the relational model, which essentially deals in atomic objects. While tuples and sets do get mentioned, they are not "main stream" values. They do not occur as values in relations. Thus absence of information only needs to be interpreted as absence of atomic-valued information.

A lot of grief comes from taking this notion too literally in a model where sets and other aggregates are also first-class objects. We do need the null object to be returned by an atomic-valued function which "has no value". But we don't need that for set-valued (or bag-valued) functions: we already have the empty set {}.

Let's not say that null and the empty set are the same object. The null object simply is not a set, while the empty set certainly is. (If you want to debate that, there's a section for you later on.)

Object identity leads to one anomaly when applied to the null object. Attributes with unknown values are treated as being equal. For example, a query about married couples having the same birthday would include couples whose birthdays are unknown.

The simplest way to get around it is to explicitly exclude null values in the query:

SELECT p, Spouse(p) FOR EACH p Person

WHERE Birthday(p)=Birthday(Spouse(p)) and Birthday(p)¬=null;

Some people think that's inelegant and complicated. People will forget to write that clause, and get unwelcome surprises in their query results.

Well, that may be true, but it's the best we can do. Churchill said something to that effect about democracy, calling it the worst form of government except for any other he could think of [reference?].

Almost all the grief about nulls comes from trying to make it "easier" (!!) to avoid that anomaly.

A common solution is to assert

null ¬= null,

that is, the null object is not equal to the null object. Then f(x)=f(y) is false when both are null, and we won't think that people with unknown birthdays have the same birthday.

(It just gets more complicated if we prefer to say that the expression null=null returns a null value rather than False.)

What's the cost of this expedient solution?

To begin with, we have to re-explain our fundamental notion of object equality. Now = means "same object" - almost always. Life is life and truth is truth and blue is blue and parts is parts but null is not null.

This one exception means that the = operator is overloaded, having a different behavior for one object than for all other objects.

Well, two objects, actually. By the same line of reasoning, we should not allow {}={}. Otherwise, if we ask for people who have the same hobbies, we will include couples having no known hobbies. (And we get into philosophical debates about why having no hobbies is different from having no birthday.)

Then = doesn't behave the same for all sets - unless we want to claim that {} is not a set (we will talk about that, as well as {}¬={}, later).

Well, Br'er Rabbit, we ain't seen the end of this tar baby. One sticky thing sticks to another. What about sets or bags containing null objects? Two sets are equal (the same set) if and only if they contain the same members. Doesn't that imply an equality comparison? Can we have {null}={null} without null=null?

Some people want to get around this by claiming there's no such thing as {null}, since null is nothing and a set containing nothing is the empty set: {null}={}. The same applies to bags. So, if you construct a bag as [f(x),f(y)], you might wind up with a bag containing 0, 1, or 2 members. This is another case of trying to make null behave like void.

That also leads to anomalies. For the same department, the bag of salaries and the bag of bonuses might contain different quantities of elements, leading to peculiarly skewed averages or other sorts of distortions. It seems presumptuous to assume that null results should always be removed, as opposed to giving the user an opportunity to include or exclude them as he sees fit. We could give him a Squeeze operator that will squeeze out the nulls if he wants to. Of course, he will sometimes forget, and get surprising results - it's still the best we can do.

You can tell what's coming next, of course. Tuples. We can't automatically squeeze nulls out of tuples and still preserve their length. We have to have two elements in <f(x),f(y)>, even if the values are null. What should we do about equality? One more time: two tuples are equal if and only if corresponding elements are equal. So, a tuple containing a null can't be equal to any other tuple.

And it gets worse for complex aggregates. Imagine two bags of tuples of tuples and bags and sets which match, element for element, right down to the last speck - including null objects. They still aren't equal to each other.

And there's more. Besides explicit comparisons, equality semantics are sneakily buried in other places:

- Distinct: what if Distinct eliminates one of two things which are not equal to each other?
- Set construction: what definition of equality determines whether things are duplicates to be eliminated?
- The In and Occurs operations. Can "x In b" ever be true if x contains null anywhere?

There might be other traps lurking out there.

Some proposals want to equate null with the empty set.

That seems difficult to sustain, since null rather obviously is not a set, while the empty set is.

Or is it? It has also been proposed that the empty set {} not be considered a set, since it is "nothing".

If that were the case, then sets would not be closed under difference and intersection, since {} is sometimes a result of those operations.

Furthermore, if the empty set is nothing, then the set containing it would still be empty, i.e., {{}}={}, being the same object, which is not a set. Then, according to Peano, we wouldn't have any numbers.

Well then, if the empty set is a set, can we still equate it with null? Frankly, I don't know. It sounds bizarre to think of the null object as a set. The reader is invited to explore the consistency of such an assumption, and to let me know what anomalies might be unearthed.

For example... suppose we said that null={}, but null¬=null. It must then follow that {}¬={}. What does that do to set theory?

It would be ironic for the relational model to equate the null object with the empty set, since the relational model prides itself on having a sound foundation in the formalisms of set theory and logic.

We've really only addressed one technical issue: null=null. We have described a simple consistent model having one explainable and fixable anomaly, as compared with a whole host of glitches and exceptions that have to be explained on top of a not-so-coherent model.

There are other technical issues we haven't addressed, such as null in logic.

Dismissing this as a tempest in a teapot, much ado about nothing, would be a mistake. There are substantial issues and precedents at stake. It is something.