Nulls Again

William Kent
Database Technology Department
Hewlett-Packard Laboratories
Palo Alto, California

Nov 1992

 

Let's consider common-sense semantics.

Consider poor Sam: homeless, unemployed, and born on the Fourth of July.

Address(Sam) is empty, vacuous, non-existent. So is Employer(Sam). Is it true that Address(Sam)=Employer(Sam)? No. But it's also not true that Address(Sam)¬=Employer(Sam). In order to say that two things are equal, or unequal, we have to have two things to compare. Address(Sam) and Employer(Sam) aren't anything; there isn't anything there for us to compare, for us to say they are equal or unequal. In fact, it isn't even true that Address(Sam)=Address(Sam); we don't have any things to compare.

If we construct the set {Address(Sam)}, we get the empty set: {Address(Sam)}={ }. The same is true of {Employer(Sam)}. Furthermore, if we construct the set

{Address(Sam), Employer(Sam)},

we still get a set containing nothing, i.e., the empty set. So,

{Address(Sam), Employer(Sam)} = {Address(Sam)} = {Employer(Sam)} = { }.

Is it really true that {Address(Sam)}={Employer(Sam)}? Yes. The empty set is a thing, which can be compared with other things. If x and y both refer to the empty set, then x=y is true. So, we also have {Address(Sam)}={Address(Sam)} being true.

If we construct the tuple <Name(x), Address(x), Employer(x), Birthdate(x)> for Sam, we get

<Sam, , , 7/4/74>.

That's a funny thing. The date 7/4/74 is the fourth thing in the tuple, though the tuple contains only two things. The second and third things aren't there, though we hold a place for them.

Is it true that <Sam, , ,7/4/74> = <Sam, , ,7/4/74>? Of course it is. Like sets, these tuples are existing things which can be compared. Two aggregates have to be equal if they have the same structure, they are empty in corresponding places, and things in corresponding non-empty places are equal.

We do need some sort of placeholder in stored data. But the placeholder is not an address or an employer. If we construct a table of data about people, we might have something like

-----------------------------------------
| Name | Address | Employer | Birthdate |
|======|=========|==========|===========|
| Sam  |         |          |  7/4/74   |
-----------------------------------------

Like our tuple above, the row for Sam has four cells, though two of them are empty. The row contains two things; the date 7/4/74 is in the fourth cell. How long is this row? Four, even though it contains only two things.

The blank spaces are not addresses or employers. If we allow such blank spaces, then it is simply untrue to say that everything in the second column is an address, or that everything in the third column is an employer. The second column is allowed to contain addresses or blank spaces, and the third column may contain employers or blank spaces.

By the way, things would be the same whether Sam is homeless or we just don't know his address. Things would also be the same whether Sam is unemployed or we just don't know his employer. Our data only reflects what we know, not the total truth. Address(Sam) being empty means precisely that we don't know an address for Sam; we can't tell which case it means. The same is true for Employer(Sam) being empty.