Predicates and Duplicates

William Kent
Database Technology Department
Hewlett-Packard Laboratories
Palo Alto, California

Nov 1987

> 1 INTRODUCTION
> 2 BASIS OF THE PROBLEM
> 3 CONSEQUENCES

1 INTRODUCTION

We examine some problems with predicates and duplicates:

Predicates might return multiple occurrences of True.
The extension of a predicate might contain duplicates.
Truth-valued derived functions might behave this way.
Truth-valued foreign functions might behave this way.
We might have to allow stored predicates (tables) to contain duplicates.

Such behaviors are anomalous for logical predicates. In logic, p(c) is either true -- once -- or it is not true, and the set of things satisfying p(x) is truly a set. Unfortunately, truth-valued functions in Iris can exhibit such anomalous behavior.

In the following discussions, I will not consider our special form of truth-valued query that has no result variables. I am dealing only with the behavior of Boolean as an ordinary type.

2 BASIS OF THE PROBLEM

Assumption: Boolean is a first-class object type.

Assumption: a predicate is a function whose result type is Boolean, e.g.,

assigned: emp[1,1] x dept[0,n] -> Boolean.

It should follow that

find b/Boolean where b=assigned(Dick,Sales)

is a legal query. Is it? Let's assume so.

[This assumption is supported by two arguments. The first is that this is a normal kind of query for getting a function to return values according to its result type.

The second argument is that a natural way to write this query is

find assigned(Dick,Sales).

Using the standard rewriting rule

find f(x) :-> find y where y=f(x),

in which the type of y is the result type of f, we get the above query.]

Then

find b/Boolean where forsome e/emp b=assigned(e,Sales)

is also a legal query. It returns True if the Sales department has any employees. In fact, it can return multiple occurrences of True, one for each employee in the Sales department. (Exercise for the reader.)

[Note: We run into other problems I don't want to deal with here if Boolean contains the two instances True and False. For now I want to assume that Boolean just contains the single instance True. The above query returns an empty scan (bag?) if Sales has no employees.]

If that's a legal query, then

has-emps: dept -> Boolean

has-emps(d) ::= find b/Boolean where forsome e/emp b=assigned(e,d)

is a legal function definition. Is has-emps a predicate? Let's assume so, since it satisfies our current definition.

For future reference, we will need the "phantom" function

has-emps*: dept -> Boolean x integer

such that the result of has-emps*(d) is

{ <True, 1>

...

<True, k>

}

if d has k employees.

Now consider the query

find b/Boolean where b=has-emps(Sales).

It returns as many Trues as there are employees in the Sales department.

This can be demonstrated by our rewriting rules, according to which

find b/Boolean where b=has-emps(Sales)

:->

find b/Boolean where forsome z/integer <b,z> = has-emps*(Sales)

which returns the duplicates.

Doesn't this mean that has-emps is not behaving like a logical predicate? It doesn't simply return True if the Sales department has any employees. It behaves as though it had the participations

has-emps: dept[0,n] -> Boolean,

which is a bit anomalous for a predicate.

Next, let's look for duplicates in the extension. The query

find d/dept where has-emps(d)

returns k occurrences of each department, where k is the number of its employees.

Let's try to demonstrate this with our rewriting rules:

find d/dept where has-emps(d)

:->

?????????

Oops, we don't at the moment have a rewriting rule to cover this case. However, I assume that the following queries should behave the same:

find d/dept where has-emps(d)

find d/dept where has-emps(d)=True

We can then introduce the new rewriting rule for a predicate p:

p(x) :-> True=p(x).

Resuming our analysis,

find d/dept where has-emps(d)

:->

find d/dept where has-emps(d)=True

:->

find d/dept where forsome z/integer <True,z> = has-emps*(d).

We have demonstrated that has-emps appears to have duplicates in its extension.

3 CONSEQUENCES

It appears that has-emps is a predicate which can return duplicate Boolean values, since the result of

find b/Boolean where b=has-emps(Sales)

may be

{True, True, ...}.

It also appears that has-emps is a predicate which has duplicates in its extension, since the result of

find d/dept where has-emps(d)

may be

{Sales, Sales, ...}.

I don't think we gain anything by simply asserting that this kind of truth-valued function is not a predicate.

Once we are forced to recognize predicates with this kind of behavior, we just get into deeper and deeper trouble. If we can have derived predicates that behave like this, on what grounds can we exclude foreign functions or even base predicates that behave like this?

If we have to admit base predicates which allow duplicates in their extensions, we have to alter our fundamental data maintenance strategy. Presumably we should allow both kinds of predicates, allowing or disallowing duplicates, perhaps indicated by their participations.

Currently, if p(c) is true, then any further assertion of p(c) is ignored. For predicates which admit duplicates, we would now have to add an extra copy of c into the extension of p. Under this new interpretation, the result of

Set p(c)=True

is that there is exactly one occurrence of c in the extension of p, regardless of how many occurrences there were before.

Add now behaves differently:

Add p(c)=True

adds one more occurrence of c to the extension of p if p allows duplicates; it is a uop error otherwise.

Correspondingly,

Remove p(c)=True

removes one occurrence of c from the extension of p, and we would have to use

RemoveAll p(c)=True

to remove all occurrences.

We have to be sure that operations of the form

Set/Add/Remove p(c)=q(...)

are well-defined.

And we have to insure that Set, Add, and Remove are well-defined when applied to functions which are derived from such predicates.

This whole chain of reasoning seems inexorable, unless we can falsify any of the assumptions I have made along the way. Can we?

[May also want to mention the problem that arises if False is a Boolean object and predicates are multivalued: how do we prevent

p(c) returns {True, False, ...} ?]