William Kent

Database Technology Department

Hewlett-Packard Laboratories

Palo Alto, California

Aug 1992

> 1 INTRODUCTION

> 2 WHAT TIME IS

> 3 MEASUREMENTS ON A TIME LINE

>> 3.1 Date

>> 3.2 Time of Day

> 4 COMPUTATION PROBLEMS

>> 4.1 Time Zones and Daylight Saving Time

>> 4.2 GMT

>> 4.3 Irregular Units

> 5 REPRESENTATION PROBLEMS

>> 5.1 Mappings

>> 5.2 Recognition Problems

> 6 MODELLING TIME

>> 6.1 Ambiguities

>> 6.2 What "Is" a Date?

>>> 6.2.1 Dates as Attributes

>>> 6.2.2 Dates as Entities

>>> 6.2.3 Dates as Relationships

>> 6.3 Time in Data Bases

>>> 6.3.1 Real World States and Data Base States

>>> 6.3.2 Sequentiality

> 7 IDIOSYNCRASIES

> 8 CONCLUSIONS

> 9 REFERENCES

Time, especially dates, constitutes one of the most complex "attributes" we deal with in recorded data. Its exploration can shed some light on general problems of measured quantities, attributes, values, representations, and even entities and relationships.

Our sense of "what time it is", or "what date", is an illusory absolute. It is in fact just as relativistic as our sense of place. Both senses give the illusion of absoluteness by virtue of certain arbitrary reference points. As we all know, there is no absolute coordinate system in the universe by which we can specify the location of objects. At best we can speak of some distances and other geometric relationships to other objects. We often forget this, although our sense of place is almost always relative to certain points on the Earth.

Quite similarly, the location of events in time must necessarily be relative. The date of an event is a measure of the time interval relative to some arbitrary reference event, that event in many cultures being the birth of Christ.

Our usual thoughtless notion of time refers to local time, rather than world time. We are likely to assign different birthdates to people born at the same moment on opposite sides of the international date line. In any model we have to decide how to deal with local time vs. world time, and what to do with the assumption that events occurring on different dates necessarily occurred at different times.

The converse situation arises out of a question of resolution, or granularity. We tend to record the times of different kinds of events with different degrees of precision. Births and hirings are recorded to the nearest day; entertainment events and trains might be scheduled to the nearest minute; events of scientific interest might be recorded in millennia, or in nanoseconds. Something has to be worked out to accommodate these in a model with a single time-stamping mechanism. Shall we record all events to the maximum precision, or shall we allow the varying uncertainties we always allow in real life? Shall we continue to say that having the same birthday means being the same age, even though the births may have occurred almost 24 hours apart?

What should it mean to us to say that two events occurred at the same time? Local time, and variable granularities, make that question hard enough to deal with in our everyday world. The question becomes totally unmanageable for the likes of us if we consider relativity: modern physicists seem to be telling us that simultaneity is impossible to determine. In that context, we may begin to wonder what it means to record the time at which an event occurred. And if we really do want to account for relativity theory, then we had better accept that the interval between two events is subjective, being possibly different for different observers. The astronaut who returns to us ten years after launching may feel he's only been gone for a year.

Of course we all want to dismiss that as being too "far out" (pun?). But is it really going to be so long before we have to deal with such astronauts, or with on-board computers which return from space with different elapsed times from the ground-based systems? Will we have union arguments over whether that astronaut has earned one or ten years of salary and seniority for his one year of work? Will the astronaut object to being counted as nine years older, for retirement and insurance purposes?

It's not all so esoteric. The relevance keeps sneaking up on us. The relativistic question of synchronicity emerges in the problem of trying to synchronize events in distributed systems [5].

Time is inherently a difficult thing for us to think of by itself, or to describe directly. It is somehow a medium in which events occur. We perceive it as something which flows, with a sense of direction. It seems to fit closely the metaphor of a line (the time line), being an infinite and continuous one-dimensional thing, having a correspondence with the set of real numbers. The metaphor is augmented by a definite sense of direction: we have an indefinable intuitive sense of before and after, of past and future, which defines for us the universal convention for the direction in which time is increasing.

This is the best metaphor we can use for time in our work: a single directed infinite straight line. It may not be the ultimate metaphor. Relativity theory seems to posit multiple time lines. We will temporarily (note that temporal term!) ignore the difficulty of getting observers to agree as to where an event occurred on the time line (the synchronization problem of relativity theory, suddenly relevant to distributed systems). We should perhaps deal in subjective time lines, as perceived by people, hence a possibly different one for each person. But subjective time lines might not "measure" the same interval between two events. They might not even progress monotonically "forward", if we take into account memories of the past, thoughts of the future, and various dreams and hallucinations [cite Borges]. Also, time travel may some day materialize out of science fiction, confounding our illusion of monotonic progression. And there are cultures (e.g., Hopi) which don't share our perception of time, differing in a way that we find virtually indescribable [6].

We use date and time units to express a variety of phenomena. Sometimes we refer to one particular point in time, such as the date a certain person was hired, or the time at which someone was born. Sometimes we refer to one particular time interval, such as the year 1979, or between 7 and 10 PM on June 1, 1979.

But sometimes it's not clear whether we're thinking of a point or an interval. We can only be sure an interval is intended when two points are indicated, as in March 15-18, 1979. When only one point is indicated, as in a date, that could really be intended as a point with coarse granularity, or it could be a reference to a 24-hour interval.

Sometimes we refer to an interval of a certain length, not fixed in any particular place on the time line - a year, 3 hours, etc.

And sometimes we refer to recurring points and intervals. January and Monday are such. So are partial dates, like the dates of holidays (Dec. 25th). So is time of day when no particular date is expressed or implied, and so are the times in train and plane schedules.

Thus we have points and intervals of time, sometimes fixed, sometimes floating, and sometimes recurring on the time line.

The intervals are measured in various granularities, or precisions. Date and time correspond to the precisions occurring most frequently in recorded data: time to the nearest day, and time to the nearest minute or second (or fraction thereof). Some other useful precisions include decades, centuries, and millennia.

A date such as March 15, 1980 designates a point on the time line - a rather broad point, having a width of one day. The date identifies the point by giving its distance from an arbitrary origin, which is normally considered to be Dec. 31 of the year 1 BC. (There are a number of calendar systems in use in the world today. Unless otherwise indicated, dates will be described relative to the Gregorian calendar [3].)

A date expresses a "distance" along the time line in months, days, and years, just as an ordinary length might be expressed in yards, feet, and inches. That is, dates are simply linear measurements in a mixed radix (multiple units) system. March 15, 1980 could be re-phrased as a "distance" of 1980 years, 3 months, and 15 days from the origin. The months of the year are sub-units of a length, like the inches of a foot. The months happen to have names, while the inches do not.

There are some anomalies which render the analogy less than perfect. A coefficient in an ordinary linear measurement counts the number of whole units between the origin and the measured point, while for a date the coefficient also counts the unit which contains the point. A point occurring within the first foot of a line segment is counted as a distance of 0 feet, x inches, but a day occurring within the first month of a year is counted as month 1 (January), x days. Thus, to be more precise, the date March 15, 1980 actually represents a distance of 1979 years, 2 months, and 14 (and a fraction) days from the origin. There is no year 0; January 1 of the year 1 marks a point 0 years, 0 months, and 0 (and a fraction) days from the origin.
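The off-by-one convention can be made concrete with a small sketch. The function name `elapsed_coefficients` is our own invention, and the sketch covers AD dates only; it merely subtracts one from each written coefficient to obtain the number of whole units elapsed since the origin.

```python
def elapsed_coefficients(year: int, month: int, day: int) -> tuple:
    """Whole years, months, and days elapsed before the start of an AD date.

    A written date counts the unit that contains the point, so each
    written coefficient is one larger than the number of whole units elapsed.
    """
    if year < 1 or not 1 <= month <= 12 or day < 1:
        raise ValueError("expects an AD date with 1-based year, month, and day")
    return (year - 1, month - 1, day - 1)


print(elapsed_coefficients(1980, 3, 15))  # (1979, 2, 14)
print(elapsed_coefficients(1, 1, 1))      # (0, 0, 0)
```

The "and a fraction" of a day mentioned above is not modelled here, since a date carries no information about it.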

Another anomaly is the irregularity of the units. While the inches in a foot are all alike, the months of a year are not. There is not a constant multiplier for converting between months and days. So, the conversion between months and days is not a simple multiplication or division, but something like a table lookup or similarly complex algorithm. The analogy between inches in a foot and months in a year is more apparent if we use addition instead of multiplication: to convert x feet to inches, we sum the first x elements in the list:

12, 12, 12, 12, 12, 12, 12, 12, ....

while to convert x months (of a year) into days we sum the first x elements in the list:

31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31.

For leap years, 29 replaces 28 in the sequence. For arbitrary years, the conversion is ambiguous, although the sequence with 28 is conventionally the default.
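The two summations can be sketched side by side. The names below are ours, and the Gregorian leap-year rule is supplied as an assumption, since the text does not spell it out. When no year is given, the sketch falls back on the sequence containing 28, following the convention just described.

```python
from typing import Optional

# Month lengths for an ordinary (non-leap) year, January first.
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap_year(year: int) -> bool:
    """Gregorian leap-year rule (our addition, not stated in the text)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def inches_in_feet(x: int) -> int:
    """Sum the first x elements of the regular list 12, 12, 12, ..."""
    return sum([12] * x)


def days_in_first_months(x: int, year: Optional[int] = None) -> int:
    """Sum the first x elements of the month-length list.

    With no year given, the sequence containing 28 is used as the default.
    """
    if not 0 <= x <= 12:
        raise ValueError("x must be between 0 and 12")
    lengths = list(MONTH_LENGTHS)
    if year is not None and is_leap_year(year):
        lengths[1] = 29  # February in a leap year
    return sum(lengths[:x])


print(inches_in_feet(3))              # 36
print(days_in_first_months(2))        # 59
print(days_in_first_months(2, 1980))  # 60
```

The regular case collapses to a multiplication; the irregular case cannot, which is why a table of some kind is hard to avoid.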

Another anomaly has to do with negative values. With distances, all units of a measurement are counted in the same direction. For a point to the left of an origin, the yards, feet, and inches are all counted from right to left. Not so with dates. For negative, i.e. BC, dates, only the years are actually counted backwards. The months and days are counted forward in the same direction as AD dates. The date March 15, 10 BC corresponds to -10 years +3 months +15 days from the origin (or maybe that should be -9 years +2 months +14 days).

Time of day is substantially the same phenomenon as date, being an interval along the time line, relative to some origin. The granularity is finer, and the origin concept is a bit different. In fact, there are two starting-point conventions: for a 24-hour clock, we measure time of day from the preceding midnight; for a 12-hour clock, we measure time of day from the preceding midnight or noon.

But, fortuitously, the anomalies associated with date are gone. Time of day is measured in a nice regular mixed radix number system, with no more or fewer anomalies than the measurements of distance. (Except that for the 12-hour clock we tend to use 12:xx instead of 0:xx in the first hour.)

It is most helpful to separate the concept of what a date means from the problems of its representations. The semantics of a date are simply an integer number of days from the origin day. Meaningful operations on dates are the same as the meaningful operations on distances from a fixed point on a line. A date can be increased or decreased by an integral number of days, and the difference between two dates is an integral number of days.

Similar operations can be defined for time of day and time intervals.
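As an illustration only, Python's standard `datetime.date` happens to expose this view: `toordinal()` returns an integer day count in which January 1 of the year 1 is day 1, using the Gregorian rules extended backwards. The helper names below are hypothetical; the sketch shows that shifting a date and differencing two dates reduce to integer arithmetic on that count.

```python
from datetime import date


def days_from_origin(d: date) -> int:
    """Integer day count; January 1 of year 1 is day 1 in this library."""
    return d.toordinal()


def shift(d: date, days: int) -> date:
    """Increase or decrease a date by an integral number of days."""
    return date.fromordinal(d.toordinal() + days)


def days_between(earlier: date, later: date) -> int:
    """Difference between two dates as an integral number of days."""
    return later.toordinal() - earlier.toordinal()


print(days_from_origin(date(1, 1, 1)))                  # 1
print(shift(date(1980, 3, 15), 30))                     # 1980-04-14
print(days_between(date(1980, 1, 1), date(1980, 3, 15)))  # 74
```

None of these operations needs to know how the date is written; the month and year structure only reappears when the integer is mapped back to a representation.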

It is not the same time all over the world. An event on the time line maps into different date-time measurements in different places. In effect, each time zone measures time relative to a different origin point on the time line.

The measurement of time is therefore dependent on the place where it is measured. This becomes increasingly significant, of course, in the context of distributed systems and data bases.

Furthermore, clocks are set back or forward by one hour twice a year in some parts of the world, in observance of Daylight Saving Time (DST) (sometimes also called Summer Time). This also has the effect of shifting the origin point on the time line forward and backward by one hour, at various times of the year.

In a strange way, the measurement of time depends on the time at which it is measured. Some segments of the time line are measured relative to one origin, and some to another. To map an event into a date-time measurement, you not only have to know where on earth you are, but where you are on the time line. Time is a function of place and time:

T=f(P,T).

This dependence of time measurement on time and place affects the determination of:

- the time of an event,
- the date of an event,
- the interval between two events,

and causes a number of anomalies.

The date of an event can be affected by time zone and DST. For one party in a phone call it could be Monday while for the other party it is Tuesday. That could occur even within the same time zone around midnight, between regions which do and don't observe DST. In these cases, if both parties are maintaining a log of phone calls, they will be recorded as occurring on different dates.
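The phone call example can be sketched with fixed offsets from a common reference clock. The offsets and the instants below are arbitrary placeholders chosen by us, not a claim about any real jurisdiction; the second pair stands for two regions in the same zone, one of which is an hour ahead because it observes DST.

```python
from datetime import date, datetime, timedelta, timezone


def local_date(instant: datetime, utc_offset_hours: int) -> date:
    """Date that a log kept at the given fixed offset would record."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return instant.astimezone(zone).date()


# One phone call, two parties in widely separated zones.
call = datetime(1992, 8, 4, 3, 30, tzinfo=timezone.utc)
print(local_date(call, -8))  # 1992-08-03
print(local_date(call, +1))  # 1992-08-04

# Same zone near midnight: one region on standard time, one on DST.
late_call = datetime(1992, 8, 4, 6, 30, tzinfo=timezone.utc)
print(local_date(late_call, -7))  # 1992-08-03
print(local_date(late_call, -6))  # 1992-08-04
```

Each party's log is correct by its own lights; the two entries disagree only because the place qualifier has been left implicit.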

Date and time are ambiguous references to the time line, unless supplemented with some knowledge of place. This knowledge of place determines both the time zone and whether or not DST might be in effect. Quite often we assume the "local place" as an implicit qualifier.

Given the date and time of two events, the computation of the interval between them can get complicated. Without taking time zones and DST into account, we might have a message arriving before it is sent. We might have events which actually occurred up to 25 hours apart being recorded with the same date. The elapsed time from midnight to midnight at a given location might be 23, 24, or 25 hours. (Does that constitute one day?) The interval between 2 and 3 AM on certain Sunday mornings might be zero hours, or two. (Are there two occurrences of 3 AM on some of those mornings? Can events occurring an hour apart occur at 3 AM?)
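A rough sketch of these interval anomalies follows. We assume a hypothetical location whose clocks sit 8 hours behind the reference clock on standard time and 7 hours behind on DST; the calendar dates are placeholders. `naive_hours` subtracts the wall-clock readings as written, while `elapsed_hours` takes the offsets into account.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical offsets for one location, before and after a DST change.
STANDARD = timezone(timedelta(hours=-8))
DAYLIGHT = timezone(timedelta(hours=-7))


def elapsed_hours(start: datetime, end: datetime) -> float:
    """True elapsed time between two offset-qualified readings."""
    return (end - start) / timedelta(hours=1)


def naive_hours(start: datetime, end: datetime) -> float:
    """Difference of the wall-clock readings, ignoring zone and DST."""
    return (end.replace(tzinfo=None) - start.replace(tzinfo=None)) / timedelta(hours=1)


# Midnight to midnight across a "spring forward" night: 23 hours.
print(elapsed_hours(datetime(1992, 4, 5, tzinfo=STANDARD),
                    datetime(1992, 4, 6, tzinfo=DAYLIGHT)))   # 23.0

# Midnight to midnight across a "fall back" night: 25 hours.
print(elapsed_hours(datetime(1992, 10, 25, tzinfo=DAYLIGHT),
                    datetime(1992, 10, 26, tzinfo=STANDARD)))  # 25.0

# A message sent at 10:00 in one zone and received at 09:30 in another.
sent = datetime(1992, 8, 3, 10, 0, tzinfo=timezone(timedelta(hours=-5)))
received = datetime(1992, 8, 3, 9, 30, tzinfo=STANDARD)
print(naive_hours(sent, received))    # -0.5
print(elapsed_hours(sent, received))  # 2.5
```

In the last case the naive computation has the message arriving half an hour before it was sent, while the offset-aware computation gives two and a half hours in transit.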

Other minor computational problems have already been mentioned. Dates are not quite as well behaved as other mixed radix measurements because the sizes of the units are irregular. Months have different numbers of days, years have different numbers of days, and one notorious month has different numbers of days in different years.

We have problems with the representation of date and time, though many of the problems are common to the representations of any measured quantities.

The mapping of a measurable item to a character string involves choices in the following variables:

- Accuracy and precision
- Origin
- Units
- Coefficients
- Representation of units
- Representation of coefficients
- Sequence and punctuation

Accuracy and precision of measurement is something we will largely ignore here. We have enough difficulty with the representation of a single ideal measurement.

Choice of origin doesn't present a problem when the boundaries of a measured item are defined. But occasionally they are not. In order to identify the location of some physical point, a reference point or coordinate system must be established. The measurement of distance to a city depends on the reference point chosen in the city. For date and time, the origin point is affected by the calendar system, the time zone, daylight saving time, and the choice between 12- and 24-hour clocks. Different calendar systems, such as the Jewish or Chinese, are based on origin points different from the Gregorian calendar in common use. Sometimes the origin is arbitrarily shifted: we lost ten days in the reform from the Julian to the Gregorian calendar, when Oct. 5, 1582 became Oct. 15, 1582.

For most measurements there are alternative sets of units available, such as metric and non-metric. For time (in the narrow sense), there seems to be only one set of units in general use: hours, minutes, and seconds. For dates (time in the broad sense), one set of units dominates: years, months, and days of the Gregorian calendar. But again, there are alternative calendars, with different lengths of months and years, sometimes based on lunar instead of solar cycles [3]. And we also have Julian notation, often used in data processing and quite unrelated to the Julian calendar, where the set of units is just years and days (analogous to measuring distances in just yards and inches).
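The year-and-day notation can be sketched as a change of units applied to the same underlying day. We assume the five-digit YYDDD layout, and, because the two-digit year discards the century, the reverse mapping needs a `century` argument; both that parameter and the function names are our own.

```python
from datetime import date, timedelta


def to_yyddd(d: date) -> str:
    """Year and day-of-year, e.g. March 15, 1979 -> '79074'."""
    day_of_year = d.timetuple().tm_yday
    return f"{d.year % 100:02d}{day_of_year:03d}"


def from_yyddd(text: str, century: int = 1900) -> date:
    """Inverse mapping; the century must be supplied by convention."""
    if len(text) != 5 or not text.isdigit():
        raise ValueError("expected five digits, YYDDD")
    year = century + int(text[:2])
    day_of_year = int(text[2:])
    result = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    if day_of_year < 1 or result.year != year:
        raise ValueError("day of year out of range")
    return result


print(to_yyddd(date(1979, 3, 15)))  # 79074
print(from_yyddd("79074"))          # 1979-03-15
```

The month coefficient disappears entirely, and with it the table lookup; the irregularity is pushed into the conversion between the two notations.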

Even for a given set of units and an idealized measurement, various sets of coefficients can be applied to the units. There is a "canonical form"

k1U1, k2U2, ..., knUn

in which:

- for all terms except the first, kiUi is less than the next higher unit (i.e., don't write 13 inches),
- for all terms except the last, the coefficient is an integer,
- the coefficients ki all have the same sign,
- the coefficient ki corresponds to the number of whole units Ui between the measured point and the origin (possibly modulo some divisor).
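For regular units, the rules above amount to writing a number in a mixed radix system. The sketch below is a minimal illustration under that assumption: the measurement is given as an integer count of the smallest unit, and `radices` (a name of our choosing) lists how many of each unit make up the next higher one. It does not handle fractional last coefficients or irregular units such as months.

```python
def canonical(total: int, radices: list) -> list:
    """Coefficients k1..kn, highest unit first, for a count of the lowest unit.

    radices[i] is the number of unit i+1 in one unit i, e.g. [3, 12]
    for yards, feet, inches. All coefficients share the sign of total.
    """
    sign = -1 if total < 0 else 1
    remaining = abs(total)
    coefficients = []
    for radix in reversed(radices):
        remaining, k = divmod(remaining, radix)
        coefficients.append(k)
    coefficients.append(remaining)  # the first term is not bounded
    return [sign * k for k in reversed(coefficients)]


print(canonical(54, [3, 12]))      # [1, 1, 6]   1 yard, 1 foot, 6 inches
print(canonical(15, [3, 12]))      # [0, 1, 3]   1 foot, 3 inches
print(canonical(-15, [3, 12]))     # [0, -1, -3]
print(canonical(7210, [60, 60]))   # [2, 0, 10]  2 hours, 0 minutes, 10 seconds
```

The zero coefficients are kept explicitly here; written forms usually drop them.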

Such canonical conventions are frequently violated. We often do write 15 inches instead of 1 foot, 3 inches; and 1.5 feet instead of 1 foot, 6 inches (or 4.5 feet instead of 1 yard, 1 foot, 6 inches).

As mentioned earlier, for dates BC, the years' coefficient has the opposite sign from the months' and days' coefficients. And for dates in general, the coefficients are all too large by 1.

For time of day on a 12-hour clock, the coefficient k1 is always wrong: within the first hour after the origin, we write 12 instead of 0. And for dates we sometimes truncate the two or three high order digits of the years coefficient, writing 79 or 9 for 1979.
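The 12-hour anomaly shows up as a special case in any conversion routine. A small sketch, with a function name and output layout of our own choosing:

```python
def to_twelve_hour(hour24: int, minute: int) -> str:
    """Render a 24-hour reading on a 12-hour clock with AM/PM."""
    if not (0 <= hour24 <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    suffix = "AM" if hour24 < 12 else "PM"
    # Whole hours since the preceding midnight or noon would be hour24 % 12,
    # but convention writes 12 where that count is 0.
    hour12 = hour24 % 12 or 12
    return f"{hour12}:{minute:02d} {suffix}"


print(to_twelve_hour(0, 30))   # 12:30 AM
print(to_twelve_hour(12, 5))   # 12:05 PM
print(to_twelve_hour(13, 0))   # 1:00 PM
```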

Representation of units is not usually of concern for stored data, since the units themselves are not stored but factored into the description: distance in miles, weight in tons, age in years. For time of day and date, even input and output forms don't seem to involve representation of units. But for time intervals, and other measured quantities, units can be represented in a variety of abbreviated or capitalized forms, and sometimes alternative symbols are used (such as ' for feet, or " for seconds).

For the representation of coefficients, we have all the usual options concerning the representation of numeric quantities, such as number base, data type, precision, implied decimal points, leading zeroes, etc. For dates, the month coefficient might be an integer, or any assorted spellings, abbreviations, and capitalizations of the names of the months. A year might be given in Roman numerals, and clock faces sometimes have Roman numerals.

The problems of variability of sequence and punctuation seem to be peculiar to date and time of day. For dates, the year, month, and day coefficients can occur in just about any permutation. Occasionally a sign indication is included (AD or BC), sometimes at the beginning and sometimes at the end. Time of day always seems to be given in standard sequence, although it could have an origin indication (AM or PM) added. Time intervals, like most measurements, suffer the standard problem of omission of 0 coefficients: 2 hours and 10 seconds.

Punctuation conventions for time of day include various combinations of colons and periods (12:23:10, 12.23.10, 12:23.10 - the last having an ambiguous reference to either seconds or hundredths of a minute). For dates, punctuation conventions include various combinations of slashes, commas, periods, and blanks.

The mapping of measured item to character string is not uniquely invertible. 3:10 might be hours and minutes or minutes and seconds. 3/10 might be March tenth or October third (or even 0.3). Some time and date conventions use periods, allowing 3.10 to have all four of those interpretations, in addition to being an ordinary decimal number.

Such problems are not unique to time and date. They are no different in principle from the problem of determining whether a sequence of digits is decimal or integer, and whether it might be in octal or hex. Similar solutions apply in all cases: impose arbitrary conventions to make them decidable.
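The non-invertibility can be illustrated by enumerating the readings of a short string. The sketch below is deliberately crude: it looks only at two numbers and one separator, uses 31 as the limit for any day of the month, and reports its readings as descriptive phrases of our own wording.

```python
import re


def interpretations(text: str) -> list:
    """List the plausible readings of strings such as 3:10, 3/10, or 3.10."""
    match = re.fullmatch(r"(\d{1,2})([:/.])(\d{1,2})", text)
    if match is None:
        return []
    a, sep, b = int(match.group(1)), match.group(2), int(match.group(3))
    readings = []
    if sep in ":.":  # time-like punctuation
        if a < 24 and b < 60:
            readings.append(f"{a} hours, {b} minutes")
        if a < 60 and b < 60:
            readings.append(f"{a} minutes, {b} seconds")
    if sep in "/.":  # date-like punctuation
        if 1 <= a <= 12 and 1 <= b <= 31:
            readings.append(f"month {a}, day {b}")
        if 1 <= b <= 12 and 1 <= a <= 31:
            readings.append(f"day {a}, month {b}")
    if sep == "/" and b != 0:
        readings.append(f"the fraction {a}/{b}")
    if sep == ".":
        readings.append(f"the decimal number {text}")
    return readings


for sample in ("3:10", "3/10", "3.10"):
    print(sample, "->", interpretations(sample))
```

Under these assumptions `3:10` has two readings, `3/10` has three, and `3.10` has five. Imposing a convention amounts to fixing in advance which one of the listed readings is accepted.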

The question of how time ought to be treated in information models must first deal with a sea of ambiguities. The terms involved have an unusually large variety of meanings and usages. A single term may correspond to a half dozen distinct concepts.

In this section we merely illustrate some of the ambiguities. We have made no attempt to disambiguate the terminology in the present paper; to do so would require the introduction of too many artificial terms and definitions to be intelligible. We hope that the context has made the meanings sufficiently clear.

Time:

- A broad inclusive term, in the sense that date is a particular form of representing time. "A given date represents the time elapsed since the birth of Christ."
- Time of day, an aspect of time which is different from date, and is measured in a granularity of minutes, seconds, or fractions of seconds. "Date and time of birth."
- A time interval. "The flight time is 3 hours."

Day:

- One of: Sunday, Monday, ..., or Saturday. "On what day were you born?"
- Relative day within a month. "Month, day, and year."
- Relative day within a year. "YYDDD is year and day in Julian notation."
- A single specific day. "On the day I was born, it rained."
- Any 24-hour period. "72 hours equals three days."
- A period from midnight to midnight. "Tom and Dick were born on the same day." (Such a day could occasionally be 23 or 25 hours, due to DST.)
- An interval between sunrise and sunset. "Night and day."

Sunday:

- One specific day.
- The first day of any week. (Were two people born on Sunday born on the same day?)

Similar ambiguities apply to other intervals, e.g., week, month, year, January.

For our purposes, we define an *attribute* as the name of some fact (such as "height" or "birthday"), and a *value* as a character string which can be associated with an attribute for some individual.

Values can be treated in two ways regarding equality. As pure character strings, unequal strings constitute unequal values. Thus "6 feet", "6 ft.", and "72 inches" are three unequal values of the height attribute; and "Jan. 1, 1979", "1 Jan 79", and "1/1/79" are three unequal values of the birthday attribute.

Whenever we allow for conversion among values of attributes, we implicitly acknowledge something invariant behind the different character strings so equated. There is something else there besides the characters in "6 feet". There is a certain space, to which various representations of measurement can be associated. We can, if we wish, think of the thing being measured as an entity in itself. Thus a certain height, or a certain day, can be viewed as an entity.
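The two treatments of equality can be contrasted in a few lines. The unit table and the parsing rule below are illustrative assumptions of ours, covering only the spellings used in the height example.

```python
import re

# Hypothetical table of unit spellings, expressed in inches.
INCHES_PER_UNIT = {"feet": 12, "foot": 12, "ft": 12,
                   "inches": 1, "inch": 1, "in": 1}


def height_in_inches(value: str) -> float:
    """Map a height string to the quantity it represents, in inches."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\.?\s*", value)
    if match is None or match.group(2).lower() not in INCHES_PER_UNIT:
        raise ValueError(f"unrecognized height: {value!r}")
    return float(match.group(1)) * INCHES_PER_UNIT[match.group(2).lower()]


values = ["6 feet", "6 ft.", "72 inches"]

# As pure character strings: three unequal values.
print(len(set(values)))                               # 3

# As representations of a measured quantity: one value.
print(len({height_in_inches(v) for v in values}))     # 1
```

The number 72.0 is itself only another representation; what the three strings share is the height being measured.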

The correspondence between such an entity and its various representations can be modelled in several ways. The simplest is to consider it an arbitrary mapping between entities and symbols, just like the mapping of people to their names or social security numbers, or the mapping of a certain color to the arbitrary synonyms "red" and "crimson".

Alternatively, one could model the whole measurement phenomenon [4]. Various sets of units of measurement are introduced into the model, and the combination of a measured entity and a set of units is mapped into a set of numbers (the abstractions of the coefficients mentioned earlier). These numbers and the set of units are then further mapped into character strings taking into account the desired representations of the coefficients and units.

This last approach is generally too complex to be useful, although it is probably the most precise model of the phenomenon.

Treating measured quantities as entities is not a generally familiar concept - except, perhaps, in the domain of time. Certain cycles and patterns in our lives have become so culturally ingrained that they seem almost palpably to be entities. Days, weeks, months, and years seem to have that character. It does not seem so unnatural to think of a certain day as an entity, having various attributes and being related to various other entities. It had various rainfalls and temperatures in various places. There are people who were born, hired, fired, married, and divorced on that day. Wars were declared, battles fought, and treaties signed on that day. And so on.

Of course, there is a curious lack of objectivity about such entities. Different calendar systems carve out different entities (years, months) from the time line. A day is nothing more than the period of one revolution of a certain spinning object; the individual revolutions of other objects do not seem particularly to be entities. It doesn't matter. Entities are where we perceive them. They exist wherever we agree to think so [4].

We can contemplate the "naming" of intervals.

A year is an interval subdivided into 12 shorter intervals, the months. A week is an interval subdivided into 7 shorter intervals, the days. Why have we given distinct names to those shorter intervals?

Because a date like March 15, 1979 is often recorded in three columns of data, it is sometimes held to be a relationship among three entities. It can be awkward to define what those entities are.

Though often claimed, it is rarely true that a database is a snapshot of a state of the real world. Many databases include a great deal of historical information.

There are two sequences of states: the real world and the database. A single state of the database describes many states of the real world. That's precisely the significance of memory: information about many past states of the real world can be retrieved from a single present state of the data base. A database update creates a new database state, which may not correspond simply with a new (later) real world state. The new database state might reflect

- A deletion of an old real world state (purge expired information).
- An extension to include a new real world state (report of change occurring in the real world).
- A modification to some prior state of the real world (either an error correction, or a late report of an old change).

Earlier states of a data base are never altered. Earlier states of the recorded real world are often altered, in later data base states.

"As of" data retrievals are ambiguous: "as of" in the real world, or "as of" in the data base? Do you want your bank balance as of March 15, 1979, or as it appeared in our records on that date? While the former may be the more sensible interpretation, it could give different answers when asked at different times.

There are at least two sets of time values: when things occur, and when they are recorded. Things are usually recorded later than their occurrence. And they are not always recorded in the same sequence as they occurred.
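The two readings of "as of" can be sketched by stamping each entry with both time values. The record layout, the field names, and the sample bank ledger are all hypothetical; amounts are in cents.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    occurred: date   # when the change happened in the real world
    recorded: date   # when the data base was told about it
    amount: int      # signed change to the balance, in cents


def balance(entries, as_of_world: date, as_of_records: date) -> int:
    """Balance on a real-world date, using only what was recorded by a given date."""
    return sum(e.amount for e in entries
               if e.occurred <= as_of_world and e.recorded <= as_of_records)


ledger = [
    Entry(date(1979, 3, 1), date(1979, 3, 1), 10000),    # deposit, reported promptly
    Entry(date(1979, 3, 14), date(1979, 3, 20), -2500),  # withdrawal, reported late
]
march_15 = date(1979, 3, 15)

# The balance as it appeared in the records on March 15.
print(balance(ledger, march_15, march_15))           # 10000

# The balance as of March 15, asked after the late report arrived.
print(balance(ledger, march_15, date(1979, 4, 1)))   # 7500
```

The second query is the "more sensible" one, yet its answer depends on when it is asked: put on March 16 it would still have returned 10000.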

One of the open questions in information modelling is whether time should be introduced as a special construct, or whether it can be handled simply as another attribute defined and managed by users as needed. Because of the computational complexities mentioned earlier, it is a convenience to provide special facilities for time and date. On the other hand, there is little the system can do to automatically maintain time related information, unless users are willing to equate time of occurrence with time of recording. If time of occurrence has to be provided explicitly as part of the input information, then it seems to have the same character as other attributes included in the information.

Note that if dependences are automatically maintained in the data base (especially implications), then we must be concerned about recording events in the chronological order of occurrence. Otherwise there could be an enormous maintenance problem. If an event is recorded as having occurred earlier than some other events already recorded, then it may be necessary to "roll back" time, and recompute history. In the worst cases, previously generated information (or other output actions) may become invalidated, and perhaps previously accepted inputs may be rendered unacceptable (inconsistent with the newly perceived state of affairs at that time).

Time also seems to be rich in phenomena illustrating how illusory and artificial are some of our perceptions of reality. There are so many things we perceive as "natural" and "existing" which, if examined very objectively, turn out to be surprisingly elaborate artificial conventions of our making.

The concept of time collects around it an amazing variety of mental exercises illustrating all sorts of perversities of which the human mind is capable. It is the vehicle by which many subtle mental abilities are learnt, and for which we invent many kinds of abstractions.

Time provides a general lesson in irregularities. Learning about time does wonderful things to the naive mind. The irregularities suddenly come into focus when you try to teach these things to a child. Learning about time seems to be a basic mechanism for bringing the mind in touch with many peculiar devices we unthinkingly use in dealing with the real world.

Reinterpretation of signs: "9" signifies 45, or 15, or a quarter.

The third hand is the second hand.

Abstraction of concepts: an angle between radial pointers is "the same as" a digital display.

Further abstraction: some clocks don't have numbers on them.

Metaphor: hands pointing.

First lesson in unnatural arithmetic: 10 + 4 = 2.

Ten can be before (earlier than) two.

International date line: is it the same day everywhere? The same time?

If you and I were born on Monday, were we born on the same day?

- [1] B. Breutmann, E. Falkenberg, and R. Mauer, "CSL: A Language for Defining Conceptual Schemas", in G. Bracchi and G.M. Nijssen, *Data Base Architecture*, North Holland, 1979.
- [2] J.A. Bubenko, "The Temporal Dimension in Information Modelling", in G.M. Nijssen, *Architecture and Models in Data Base Management Systems*, North Holland, 1977.
- [3] "Calendars", *Encyclopaedia Britannica*.
- [4] W. Kent, *Data and Reality*, North Holland, 1978.
- [5] L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Comm. ACM 21 (7), July 1978.
- [6] Benjamin Lee Whorf, *Language, Thought, and Reality*, MIT, 1956.