Integrity Constraints (Part 1)

Summary: Integrity Constraints (Part 1)

C.J. Date Author, Read Author Bio || Read All Articles by C.J. Date

This is the first of a short series of installments of this regular column on the important topic of integrity constraints. An integrity constraint (constraint for short) is, in essence, a conditional expression that must evaluate to true. Here's an example from a customers and orders database:

TOTAL_OWED � CREDIT_LIMIT

If an update is attempted that would make this expression false, the update is rejected, and an error message is sent back to the user.

It should be clear that integrity constraints are crucially important, since they control the correctness of the data. In many ways, in fact, integrity constraints are the most important part of the system. (In particular, they're much more important than performance! -- it doesn't matter at all how fast your queries run if you can't trust the answers you get to be correct. I suppose I should admit that not everybody agrees with me on this question, but I stand by my position, and I'm prepared to defend it if necessary.)

Now, I'm sure you've realized already that what the database world calls (and has been calling for many, many years) integrity constraints is largely just business rules by another name. For reasons of my own, however, I'm going to stay with the term integrity constraints for the time being. What I want to do in this series of installments is present a taxonomy or classification scheme for integrity constraints.

Now, over the years, many such taxonomies have been proposed, by many different people, but I'm going to stick my neck out here and suggest that -- at last -- we have one that's correct. And you probably won't be surprised to learn that the scheme I'm going to describe is the one Hugh Darwen and I advocate in The Third Manifesto ^[1] .... The reason we think it's correct is because the structure of the scheme mirrors the structure of the data itself. For consider: The database is made out of tables; tables are made out of columns; and columns are made out of domains. So we have database constraints, at the overall database level; table constraints, at the individual table level; column constraints, at the individual column level; and domain constraints, at the individual domain level. To be more specific:

Domain constraints specify the legal values for a given domain. Note: Actually, it's one of the fundamental tenets of The Third Manifesto that what the database community calls a domain is nothing more nor less than what the programming languages community calls a data type (type for short). For a variety of reasons, in fact, we really prefer the term type; however, we do use domain as well, treating the two terms as interchangeable, and I'll do the same in this series. Thus, domain constraints are often referred to as type constraints in what follows.

By the way, it's important to understand that the types or domains in question can be either system-defined types (typically rather simple ones -- for example, integers, character strings) or user-defined types, of arbitrary complexity (for example, geometric points, ellipses, polygons). We're certainly not limited to simple things like integers and strings.

Column constraints specify the legal values for a given column (or attribute, to use the more formal term -- but I'll stick to column).
Table constraints specify the legal values for a given table (more precisely, a given relvar -- recall that a "relvar" is a relation variable, and that's essentially what a database table is). Note: Table constraints can also be transition constraints. A transition constraint specifies legal transitions from one value to another -- for example, "salaries can never decrease" (don't we wish). For simplicity, however, I'll ignore transition constraints in this series, except for a couple of examples near the end of the last installment.
Database constraints specify the legal values for a given database. Again they might be transition constraints, in general, but, to repeat, I'm mostly going to ignore transition constraints until further notice.

This taxonomy is summarized in Table 1.

Table 1. A taxonomy for integrity constraints
Domain (or type) constraints: Specify legal values for a given domain (type) Column (or attribute) constraints: Specify legal values for a given column (attribute) Table (or relvar) constraints: Specify legal values for a given table (relvar) Database constraints: Specify legal values for a given database (Transition constraints ignored for simplicity)

Let's look at some examples. I'll deal with type constraints in the present installment and other kinds of constraints in the next one. Note: The examples that follow are deliberately not expressed in SQL; rather, they're expressed in a relational language called Tutorial D, which is a language Hugh Darwen and I use as a vehicle for explaining the concepts of The Third Manifesto. I intend to use that same language as a basis for examples in my regular column from this point forward, explaining whatever aspects of the language seem to need such explanation as I go.

Here then are some examples of type (or domain) constraints, all of them involving user-defined types:

TYPE QTY

	   POSSREP { Q INTEGER WHERE Q > 0 AND Q < 5001 } ;

This first example defines a type or domain called QTY ("quantity"), with a possible representation ("POSSREP") consisting of an integer called Q. What this means is that it must at least be possible to represent quantities by means of integers; it doesn't mean that quantities are actually represented as integers -- they might be, or they might be represented as floating point numbers, or character strings, or anything else you can think of. Note: This POSSREP notion is one that pervades The Third Manifesto. It turns out to be very important, for all kinds of reasons; however, those reasons aren't really of concern to us here, so I won't bother to go into details right now.

Anyway, you can see that we've used the POSSREP in the example to specify a type constraint (see the WHERE clause): Quantities must be such that they can be represented by, precisely, the integers 1, 2, ..., 5000 -- meaning (loosely) that legal quantities are just those five thousand values.

As this first example suggests, type constraints in general are essentially just a specification of the set of values that make up the type in question; in effect, they're just an enumeration of that set of legal values. Here's an example in which the constraint is actually expressed as such an enumeration:

TYPE COLOR

    POSSREP { C CHAR WHERE

        C IN { 'Red', 'Yellow', 'Blue', 'Green' } } ;
Legal colors are red, yellow, blue, and green.

Here's another example of a type constraint:

TYPE ELLIPSE

    POSSREP { A LENGTH, B LENGTH, CTR POINT WHERE A > B } ;

Ellipses can possibly be represented by their major semiaxis a, their minor semiaxis b, and their center ctr. The semiaxes are lengths and the center is a point (LENGTH and POINT are further user-defined types, whose definitions aren't shown). The ELLIPSE type constraint specifies that the major semiaxis a must be greater than or equal to the minor semiaxis b.

While we're on the subject of type constraints, by the way, I should mention that "IS A" constraints are an extremely important special case. For example:

TYPE CIRCLE IS ELLIPSE WHERE

    THE_A ( ELLIPSE ) = THE_B ( ELLIPSE )

    POSSREP { R   = THE_A   ( ELLIPSE ) ,

              CTR = THE_CTR ( ELLIPSE ) } ;

The specification CIRCLE IS ELLIPSE is part of the type constraint. It means that every circle "IS A" ellipse, and type CIRCLE is a subtype of supertype ELLIPSE. Type CIRCLE thus "inherits" various properties from type ELLIPSE -- for example, the property of having an area (circles have an area, because all ellipses have an area). However, circles also have certain properties of their own that ellipses in general don't have (for example, circles have a radius, but ellipses in general don't). Incidentally, note the additional specification (also part of the type constraint):

THE_A ( ELLIPSE ) = THE_B ( ELLIPSE )

This specification indicates that a circle is precisely an ellipse for which the semiaxes a and b are equal. (The operators THE_A and THE_B should be self-explanatory.)

Finally, note that circles have a "POSSREP" consisting of a radius R and a center CTR, defined to be equal to the major semiaxis and the center, respectively, of the particular ellipse that the circle in question happens to be.

(As an aside, I'd like to say that The Third Manifesto includes extensive support for the whole business of subtyping and inheritance in general, including support for both single and multiple inheritance and support for scalar, row, and table type inheritance. The details are, obviously, far beyond the scope of this column; all I wanted to do here was point out the relevance of type constraints to that support. The book gives full details, of course.)

(To be continued.)

References

1. C. J. Date and Hugh Darwen, Foundation for Future Database Systems: The Third Manifesto (2nd edition), Addison-Wesley, 2000.

Standard citation for this article:

C.J. Date, "Integrity Constraints (Part 1)" Business Rules Journal, Vol. 1, No. 12, (Dec. 2000)
URL: http://www.brcommunity.com/a2000/b046.html

About our Contributor:

C.J. Date Author,

C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database technology. He is best known for his book An Introduction to Database Systems (eighth edition, Addison-Wesley, 2004), which has sold some 725,000 copies and is used by several hundred colleges and universities worldwide. He is also the author of many other books on database management.

Read All Articles by C.J. Date