Integrity Constraints (Part 3)

C.J. Date Author, Read Author Bio || Read All Articles by C.J. Date

This is the last installment of my miniseries on the topic of integrity constraints. I looked at type constraints in the first installment and column and table constraints in the second; now I want to conclude by considering database constraints.

Like a table constraint, a database constraint can be as complicated as you like, provided only that it refers to at least two distinct tables (it must be at least two, because otherwise the constraint isn't a database constraint after all but a table constraint).

Here's an example, based (like my examples last time) on the suppliers and parts database:

IF S.STATUS < 20 AND S.S# = SP.S# THEN SP.QTY £QTY(500)

("suppliers with status less than 20 must not supply any part in a quantity greater than 500"; by the way, observe the QTY literal, QTY(500)).

Note that this constraint certainly does refer to two tables, S and SP (note in particular the "join term" that interrelates them). Note too that the constraint can be regarded as shorthand for a more formal version that looks something like this:

FORALL s# IN S#, sn IN NAME, st IN INTEGER, sc IN CHAR, p# IN P#, q IN INTEGER ( IF { S# s#, SNAME sn, STATUS st, CITY sc } IN S AND { S# s#, P# p#, QTY q } IN SP AND st < 20 THEN q£QTY(500) )

As I noted last time, however, I'm not going to worry too much about such more formal expressions in this series.

A couple of points arise from this example:

First, observe that we distinguish between single- and multi-table constraints, not between single- and multi-row constraints.
For example, the superkey constraint shown last time for table S is certainly a single-table constraint, even though it talks about two rows (SX and SY). In the past, by contrast, taxonomies for integrity constraints have typically drawn a dividing line between single- and multi-row constraints However, that distinction, it seems to me, is one that's more important from an implementation point of view than it is from a logical one.
Second, database constraints, like column and table constraints, are checked immediately -- another (and major) point of difference between the taxonomy under discussion and others in the past.
In previous schemes, multi-table constraints, at least (and possibly others too), were checked at COMMIT time (that is, at end-of-transaction) instead of immediately. In particular, the SQL standard includes something it calls "DEFERRED checking" (though in SQL the checking is done at COMMIT time at the latest -- it might actually be done before end-of-transaction -- but that's a detail we can ignore here). Deferred checking leads to major problems, however, of both a theoretical and a practical nature; in The Third Manifesto, therefore, we take the position that all checking must be immediate.^[1]

Note: It's true that certain constraints (especially database ones) involve keeping distinct data items "in synch," as it were, and hence that certain data items sometimes need to be updated at the same time, in effect. The Third Manifesto provides a parallel form of assignment in order to take care of this requirement. I might say more about that operation (which is of fundamental importance) in some future installment of this regular column.

Here are a couple more examples of database constraints:

EXISTS S ( S.S# = SP.S# )

This example defines the foreign key constraint from shipments to suppliers ("for every shipment SP, there must exist a supplier S with the same supplier number"; note the use of the existential quantifier, EXISTS). Note that the constraint is indeed a database constraint, because it spans two tables.

EXISTS P ( P.P# = SP.P# )

This example is similar to the previous one.

Certain pragmatically important special cases arise in connection with database constraints. For example, consider the notions of kernel, characteristic, and associative entities, which are intuitively useful concepts in connection with logical database design. (The terms are taken from Codd's extended relational model RM/T.^[2] See also my own tutorial description.^[3] Briefly:

A kernel entity is one that has independent existence (kernels are "what the database is really all about" -- they're the entry points into the database, in effect). In the suppliers and parts database, suppliers and parts are kernels.
A characteristic entity is one that further describes or "characterizes" some other entity (and is existence-dependent on that other entity). There aren't any characteristics in the suppliers and parts database, but order line items might provide an example in a customers and orders database (order line items are certainly existence-dependent on the relevant order).
An associative entity is a many-to-many (or many-to-many-to-many ...) relationship that serves to link other entities together, logically speaking. (Relationships are entities, of course!) Shipments provide an example in the suppliers and parts database.

Kernel entity types are represented by base tables (typically; sometimes they're represented by views instead, but I don't want to get into details of that possibility right now). The same is true of characteristic and associative entity types as well, of course, but the tables corresponding to such entity types have foreign keys to represent the characteristic or associative relationship. So it is indeed the case that database constraints are relevant here.

The other pragmatically important case I want to mention in connection with database constraints is "HAS A" (not "IS A"!) relationships. For example, a part-time employee "has a" hourly wage instead of a monthly salary. So, first, there'll be a table for employees in general, with common information such as employee number, department number, and so forth; second, there'll be a table for part-time employees specifically, giving information specific to part-time employees (for example, hourly wage); and third, there'll be a foreign key constraint -- a database constraint, in fact -- from the part-time employees table to the "employees in general" table.

Note: There's a great deal of confusion in the industry over IS A vs. HAS A. IS A is subtyping and inheritance, and I talked about it briefly a couple of installments back ("a circle IS A ellipse"). HAS A is quite different! -- in particular, it isn't subtyping, and it isn't inheritance. See the book by Hugh Darwen and myself.^[4] If you want to investigate this distinction in depth; here I just want to make it clear that both IS A and HAS A relationships can be handled straightforwardly within the constraint taxonomy under discussion.

I'll close this miniseries with some further examples of constraints for the suppliers and parts database and showing how they would look under the scheme I've been describing. I'll use Tutorial D (again, see a couple of installments back) to state the constraint formally in each case. Here goes:

Legal supplier numbers are of the form Snnnn (nnnn = up to four decimal digits).

TYPE S# POSSREP { X CHAR WHERE SUBSTR ( X, 1, 1 ) = 'S' AND IS_INTEGER ( SUBSTR ( X, 2 ) ) AND CHAR_LENGTH ( X )£ 5 } ;

This first one is a domain or type constraint. I've assumed the availability of certain operators for use in expressing that constraint: SUBSTR, for extracting a substring of a given string; IS_INTEGER, for testing whether a given string denotes an integer; and CHAR_LENGTH, for obtaining the length of a given string.

Red parts weigh less than 50 pounds.

IF P.COLOR = COLOR ( 'Red' ) THEN P.WEIGHT < WEIGHT ( 50 )

This one is a table constraint -- actually a "single-row" constraint. Note the COLOR and WEIGHT literals.

Suppliers S1 and S4 are always in the same city.

IF SX.S# = S# ( 'S1' ) AND SY.S# = S# ( 'S4' ) THEN SX.CITY = SY.CITY

This example is also a table constraint, but a "multi-row" one.

No shipment quantity can be greater than twice the average of all such quantities.

SPX.QTY £ 2 * AVG ( SPY, QTY )

Another multi-row table constraint, this one involving aggregates. The expression AVG(SPY,QTY) returns the average of the values currently appearing in the QTY column of the SP table.

Every London supplier supplies part P2.

IF S.CITY = 'London' THEN EXISTS ( SP WHERE SP.S# = S.S# AND SP.P# = P# ( 'P2' ) )

This one is a database constraint, involving exactly two tables, S and SP.

Suppliers in Athens can move only to London or Paris.

IF S'.CITY = 'Athens' THEN S.CITY = 'Athens' OR S.CITY = 'London' OR S.CITY = 'Paris'

This one needs a little more explanation. First of all, note that it's a transition constraint (a table transition constraint, to be precise); all prior examples have been examples of state constraints instead. The primed table name S' refers to the value of table S before the update, the unprimed name S refers to the value after the update. The constraint can thus be read as follows: "If the supplier city was Athens before the update, then after the update it must be London or Paris" (unless it hasn't changed at all, in which case it's still Athens, of course).

Supplier cities and part cities are disjoint.

FORALL S ( NOT EXISTS P ( P.CITY = S.CITY ) )

In very stilted English: "For all suppliers, there doesn't exist a part with the same city" -- another database constraint.

Average shipment quantities never decrease.

No solution provided.

This is a transition constraint involving aggregates (though only one table, so it's a table constraint, not a database one): "The average quantity of any given part, taken over all suppliers, can never decrease." This constraint is quite complicated! No solution is provided here, because a proper formulation would require rather more apparatus than I've had room to describe in this short series; I'll have to ask you to take it on trust that the constraint can indeed be expressed in Tutorial D, as required.

References

[1] C. J. Date and Hugh Darwen, Foundation for Future Database Systems: The Third Manifesto (2nd edition), Addison-Wesley, 2000.

[2] E. F. Codd, "Extending the Database Relational Model to Capture More Meaning," ACM Transactions on Database Systems 4, No. 4, December 1979.

[3] C. J. Date: " The Extended Relational Model RM/T," Relational Database Writings 1991-1994 (Addison-Wesley, 1995).

[4] C. J. Date and Hugh Darwen, op.cit.

# # #

Standard citation for this article:

C.J. Date, "Integrity Constraints (Part 3)" Business Rules Journal, Vol. 2, No. 4, (Apr. 2001)
URL: http://www.brcommunity.com/a2001/b062.html

About our Contributor:

C.J. Date Author,

C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database technology. He is best known for his book An Introduction to Database Systems (eighth edition, Addison-Wesley, 2004), which has sold some 725,000 copies and is used by several hundred colleges and universities worldwide. He is also the author of many other books on database management.

Read All Articles by C.J. Date