The Question of Meaning
In my previous installment, I considered the question of what relations and relational databases "really were." Just to remind you, if R is a relation, then the heading of R denotes a certain predicate, and the body of R denotes a set of propositions. The propositions are "instantiations" of the predicate, and they're assumed by convention to be ones that evaluate to true. Thus, a database can be thought as a collection of true propositions.
Given the foregoing as background, now I want to go on to consider the question of meaning -- that is, the question of what relations and relational databases really mean. I'll begin by observing that, of course, every relation -- be it a base relation, or a view, or a snapshot, or a final result, or an intermediate result, or any other kind of relation -- certainly does have a meaning, and users need to know those meanings in order to use the database effectively and efficiently. In the case of the suppliers relation from the suppliers and parts database, for example, the meaning is something like this:
The supplier with the specified supplier number (that's the S# value) has the specified name (that's the SNAME value) and status (that's the STATUS value) and is located in the specified city (that's the CITY value). Moreover, the status value is in the range 0 to 100 inclusive, and must be 20 if the city is London. Also, no two suppliers have the same supplier number. |
(Actually this statement is neither precise nor complete, but it's good enough for the purposes of the present discussion.)
Observe now that the foregoing meaning can be regarded, once again, as a predicate. And, of course, at any given time, relation S -- more precisely, relvar S -- contains exactly the rows that make that predicate true (loosely speaking).
Before going any further, I should say that I've deliberately slurred over the difference between relation values and relation variables in the foregoing, somewhat. What's more, I'm going to continue doing the same thing for the remainder of this installment. This isn't meant to be a very formal presentation! As Bertrand Russell once memorably remarked: "Writing can be either readable or precise, but not at the same time."
Anyway, the reason this question of meaning is so important is that, of course, the meaning of a given relation -- that is, the predicate for that relation -- is the criterion for acceptability of updates on that relation (or relvar, rather). Obviously, an update shouldn't be permitted if it violates the corresponding predicate.
In an ideal world, therefore, the DBMS would know the meaning of every relation, so that it could deal correctly with all possible updates. But, of course, that's impossible. There's no way it can know those meanings exactly. For example, there's no way the DBMS can know what it means for a certain supplier to be "in" a certain city or to "have" a certain status; these concepts are outside the system -- they're understood by users, but not by the DBMS. More precisely, they're part of what logicians call the interpretation (of the relation in question).
What's more, suppose the user tries to insert a new supplier row, say the row (S6,Smith,50,Rome); in other words, the user is effectively asserting that it's a "true fact" that now there's a new supplier S6, with name Smith and status 50 and located in Rome. Then, again, there's no way the DBMS can possibly know whether that "true fact" is really true or not. All it can do is make sure the new row doesn't violate any of the known and applicable integrity constraints. If it doesn't, then the system will accept the new row and enter it into the database, and will treat it as a "true fact" from that point forward.
So the DBMS does not, and cannot, know the suppliers predicate 100 percent. But (as already indicated) it does know a good approximation! -- it knows the integrity constraints. So we define the formal "meaning" of any given relation (sorry, relvar) to be the logical AND of all constraints that apply to the relation in question. (I'll come back and explain what I mean by that qualifier "formal" in just a moment.) For example, the formal meaning of the suppliers relation might look something like this:
FORALL x# IN S#, xn IN NAME, xt IN INTEGER, xc IN CHAR,
y#
IN S#, yn IN NAME, yt IN INTEGER, yc IN CHAR
( IF
{ S# x#, SNAME xn, STATUS xt, CITY xc } IN S AND
{ S# y#, SNAME yn, STATUS yt, CITY yc } IN S
THEN
( xt
≥0 AND xt
£100 ) AND
( IF xc = 'London' THEN xt = 20 ) AND
( IF x# = y# THEN xn = yn AND xt = yt AND xc = yc ) )
This expression is the logical AND of the constraints I showed for suppliers a few installments back -- the status is in the range 0 to 100 inclusive, supplier numbers are unique, and so on. (By the way, there's really no need for you to struggle through and try to understand this formal expression in every last detail.) And then we call this expression the relation predicate (or, more accurately, the relvar predicate) for suppliers.
In general, then, the relation predicate for a given relation represents the formal, internal, system-understood "meaning" of the relation in question -- as opposed to the informal, external, user-understood meaning (what I called the "interpretation" above). The formal meaning can be thought of as an approximation to the informal meaning, but the whole point is that it is formal, and it's "understood" by the system. And, of course, it's that formal meaning that the system checks when updates are attempted. Indeed, as you'll recall from a previous installment, it's checked immediately when updates are attempted; to say it another way, relation predicates are satisfied at statement boundaries. This fact is so important that we call it The Golden Rule:
The Golden Rule No update statement is ever allowed to leave any relation |
Now, I've been making a tacit assumption in the discussion so far that the relations we're talking about are all base relations specifically. But, of course, it isn't just base relations that are subject to update, as we know. In particular, views are updatable too, and so The Golden Rule applies to them as well. Thus, in order to decide whether an attempted update on a view is acceptable, the DBMS needs to know what the relation predicate is for that view. So what is the relation predicate for a view? Clearly, we need some inference rules -- rules, that is, such that, if the system knows the predicates for the base relations, it can infer the predicates for the views. (Of course, the system does know the predicates for the base relations -- they're essentially the integrity constraints that were stated explicitly for those base relations when those base relations were defined.)
It turns out that the required inference rules are very easy to state: They follow directly from the definitions of the applicable relational operators. For example:
- If view C is defined as the union A UNION B of base relations A and B, then the predicate PC for C is just (PA) OR (PB), where PA and PB are the predicates for A and B, respectively. In other words, a row appears in C if and only if it appears in A or B or both; so it must satisfy PA or PB or both. The implications for updates on C are obvious.
- Likewise, the predicate for the intersection A INTERSECT B is (PA) AND (PB); the predicate for the difference A MINUS B is (PA) AND NOT (PB); and so on.
Incidentally, I'd like to point out as an aside that it's this business of predicates that's the key to the whole question of view updating. I mentioned in an earlier installment that views are always updatable (barring integrity constraint violations); well, it was the business of predicates that I had in mind when I said that, as I hope you can now see.
As another aside, I think it's also worth pointing out that the overall database has a predicate too, and hence a formal meaning too. As we know, a database is a named container for relations; and those relations have formal meanings, as we've seen. So the database predicate is the logical AND of all applicable constraints -- which is to say, the logical AND of all relation predicates and all database constraints that apply to that database, loosely speaking. And the database predicate too is satisfied at statement boundaries (that is, The Golden Rule applies to the overall database as well as to the individual relations in that database).
One last point: Of course, the term "database" as I'm using it here denotes a purely logical concept; it's the database as perceived by the user. I'm certainly not talking about any kind of physical construct at the physical storage level. In the extreme, one "database," in my sense of the term, might map to any number of physically stored databases, managed by any number of different DBMSs, running on any number of different machines, supported by any number of different operating systems, and connected together by any number of different communication networks.
# # #
About our Contributor:
Online Interactive Training Series
In response to a great many requests, Business Rule Solutions now offers at-a-distance learning options. No travel, no backlogs, no hassles. Same great instructors, but with schedules, content and pricing designed to meet the special needs of busy professionals.