Fact-orientation before Object-orientation (Part 2): Capturing Constraints with Object Role Modeling
This is the second article of a two-part series on using fact-oriented analysis as a precursor to object-oriented design. The focus is on data modeling, using ORM (Object Role Modeling) for the fact-oriented approach and UML (Unified Modeling Language) for the object-oriented approach. Overviews of ORM can be found in  and , and a detailed treatment in . The UML specification is accessible at , with detailed discussions in  and .
In Part 1, I argued that data use cases should be used to seed the data model. In case you missed that article, its main ideas are summarized in the next few paragraphs.
A data use case is an example of data that is to be input to or output from the information system (e.g., a form or report). The domain expert verbalizes facts from the example, using natural language. The modeler reformulates these as elementary facts, then abstracts these to fact types, expressed in a natural but formal language (e.g., Room at Time is booked for Activity). The populations of these fact types are constrained by business rules, which the modeler validates with the domain expert by:
- verbalizing the rules, preferably in ORM's textual language (e.g., it is impossible that the same Room at the same Time is booked for more than one Activity), and
- populating the fact types with sample populations that satisfy the rules as well as with counterexamples that illustrate violations of the rules.
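To make validation by population concrete, here is a minimal Python sketch. It assumes a fact type's population is stored as a set of tuples, one tuple per elementary fact and one position per role. The function name `room_time_violations` and the sample rooms, times, and activities are hypothetical; the check simply searches the population for a counterexample to the sample rule.

```python
# Hypothetical sketch: a fact type's population as a set of elementary facts.
# Each tuple is one fact of the form "Room at Time is booked for Activity".
Booking = tuple[str, str, str]  # (room, time, activity)


def room_time_violations(bookings: set[Booking]) -> list[tuple[str, str]]:
    """Return each (room, time) pair that is booked for more than one activity.

    An empty result means the population satisfies the rule
    "the same Room at the same Time is booked for at most one Activity".
    """
    activities: dict[tuple[str, str], set[str]] = {}
    for room, time, activity in bookings:
        activities.setdefault((room, time), set()).add(activity)
    return sorted(key for key, acts in activities.items() if len(acts) > 1)


# Hypothetical sample population that satisfies the rule.
sample = {
    ("R1", "Mon 09:00", "Lecture"),
    ("R1", "Mon 11:00", "Seminar"),
    ("R2", "Mon 09:00", "Exam"),
}
# Counterexample: R1 at Mon 09:00 is also booked for a second activity.
counterexample = sample | {("R1", "Mon 09:00", "Exam")}

print(room_time_violations(sample))          # []
print(room_time_violations(counterexample))  # [('R1', 'Mon 09:00')]
```

Showing the counterexample to the domain expert is the point: if they reject it, the rule stands.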
ORM differs from ER and UML in expressing all facts as sentences, where objects play roles (parts in relationships). ORM's graphical syntax depicts a sentence type as a logical predicate or sequence of one or more roles (shown as boxes), each connected to its object type (shown as a named ellipse). Fact types may be populated with tables of facts where each role corresponds to a column, facilitating validation by population. In contrast, attribute-based approaches such as ER and UML are problematic for verbalizing and multiply instantiating fact types other than simple binary relationships, and their use of attributes introduces null values.
In Part 1, only two kinds of rule were discussed: simple mandatory and internal uniqueness constraints. A role is mandatory if it must be played by each instance in the population of its object type, and is depicted in ORM as a solid dot. A set of one or more roles in a predicate is unique if each entry for those role(s) in the corresponding fact table must be unique in the table's population. This constraint is depicted in ORM by an arrow-tipped bar over the role(s). For binary associations, a simple mandatory constraint is captured in UML using a minimum multiplicity constraint of 1 or more. For associations with n roles (n > 1), a uniqueness constraint over n or n-1 roles is captured in UML using a maximum multiplicity constraint of 1.
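The two constraint kinds can be stated as checks on a fact table. The following is a minimal Python sketch, assuming facts are tuples indexed by role position; the helper names and the "Person was born in Country" sample data are hypothetical.

```python
from collections import Counter


def is_unique(facts, roles):
    """Internal uniqueness: each combination of values in the given role
    positions occurs at most once in the fact table's population."""
    keys = Counter(tuple(fact[i] for i in roles) for fact in facts)
    return all(count == 1 for count in keys.values())


def is_mandatory(objects, facts, role):
    """Simple mandatory: every instance in the object type's population
    plays the given role in at least one fact."""
    return set(objects) <= {fact[role] for fact in facts}


# Hypothetical populations for the binary fact type "Person was born in Country".
people = {"Ann", "Bob", "Cy"}
born_in = {("Ann", "US"), ("Bob", "AU"), ("Cy", "US")}

print(is_unique(born_in, [0]))                    # True: one birth country per person
print(is_unique(born_in, [1]))                    # False: US appears twice
print(is_mandatory(people, born_in, 0))           # True: every person has one
print(is_mandatory(people | {"Di"}, born_in, 0))  # False: Di has no birth country
```

In UML terms, the uniqueness constraint on the first role together with the mandatory constraint on it corresponds to a multiplicity of exactly 1 on the Country end of the association.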
The remainder of this article illustrates some of the ways in which ORM provides a constraint notation that is simpler, more stable, more orthogonal, and yet more expressive than UML's for data modeling purposes. When you add these advantages to ORM's natural support of validation by verbalization and population, it makes a lot of sense to use ORM for conceptual analysis before designing a class diagram in UML. Since I did my doctorate many years ago formalizing ORM, you may suspect that I'm biased in this recommendation. Read on and decide for yourself.
There is no space in this short article to cover the wide range of rules that are supported graphically in ORM but not UML. However, the following examples illustrate the general idea, and the references provide detailed coverage. Suppose that Table 1 is an example of a report that our information system is expected to output. You might like to try modeling this data use case yourself before reading on.
Table 1. A sample output report about Movies
| Title | Director | Born | Reviewer | Born |
|---|---|---|---|---|
| Backdraft | Ron Howard | US | Fred Bloggs | US |
| Crocodile Dundee | Peter Faiman | AU | Ann Green | US |
| Star Peace | Ann Green | US | ? | ? |
One way of modeling this in UML is shown in Figure 1. Here movies are identified by a movie number. Assuming people can be identified simply by their name, Movie and Person classes may be used as shown. Since UML currently has no standard notation for primary identification, I've used my own notation for this in Figure 1. Although the population of the sample report suggests that movie titles are unique, and that a person may direct at most one movie, let's assume that the domain expert confirms that this is not the case. As shown later, ORM facilitates the task of determining significant populations.
Figure 1. UML class diagram for Table 1
Instead of naming the associations between Movie and Person, the role names "director" and "reviewer" are used here to distinguish the two roles played by Person. Association names may be used as well as, or instead of, role names if desired. The multiplicity constraints indicate that each movie has exactly one director but may have many reviewers, and that a person may direct or review many movies.
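For readers who think in code, the multiplicities of Figure 1 can be loosely mirrored in a Python data structure. This is a hypothetical sketch: the class and field names are mine, and movie numbers 1 and 3 are assumed to follow the row order of Table 1. Python does not enforce type hints at runtime, so "exactly one director" is only approximated by a required field.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    name: str            # assumed to identify the person, as in the text
    birth_country: str   # the birthCountry attribute


@dataclass
class Movie:
    movie_nr: int        # movie number used as the identifier
    title: str
    director: Person     # multiplicity 1: a required field
    # multiplicity *: zero or more reviewers
    reviewers: set[Person] = field(default_factory=set)


ron = Person("Ron Howard", "US")
fred = Person("Fred Bloggs", "US")
ann = Person("Ann Green", "US")

backdraft = Movie(1, "Backdraft", ron, {fred})
star_peace = Movie(3, "Star Peace", ann)  # no reviewer recorded
```

Because `Person` is frozen it is hashable, so instances can be stored in the `reviewers` set; nothing in the structure limits how many movies a person directs or reviews, matching the `*` multiplicities at the Movie ends.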
But there is still a missing business rule. Can you spot it?
Figure 2 indicates how the same domain might be modeled in ORM. In addition to the schema, each fact type has been populated with the original facts. This population illustrates the mandatory role constraints (e.g., each movie has a director, but movie 3 has no reviewer). The original facts support the uniqueness constraint patterns shown on the review (m:n) and birth (n:1) fact types. These constraints are verbalized and approved by the domain expert.
However, the original facts suggest that the title and director fact types are both 1:1. Unlike Figure 2, this would mean each role in these two associations has its own uniqueness constraint. These four uniqueness constraints are verbalized as:
Movie has at most one MovieTitle;
Movie was directed by at most one Person;
MovieTitle applies to at most one Movie;
Person directed at most one Movie.
The domain expert is likely to question the last two constraints. To confirm that these are not really constraints, we add counterexamples to violate them. These are shown in italics below the fact tables for the title and director associations. Here movies 3 and 4 have the same title and the same director. The domain expert agrees that this is possible, and the new facts are added to the sample population for inclusion in the business rule documentation.
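The counterexample check can be sketched in Python as follows. The helper is hypothetical, and movies 1 to 3 are assumed to follow the row order of Table 1; movie 4 is the counterexample described above, sharing movie 3's title and director.

```python
from collections import Counter


def is_unique(facts, roles):
    """True if each value combination in the given role positions occurs once."""
    keys = Counter(tuple(fact[i] for i in roles) for fact in facts)
    return all(count == 1 for count in keys.values())


# Original facts from Table 1, assuming movies 1-3 follow the table's row order.
has_title = {(1, "Backdraft"), (2, "Crocodile Dundee"), (3, "Star Peace")}
directed_by = {(1, "Ron Howard"), (2, "Peter Faiman"), (3, "Ann Green")}

# With only the original facts, the second roles also look unique (1:1).
print(is_unique(has_title, [1]), is_unique(directed_by, [1]))  # True True

# Counterexamples: movie 4 has the same title and director as movie 3.
has_title = has_title | {(4, "Star Peace")}
directed_by = directed_by | {(4, "Ann Green")}

# Movie still has at most one title and director; the reverse constraints fail.
print(is_unique(has_title, [0]), is_unique(has_title, [1]))      # True False
print(is_unique(directed_by, [0]), is_unique(directed_by, [1]))  # True False
```

Since the domain expert accepts the counterexample population, only the first two uniqueness constraints are kept.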
Figure 2. Populated ORM schema for Table 1 with some counterexample checks
The director and review associations are compatible (same object types), so it is meaningful to compare their populations. The ORM analysis procedure requires us to check whether a set-comparison constraint (subset, equality, or exclusion) must apply between them. This leads to the detection of an exclusion constraint, captured graphically here by the circled "X" constraint between the role-pairs making up the directed and reviewed associations. This verbalizes as:
No Person directed and reviewed the same Movie.
Or reading it the other way:
No Movie was directed by and was reviewed by the same Person.
This rule is verbalized and double-checked with the domain expert by providing a counter-example, shown in italics below the review table in Figure 2. The pair (1, Ron Howard) now appears in both the director and the review tables. Is it possible for Movie 1 to be directed by Ron Howard and also reviewed by Ron Howard? The domain expert denies this possibility, and the rule is added to the model.
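Because the two associations are compatible, the exclusion constraint amounts to requiring that their populations, viewed as sets of (movie, person) pairs, do not intersect. A minimal Python sketch, assuming movie numbers follow the row order of Table 1 (the function name is hypothetical):

```python
# Populations as (movie number, person) pairs; movie numbers are assumed to
# follow the row order of Table 1.
directed = {(1, "Ron Howard"), (2, "Peter Faiman"), (3, "Ann Green")}
reviewed = {(1, "Fred Bloggs"), (2, "Ann Green")}


def exclusion_clashes(first, second):
    """Pairs that appear in both fact tables. An ORM exclusion constraint
    between the two role pairs requires this set to be empty."""
    return first & second


print(exclusion_clashes(directed, reviewed))  # set()

# Counterexample from the text: movie 1 is also reviewed by Ron Howard.
print(exclusion_clashes(directed, reviewed | {(1, "Ron Howard")}))  # {(1, 'Ron Howard')}
```

Note that Ann Green both directs (movie 3) and reviews (movie 2) without violating the rule: the constraint applies to role pairs, so only the same person on the same movie counts as a clash.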
Some domain experts are happy to work with diagrams and some are not. Some are good at understanding rules in natural language and some are not. But all domain experts are good at working with concrete examples. Although it is not necessary for the domain expert to see the diagram, being able to instantiate any role directly on the diagram makes it easy for you as a modeler to think clearly about the rules.
UML has no graphic symbol for an exclusion constraint, making it harder to think of the rule in analysis. If detected, however, the rule can be documented in a note attached to the relevant associations, as shown in Figure 3. It can also be captured formally in the Object Constraint Language (OCL).
However, in spite of its good intentions, OCL is too mathematical for validation with typical domain experts. In contrast, an ORM tool such as Microsoft Visio Enterprise allows the rule to be verbalized and populated for validation, and also generates the appropriate DDL code to enforce the constraint when the conceptual schema is mapped to a database system.
Figure 3. UML diagram with exclusion constraint documented as a note
UML does, however, include one graphic constraint that partially captures the notion of exclusion. Its exclusive-or constraint is depicted by writing "{xor}" beside a dashed line connected to the relevant associations. Although the current wording of the UML specification describes the constraint as applying to a set of associations, it should instead have defined the constraint over a set of roles or association-ends to avoid ambiguity in cases with multiple common classes.
Visually this could be shown by attaching the dashed line near the relevant ends of the associations, as shown in Figure 4(a). The corresponding ORM diagram is shown in Figure 4(b), using a "lifebuoy" symbol for the constraint. In ORM, the constraint verbalizes as:
Each Vehicle is leased from a Company or was purchased from a Company, but not both.
Figure 4a. An exclusive-or constraint in UML
Figure 4b. An exclusive-or constraint in ORM
As the lifebuoy symbol suggests, an xor constraint in ORM is a combination of a mandatory constraint (solid dot) and an exclusion constraint (circled "X"). In ORM, a mandatory constraint can apply to one or more roles played by the same object type. When two or more roles are involved, this is called a disjunctive mandatory role constraint or inclusive-or constraint (each instance of the object type must play at least one of these roles). An exclusion constraint can apply to a set of compatible role sequences. If there is only one role in each sequence, we have the simple case shown here. With the earlier movie example, there were two roles in each sequence. ORM constraints are legal wherever it makes sense to apply them, and they can be combined orthogonally (as in the xor constraint).
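The orthogonal combination can be mirrored directly in code: an xor check is just the inclusive-or check and the exclusion check applied together. A minimal Python sketch, with hypothetical function names and hypothetical vehicle identifiers standing in for the players of each role:

```python
def inclusive_or_holds(objects, *role_players):
    """Disjunctive mandatory: each object plays at least one of the roles."""
    return all(any(obj in players for players in role_players) for obj in objects)


def exclusion_holds(*role_players):
    """Exclusion over single roles: no object plays more than one of them."""
    seen = set()
    for players in role_players:
        if seen & players:
            return False
        seen |= players
    return True


def xor_holds(objects, *role_players):
    """Exclusive-or: the inclusive-or and exclusion constraints combined."""
    return inclusive_or_holds(objects, *role_players) and exclusion_holds(*role_players)


# Hypothetical vehicles and the vehicles playing each role.
vehicles = {"V1", "V2", "V3"}
leased = {"V1", "V2"}   # Vehicle is leased from Company
purchased = {"V3"}      # Vehicle was purchased from Company

print(xor_holds(vehicles, leased, purchased))           # True
print(xor_holds(vehicles, leased, purchased | {"V2"}))  # False: V2 plays both roles
print(xor_holds(vehicles | {"V4"}, leased, purchased))  # False: V4 plays neither
```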
If the population in the earlier Figure 2 is significant, then each person referenced in the database must either direct a movie or review a movie. This inclusive-or constraint is depicted in ORM using a circled solid dot connected to the roles that are disjunctively mandatory (see Figure 5).
As an example of an exclusion constraint between simple roles, suppose that if a movie has its production canceled, it is never sent out for review. This is depicted in Figure 5 by the exclusion constraint between the unary predicate "was canceled" and the first role of Movie was reviewed by Person. This constraint verbalizes as:
No Movie was canceled and was reviewed by some Person.
Figure 5. Each person directed or reviewed; no movie was canceled and reviewed
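Both Figure 5 constraints can be checked against the Table 1 facts. In this minimal Python sketch the function names are hypothetical, movie numbers are assumed to follow the row order of Table 1, and the canceled movie is invented for illustration. The unary fact type is held simply as a set of movie numbers.

```python
def each_person_directs_or_reviews(people, directed, reviewed):
    """Inclusive-or: every person directed or reviewed at least one movie."""
    players = {person for _, person in directed} | {person for _, person in reviewed}
    return people <= players


def no_canceled_movie_reviewed(canceled, reviewed):
    """Exclusion between the unary role 'was canceled' and the first role
    of 'Movie was reviewed by Person'."""
    return not (canceled & {movie for movie, _ in reviewed})


# Sample populations, assuming movies 1-3 follow the row order of Table 1.
people = {"Ron Howard", "Peter Faiman", "Ann Green", "Fred Bloggs"}
directed = {(1, "Ron Howard"), (2, "Peter Faiman"), (3, "Ann Green")}
reviewed = {(1, "Fred Bloggs"), (2, "Ann Green")}
canceled = {3}  # hypothetical: movie 3 was canceled (it has no reviewer)

print(each_person_directs_or_reviews(people, directed, reviewed))  # True
print(no_canceled_movie_reviewed(canceled, reviewed))              # True
print(no_canceled_movie_reviewed(canceled | {1}, reviewed))        # False
```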
UML does not support an inclusive-or constraint or an exclusion constraint. Moreover, it does not support unary fact types, so if we want to express the fact that a movie was canceled we need to binarize the unary (e.g., with a status association) or use a Boolean attribute or subtype. This is unnecessarily removed from the viewpoint of the domain expert, who simply wants to say a movie was canceled and see it that way in the model.
ORM includes ring, join, and other constraints that often exist in the real world being modeled. As a trivial example of a ring constraint, the association Movie was based on Movie is irreflexive:
No Movie was based on itself.
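A ring constraint applies to a binary fact type whose two roles are played by the same object type. For the irreflexive case, a minimal Python sketch (function name and movie numbers hypothetical):

```python
def irreflexive_violations(pairs):
    """Ring constraint check: return the objects related to themselves."""
    return sorted(a for a, b in pairs if a == b)


# Hypothetical facts of "Movie was based on Movie" as (movie, source movie) pairs.
based_on = {(5, 2), (6, 5)}

print(irreflexive_violations(based_on))             # []
print(irreflexive_violations(based_on | {(7, 7)}))  # [7]
```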
As an example of a join subset constraint, suppose we record the title (e.g., "Mr", "Mrs", "Dr") and sex of each person. Then there is an optional functional association from Title to Sex (Title determines Sex). For example, "Mr" applies only to males, while "Dr" applies to both males and females (which is why the association is optional). So we should enforce the join subset constraint:
If some Person has a Title that determines some Sex, then that Person is of that Sex.
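The join subset constraint can be sketched as follows: join Person has Title with Title determines Sex, then require every resulting (person, sex) pair to appear in Person is of Sex. In this hypothetical Python sketch each fact type is functional (n:1), so a dict from the first role to the second holds its population; the people and sex codes are invented for illustration.

```python
def join_subset_violations(person_title, title_sex, person_sex):
    """Return persons for whom the join Person-has-Title-determines-Sex yields a
    (person, sex) pair that is missing from the fact type Person is of Sex."""
    return sorted(
        person
        for person, title in person_title.items()
        if title in title_sex and person_sex.get(person) != title_sex[title]
    )


# Hypothetical populations.
title_sex = {"Mr": "M", "Mrs": "F"}  # "Dr" is absent: it determines no sex
person_title = {"Jo": "Dr", "Sam": "Mr", "Kim": "Mrs"}
person_sex = {"Jo": "F", "Sam": "M", "Kim": "F"}

print(join_subset_violations(person_title, title_sex, person_sex))  # []

# Counterexample: Sam has the title "Mr" but is recorded as female.
print(join_subset_violations(person_title, title_sex, {**person_sex, "Sam": "F"}))  # ['Sam']
```

Because the Title role in Title determines Sex is optional, a person titled "Dr" is never flagged, whatever their recorded sex.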
In addition to greater expressiveness, ORM models are more stable because they make no use of attributes. For example, suppose we started with the UML model in Figure 3, and then found out that we needed to record something about countries (e.g., their name or population). We are now forced to recast the birthCountry attribute as the association Person was born in Country.
In ORM, the original fact types remain unaltered, and we just add new ones. Although good UML modelers may often use associations instead of attributes for such cases, in principle any attribute could be affected in this way (e.g., consider the problem of formulating the join subset constraint above if we depicted title and sex as attributes). The greater semantic stability of ORM models also carries over into its query language.
If you're a UML devotee, you probably feel I've been very unkind to UML. In fact I'm not trying to say "don't use UML." I think it's great for logical and physical design of object-oriented applications. I just don't think it's good for conceptual analysis of data requirements.
What I'm suggesting is that if you are going to work with class diagrams in UML, do yourself a favor and derive them from ORM models used in the analysis phase. That way you can get the benefits of fact-orientation (a complete, stable, and correct data model that is easily validated with the domain experts) and still use UML for designing your object-oriented programs. The same goes for using ORM as a front-end to ER or other attribute-based modeling approaches for database design.
Further discussion on how ORM relates to UML can be found in , , and . If you disagree with anything I've said, please feel free to email me at TerryHa@microsoft.com, but please include some argument to justify your viewpoint.
2. Bloesch, A. & Halpin, T. 1997, 'Conceptual queries using ConQuer-II,' Proc. 16th Int. Conf. on Conceptual Modeling ER'97 (Los Angeles), eds D. Embley & R. Goldstein, Springer LNCS 1331 (Nov.), pp. 113-126. *
4. Halpin, T. 1995, 'Subtyping: conceptual and logical issues,' Database Newsletter, ed. R.G. Ross, Database Research Group Inc., vol. 23, no. 6, pp. 3-9. *
12. OMG UML Revision Task Force, OMG Unified Modeling Language Specification, version 1.3, available online from http://uml.shl.com/artifacts.htm.
* accessible online at www.orm.net