Fact-orientation before Object-orientation (Part 2): Capturing Constraints with Object Role Modeling

Terry Halpin, Professor of Computer Science, INTI International University (Malaysia)

This is the second article of a two-part series on using fact-oriented analysis as a precursor to object-oriented design. The focus is on data modeling, using ORM (Object Role Modeling) for the fact-oriented approach and UML (Unified Modeling Language) for the object-oriented approach. Overviews of ORM can be found in [6] and [7], and a detailed treatment in [3]. The UML specification is accessible at [12], with detailed discussions in [1] and [13].

In Part 1, I argued that data use cases should be used to seed the data model.[9] In case you missed that article, its main ideas are summarized in the next few paragraphs.

A data use case is an example of data that is to be input to or output from the information system (e.g., a form or report). The domain expert verbalizes facts from the example, using natural language. The modeler reformulates these as elementary facts, then abstracts these to fact types, expressed in a natural but formal language (e.g., Room at Time is booked for Activity). The populations of these fact types are constrained by business rules, which the modeler validates with the domain expert by

  1. verbalizing the rules, preferably in ORM's textual language (e.g., it is impossible that the same Room at the same Time is booked for more than one Activity), and
  2. populating the fact types with sample populations that satisfy the rules as well as with counter-examples that illustrate violations of the rules.
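As a rough illustration of step 2, the Room booking rule can be checked mechanically against a sample population. This is only a sketch: the rooms, times, and activities are made-up sample data, and the helper name `violations` is hypothetical rather than part of any ORM tool.

```python
from collections import Counter

# Hypothetical sample population for: Room at Time is booked for Activity.
bookings = [
    ("Room 20", "Mon 9am", "ORM class"),
    ("Room 20", "Tue 2pm", "UML class"),
    ("Room 33", "Mon 9am", "Staff meeting"),
]

def violations(population):
    """Return the (room, time) pairs that are booked for more than one activity."""
    # A fact table is a set of facts, so repeated identical rows count once.
    counts = Counter((room, time) for room, time, _activity in set(population))
    return sorted(pair for pair, n in counts.items() if n > 1)

# The sample population satisfies the rule.
satisfying = violations(bookings)

# A counter-example: the same room at the same time booked for a second activity.
counterexample = violations(bookings + [("Room 20", "Mon 9am", "Party")])
```

Showing the domain expert the offending rows is usually more persuasive than showing the rule itself.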

ORM differs from ER and UML in expressing all facts as sentences, where objects play roles (parts in relationships). ORM's graphical syntax depicts a sentence type as a logical predicate or sequence of one or more roles (shown as boxes), each connected to its object type (shown as a named ellipse). Fact types may be populated with tables of facts where each role corresponds to a column, facilitating validation by population. In contrast, attribute-based approaches such as ER and UML are problematic for verbalizing and multiply instantiating fact types other than simple binary relationships, and their use of attributes introduces null values.

In Part 1, only two kinds of rule were discussed: simple mandatory and internal uniqueness constraints. A role is mandatory if it must be played by each instance in the population of its object type, and is depicted in ORM as a solid dot. A set of one or more roles in a predicate is unique if each entry for those role(s) in the corresponding fact table must be unique in the table's population. This constraint is depicted in ORM by an arrow-tipped bar over the role(s). For binary associations, a simple mandatory constraint is captured in UML using a minimum multiplicity constraint of 1 or more. For associations with n roles (n > 1), a uniqueness constraint over n or n - 1 roles is captured in UML using a maximum multiplicity constraint of 1.
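The correspondence between these two ORM constraints and UML multiplicities on a binary association can be sketched as follows. The function names and the Employee/Department data are hypothetical, chosen only for illustration; in a UML diagram the resulting multiplicity is written at the opposite end of the association from the constrained role.

```python
def is_mandatory(object_population, fact_table, column):
    """True if every instance of the object type plays the role in this column."""
    players = {row[column] for row in fact_table}
    return set(object_population) <= players

def uml_multiplicity(mandatory, unique):
    """Multiplicity for a role of a binary association:
    the minimum is 1 if the role is mandatory, the maximum is 1 if it is unique."""
    low = "1" if mandatory else "0"
    high = "1" if unique else "*"
    return low if low == high else f"{low}..{high}"

# Hypothetical population for: Employee works for Department.
employees = {"Ann", "Bob", "Cho"}
departments = {"Sales", "Research", "Legal"}
works_for = [("Ann", "Sales"), ("Bob", "Sales"), ("Cho", "Research")]

employee_role_mandatory = is_mandatory(employees, works_for, 0)       # every employee appears
department_role_mandatory = is_mandatory(departments, works_for, 1)   # "Legal" never appears
```

With this mapping, a role that is both mandatory and unique yields "1", while a role with neither constraint yields "0..*".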

The remainder of this article illustrates some of the ways in which ORM provides a constraint notation that is simpler, more stable, more orthogonal, and yet more expressive than UML's for data modeling purposes. When you add these advantages to ORM's natural support of validation by verbalization and population, it makes a lot of sense to use ORM for conceptual analysis before designing a class diagram in UML. Since I did my doctorate many years ago formalizing ORM, you may suspect that I'm biased in this recommendation. Read on and decide for yourself.

Capturing constraints

There is no space in this short article to cover the wide range of rules that are supported graphically in ORM but not UML. However, the following examples illustrate the general idea, and the references provide detailed coverage. Suppose that Table 1 is an example of a report that our information system is expected to output. You might like to try modeling this data use case yourself before reading on.

Table 1. A sample output report about Movies

 

Movie                    Director              Reviewers
Nr   Title               Name           Born   Name          Born
1    Backdraft           Ron Howard     US     Fred Bloggs   US
                                               Ann Green     US
2    Crocodile Dundee    Peter Faiman   AU     Ann Green     US
                                               Ima Viewer    GB
                                               Tom Sawme     AU
3    Star Peace          Ann Green      US     ?             ?
...  ...                 ...            ...    ...           ...

One way of modeling this in UML is shown in Figure 1. Here movies are identified by a movie number. Assuming people can be identified simply by their name, Movie and Person classes may be used as shown. Since UML currently has no standard notation for primary identification, I've used my own notation "{P}" for this. Although the population of the sample report suggests that movie titles are unique, and that a person may direct at most one movie, let's assume that the domain expert confirms that this is not the case. As shown later, ORM facilitates the task of determining significant populations.

Figure 1. UML class diagram for Table 1


Instead of naming the associations between Movie and Person, the role names "director" and "reviewer" are used here to distinguish the two roles played by Person. Association names may be used as well as, or instead of, role names if desired. The multiplicity constraints indicate that each movie has exactly one director but may have many reviewers, and that a person may direct or review many movies.

But there is still a missing business rule. Can you spot it?

Figure 2 indicates how the same domain might be modeled in ORM. In addition to the schema, each fact type has been populated with the original facts. This population illustrates the mandatory role constraints (e.g., each movie has a director, but movie 3 has no reviewer). The original facts support the uniqueness constraint patterns shown on the review (m:n) and birth (n:1) fact types. These constraints are verbalized and approved by the domain expert.

However, the original facts suggest that the title and director fact types are both 1:1. Unlike what is shown in Figure 2, this would mean that each role in these two associations has its own uniqueness constraint. These four uniqueness constraints are verbalized as:

  • each Movie has at most one MovieTitle;
  • each Movie was directed by at most one Person;
  • each MovieTitle applies to at most one Movie;
  • each Person directed at most one Movie.

The domain expert is likely to question the last two constraints. To confirm that these are not really constraints, we add counterexamples to violate them. These are shown in italics below the fact tables for the title and director associations. Here movies 3 and 4 have the same title and the same director. The domain expert agrees that this is possible, and the new facts are added to the sample population for inclusion in the business rule documentation.
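The effect of the counterexamples can be sketched in code. The populations below are taken from Table 1, with movie 4 added as the counterexample described above; the helper names `role_is_unique` and `candidate_constraints` are hypothetical.

```python
from collections import Counter

def role_is_unique(fact_table, column):
    """True if no value occurs more than once in the given column of the fact table."""
    counts = Counter(row[column] for row in set(fact_table))
    return all(n == 1 for n in counts.values())

# Populations of the title and director fact types, from Table 1.
titles = [(1, "Backdraft"), (2, "Crocodile Dundee"), (3, "Star Peace")]
directors = [(1, "Ron Howard"), (2, "Peter Faiman"), (3, "Ann Green")]

def candidate_constraints(titles, directors):
    """Report which of the four candidate uniqueness constraints the population satisfies."""
    return {
        "each Movie has at most one MovieTitle": role_is_unique(titles, 0),
        "each Movie was directed by at most one Person": role_is_unique(directors, 0),
        "each MovieTitle applies to at most one Movie": role_is_unique(titles, 1),
        "each Person directed at most one Movie": role_is_unique(directors, 1),
    }

# The original report is consistent with all four candidates.
before = candidate_constraints(titles, directors)

# Movie 4 shares its title and its director with movie 3.
after = candidate_constraints(titles + [(4, "Star Peace")], directors + [(4, "Ann Green")])
```

A population can only refute a constraint, never prove it: the original report satisfies all four candidates, yet two of them turn out not to be rules at all. That is why the counterexamples have to be put to the domain expert.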

Figure 2. Populated ORM schema for Table 1 with some counterexample checks


The director and review associations are compatible (same object types), so it is meaningful to compare their populations. The ORM analysis procedure requires us to check whether a set-comparison constraint (subset, equality, or exclusion) must apply between them. This leads to the detection of an exclusion constraint, captured graphically here by the circled "X" constraint between the role-pairs making up the directed and reviewed associations. This verbalizes as:

  • no Person directed and reviewed the same Movie.

Or reading it the other way:

  • no Movie was directed by and was reviewed by the same Person.

This rule is verbalized and double-checked with the domain expert by providing a counter-example, shown in italics below the review table in Figure 2. The pair (1, Ron Howard) now appears in both the director and the review tables. Is it possible for Movie 1 to be directed by Ron Howard and also reviewed by Ron Howard? The domain expert denies this possibility, and the rule is added to the model.
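In set terms, this pair-exclusion constraint says that the populations of the two associations have no (Movie, Person) pair in common. A minimal sketch, using the facts from Table 1 and a hypothetical helper name:

```python
# Populations of the director and review fact types, from Table 1,
# each stored as a set of (movie number, person name) pairs.
directed = {(1, "Ron Howard"), (2, "Peter Faiman"), (3, "Ann Green")}
reviewed = {
    (1, "Fred Bloggs"), (1, "Ann Green"),
    (2, "Ann Green"), (2, "Ima Viewer"), (2, "Tom Sawme"),
}

def exclusion_violations(first, second):
    """Return the pairs that occur in both populations (empty if the constraint holds)."""
    return sorted(first & second)

# The original facts satisfy the constraint: Ann Green both directs and
# reviews, but never the same movie.
original = exclusion_violations(directed, reviewed)

# The counter-example: Ron Howard reviewing movie 1, which he directed.
with_counterexample = exclusion_violations(directed, reviewed | {(1, "Ron Howard")})
```

Note that the comparison is made on whole pairs, not on persons alone, which is why the constraint spans both roles of each association.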

Some domain experts are happy to work with diagrams and some are not. Some are good at understanding rules in natural language and some are not. But all domain experts are good at working with concrete examples. Although it is not necessary for the domain expert to see the diagram, being able to instantiate any role directly on the diagram makes it easy for you as a modeler to think clearly about the rules.

UML has no graphic symbol for an exclusion constraint, making it harder to think of the rule in analysis. If detected, however, the rule can be documented in a note attached to the relevant associations, as shown in Figure 3. It can also be captured formally in the Object Constraint Language (OCL).[14]

However, in spite of its good intentions, OCL is too mathematical for validation with typical domain experts.[5] In contrast, an ORM tool such as Microsoft Visio Enterprise allows the rule to be verbalized and populated for validation, and also generates the appropriate DDL code to enforce the constraint when the conceptual schema is mapped to a database system.

Figure 3. UML diagram with exclusion constraint documented as a note


UML does include one graphic constraint that partially captures the notion of exclusion. Its exclusive-or constraint is depicted by writing "{xor}" beside a dashed line connected to the relevant associations. Although the current wording of the UML specification describes the xor constraint as applying to a set of associations, it should instead have defined the constraint over a set of roles or association-ends to avoid ambiguity in cases with multiple common classes.

Visually this could be shown by attaching the dashed line near the relevant ends of the associations, as shown in Figure 4(a). The corresponding ORM diagram is shown in Figure 4(b), using a "lifebuoy" symbol for the constraint. In ORM, the constraint verbalizes as:

  • each Vehicle is leased from a Company or was purchased from a Company, but not both.

Figure 4a. An exclusive-or constraint in UML


Figure 4b. An exclusive-or constraint in ORM


As the lifebuoy symbol suggests, an xor constraint in ORM is a combination of a mandatory constraint (solid dot) and an exclusion constraint (circled "X"). In ORM, a mandatory constraint can apply to one or more roles played by the same object type.

When two or more roles are involved, this is called a disjunctive mandatory role constraint or inclusive-or constraint (each instance of the object type must play at least one of these roles). An exclusion constraint can apply to a set of compatible role sequences. If there is only one role in each sequence, we have the simple case shown here. With the earlier movie example, there were two roles in each sequence. ORM constraints are legal wherever it makes sense to apply them, and they can be combined orthogonally (as in the xor case).
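The orthogonal combination can be sketched by checking the two component constraints separately for the Vehicle example. The vehicle and company names are made-up sample data, and `xor_check` is a hypothetical helper.

```python
def xor_check(population, first_role, second_role):
    """Check an exclusive-or constraint as two independent parts.

    Returns (missing, both):
      missing - instances playing neither role (inclusive-or violations)
      both    - instances playing both roles (exclusion violations)
    """
    first, second = set(first_role), set(second_role)
    missing = sorted(set(population) - (first | second))
    both = sorted(first & second)
    return missing, both

# Hypothetical populations for: Vehicle is leased from Company,
# and Vehicle was purchased from Company.
vehicles = {"V1", "V2", "V3"}
leased_from = {("V1", "Acme Leasing")}
purchased_from = {("V2", "City Motors"), ("V3", "City Motors")}

# Project each fact type onto the role played by Vehicle.
leased = {vehicle for vehicle, _company in leased_from}
purchased = {vehicle for vehicle, _company in purchased_from}

satisfied = xor_check(vehicles, leased, purchased)
neither_role = xor_check(vehicles | {"V4"}, leased, purchased)
both_roles = xor_check(vehicles, leased | {"V2"}, purchased)
```

Dropping either half of the check gives the weaker constraint on its own: an inclusive-or constraint if only `missing` is tested, or a simple exclusion constraint if only `both` is tested.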

If the population in the earlier Figure 2 is significant, then each person referenced in the database must either direct a movie or review a movie. This inclusive-or constraint is depicted in ORM using a circled solid dot connected to the roles that are disjunctively mandatory (see Figure 5).

As an example of an exclusion constraint between simple roles, suppose that if a movie has its production canceled it is never sent out for review. This is depicted in Figure 5 by the exclusion constraint between the unary predicate "was canceled" and the first role of Movie was reviewed by Person. This constraint verbalizes as:

  • no Movie was canceled and was reviewed by some Person.

Figure 5. Each person directed or reviewed; no movie was canceled and reviewed


UML has no graphic notation for an inclusive-or constraint or a general exclusion constraint. Moreover, it does not support unary fact types, so if we want to express the fact that a movie was canceled we need to binarize the unary (e.g., with a status association) or use a Boolean attribute or subtype. This is unnecessarily removed from the viewpoint of the domain expert, who simply wants to say a movie was canceled and see it that way in the model.

Unlike UML, ORM includes ring, join, and other constraints that often exist in the real world being modeled. As a trivial example of a ring constraint, the association Movie was based on Movie is irreflexive:

  • no Movie was based on itself.

As an example of a join subset constraint, suppose we record the title (e.g., "Mr", "Mrs", "Dr") and sex of each person. Then there is an optional functional association from Title to Sex (Title determines Sex). For example, "Mr" applies only to males, while "Dr" applies to both males and females. So we should enforce the join subset constraint:

  • if Person has a Title that determines Sex then that Person is of that Sex.
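One way to read this constraint: join the Person has Title facts with the Title determines Sex facts, and require every resulting (Person, Sex) pair to appear in the Person is of Sex facts. The sketch below assumes hypothetical titles and sexes for three of the people named in Table 1; none of this data comes from the original report.

```python
# Hypothetical populations; the titles and sexes are invented for illustration.
has_title = {("Ann Green", "Mrs"), ("Fred Bloggs", "Mr"), ("Ima Viewer", "Dr")}
is_of_sex = {("Ann Green", "F"), ("Fred Bloggs", "M"), ("Ima Viewer", "F")}

# Optional functional association: "Dr" is deliberately absent because it
# determines no sex.
title_determines_sex = {("Mr", "M"), ("Mrs", "F")}

def join_subset_violations(has_title, title_determines_sex, is_of_sex):
    """Return the (person, sex) pairs implied by a title but missing from is_of_sex."""
    implied = {
        (person, sex)
        for person, title in has_title
        for determining_title, sex in title_determines_sex
        if title == determining_title
    }
    return sorted(implied - is_of_sex)

consistent = join_subset_violations(has_title, title_determines_sex, is_of_sex)

# Counter-example: Fred Bloggs has the title "Mr" but is recorded as female.
wrong_sex = {("Ann Green", "F"), ("Fred Bloggs", "F"), ("Ima Viewer", "F")}
inconsistent = join_subset_violations(has_title, title_determines_sex, wrong_sex)
```

Because "Dr" determines no sex, Ima Viewer is unconstrained, which matches the optional nature of the association.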

In addition to greater expressibility, ORM models are more stable because they make no use of attributes. For example, suppose we started with the UML model in Figure 3, and then found out that we needed to record something about countries (e.g., their name or population). We are now forced to recast the birthCountry attribute as the association Person was born in Country.

In ORM, the original fact types remain unaltered, and we just add new ones. Although good UML modelers may often use associations instead of attributes for such cases, in principle any attribute could be affected in this way (e.g., consider the problem of formulating the join subset constraint above if we depicted title and sex as attributes). The greater semantic stability of ORM models also carries over into its query language. [2],[5]

 

Conclusion


If you're a UML devotee, you probably feel I've been very unkind to UML. In fact I'm not trying to say "don't use UML." I think it's great for logical and physical design of object-oriented applications. I just don't think it's good for conceptual analysis of data requirements.

What I'm suggesting is that if you are going to work with class diagrams in UML, do yourself a favor and derive them from ORM models used in the analysis phase. That way you get the benefits of fact-orientation (a complete, stable, and correct data model that is easily validated with the domain experts) and can still use UML for designing your object-oriented programs. The same goes for using ORM as a front-end to ER or other attribute-based modeling approaches for database design.

Further discussion on how ORM relates to UML can be found in [8], [10], and [11]. If you disagree with anything I've said, please feel free to email me at TerryHa@microsoft.com, but please include some argument to justify your viewpoint.

 


References:


1. Booch, G., Rumbaugh, J. & Jacobson, I. 1999, The Unified Modeling Language User Guide, Addison-Wesley, Reading MA, USA.

2. Bloesch, A. & Halpin, T. 1997, 'Conceptual queries using ConQuer-II,' Proc. 16th Int. Conf. on Conceptual Modeling ER'97 (Los Angeles), D. Embley, R. Goldstein eds, Springer LNCS 1331 (Nov.) 113-126.*

3. Halpin, T. 1995, Conceptual Schema and Relational Database Design, 2nd edition (revised 1999), WytLytPub, Bellevue WA, USA.

4. Halpin, T. 1995, 'Subtyping: conceptual and logical issues,' Database Newsletter, ed. R.G. Ross, Database Research Group Inc., vol. 23, no. 6, pp. 3-9.*

5. Halpin, T. 1998, 'Conceptual Queries,' Database Newsletter, vol. 26, no. 2, ed. R.G. Ross, Database Research Group, Inc., Boston MA (March/April 1998).*

6. Halpin, T. 1998, 'Object Role Modeling (ORM/NIAM),' Handbook on Architectures of Information Systems, P. Bernus, K. Mertins & G. Schmidt eds, Springer-Verlag, Berlin, pp. 81-101.*

7. Halpin, T. 1998, Object Role Modeling: an Overview.*

8. Halpin, T. 1998-9, 'UML data models from an ORM perspective: Parts 1-10,' Journal of Conceptual Modeling, InConcept, Minneapolis USA.*

9. Halpin, T. 1999, 'Fact-orientation before object-orientation (part 1): The Case for Data Use Cases,' DataToKnowledge Newsletter, vol. 27, no. 6, (Nov./Dec. 1999).

10. Halpin, T.A. 2000, 'Integrating fact-oriented modeling with object-oriented modeling,' Information Modeling for the new Millenium, eds K. Siau & M. Rossi (in press).

11. Halpin, T. & Bloesch, A. 1999, 'Data modeling in UML and ORM: a comparison,' Journal of Database Management, vol. 10, no. 4, Idea Group Publishing Company, Hershey, USA, pp. 4-13.*

12. OMG UML Revision Task Force, OMG Unified Modeling Language Specification, version 1.3, available online from http://uml.shl.com/artifacts.htm.

13. Rumbaugh, J., Jacobson, I. & Booch, G. 1999, The Unified Modeling Language Reference Manual, Addison-Wesley, Reading MA, USA.

14. Warmer, J. & Kleppe, A. 1999, The Object Constraint Language: precise modeling with UML, Addison-Wesley, Reading MA, USA.

* accessible online at www.orm.net

Standard citation for this article:


Terry Halpin, "Fact-orientation before Object-orientation (Part 2): Capturing Constraints with Object Role Modeling" Business Rules Journal, Vol. 1, No. 7, (Jul. 2000)
URL: http://www.brcommunity.com/a2000/b023.html

About our Contributor:



Dr. Terry Halpin, BSc, DipEd, BA, MLitStud, PhD, is a Professor of Computer Science at INTI International University, Malaysia, and a data modeling consultant. His prior industrial background includes many years of research and development of data modeling technology at Asymetrix Corporation, InfoModelers Inc., Visio Corporation, Microsoft Corporation, and LogicBlox. His previous academic background includes many years teaching computer science at the University of Queensland (Australia) and Neumont University (USA). His current research focuses on conceptual modeling and conceptual query technology. His doctoral thesis formalized Object-Role Modeling (ORM/NIAM), and his publications include over 200 technical papers and seven books, including Information Modeling and Relational Databases, 2nd Edition (2008: Morgan Kaufmann). Dr. Halpin may be reached directly at t.halpin@live.com.
