Fact-Orientation before Object-Orientation (Part 1): The Case for Data Use Cases

Terry Halpin, Professor of Computer Science, INTI International University (Malaysia)

Although semantic approaches to information systems modeling appeared in the early seventies, no single approach has yet achieved widespread adoption. By and large, the history of information systems modeling has been characterized by a plethora of techniques and notations, with occasional religious wars between proponents of different approaches. Each year, dozens of new approaches would be proposed, leading to groans from the academics charged with teaching the state of the art. This is sometimes referred to as the "yama" (Yet Another Modeling Approach!) or "nama" (Not Another Modeling Approach!) syndrome.

I often picture this as a mountain of modeling methods, piled on top of one another, which nicely ties in with the Japanese meaning of  "yama" (mountain), depicted as a kanji that is high in the middle and low on the ends (see Figure 1).

Figure 1.  “YAMA” (Yet Another Modeling Approach) or Japanese for “Mountain”

While diversity is often useful, the modeling industry would clearly benefit if practitioners agreed to use a small number of standard modeling approaches, individually suited to their modeling scope, and collectively covering the tasks needed to model a wide variety of practical applications. This would improve communication between modelers and reduce training costs, especially in an industry with high employee turnover.

Recently, the rapid rise of UML (Unified Modeling Language) has been accompanied by claims that UML by itself is an adequate approach for modeling any software application.  Such claims have been rejected by some experienced modelers, including David Hay who argues that "there is no such thing as 'object-oriented analysis',"[13] only object-oriented design, and that "UML is … not suitable for analyzing business requirements in cooperation with business people."[14]

The UML notation includes a vast number of symbols, from which various diagrams may be constructed to model different perspectives of an application.  The main diagram types are Use Case diagrams, Static Structure diagrams (Class diagram, Object diagram), Behavior diagrams (Statechart, Activity diagram), Interaction diagrams (Sequence diagram, Collaboration diagram), and Implementation diagrams (Component diagram, Deployment diagram).

In my opinion, some of these diagrams (e.g., collaboration diagrams) are useful only for designing object-oriented program code.  Some (e.g., activity diagrams and use case diagrams) can be quite useful in requirements analysis.  And some (e.g., class diagrams) have limited use for conceptual analysis and are best used for logical design.  There is no space in this short article to cover UML in depth, so let's focus on data modeling, restricting our discussion of UML to its class and object diagrams.

While I agree with David Hay that UML class diagrams are less than ideal for data modeling, I feel that his preferred ER notation shares some of UML's weaknesses in being attribute-based.  In case you didn't know, I'm one of those weird guys who never use attributes in conceptual analysis.  I think attributes are great for logical design, since they allow compact diagrams that directly represent the data structures (e.g., relations or object-relations) used for the actual design.

However, when I'm performing conceptual analysis, I just want to know what the facts and rules are about the business, and I want to communicate this information in sentences, so that the model can be understood by the subject matter experts.  I sure don't want to bother about how facts are grouped into structures.  Whether some fact will end up in the design as an attribute is not a conceptual issue.  As Ron Ross says, "Sponsors of business rule projects must sign off on the sentences — not on graphical data models.  Most methodologies and CASE tools have this more or less backwards."[16, p.15]

For a database or a programming application, if you use a fact-oriented approach such as ORM (Object Role Modeling) to develop the conceptual data model, this makes it easier to get it right in the first place and to make changes later as the application evolves.  ORM is so-called because it views the world as a set of objects that play roles (parts in relationships).  Overviews of ORM may be found in [6] [8] [9]  and a detailed treatment in [4].  Another attribute-free approach is described in [3].  It is easy to transform an ORM model into your preferred notation (e.g., UML, ER, relational) on the way to implementation.

UML facilitates object-oriented code design because it covers both data and behavioral modeling, and lets you drill down into physical design details relevant to OO-code.  Using a class diagram, for example, you can declare whether an attribute is private, public, or protected, what operations are encapsulated in an object, and whether an association can be navigated in one direction only.

For specifying the logical design of a database, UML class diagrams offer no major benefits over traditional database design notations.  UML has no standard notation for candidate keys or foreign key relationships, but you can add your own notations for this until some standard notation eventuates.  While UML is currently used almost exclusively for code design, I expect it to gain wide acceptance for database design in the future.

In the remainder of this article, I'll discuss some reasons why I believe that to fully exploit UML (or ER for that matter) you should supplement it with a fact-oriented approach such as ORM.

Data Use Cases

Any modeling method comprises a notation as well as a procedure to help modelers use the notation to construct models.  The UML specification[15] is almost entirely concerned with the syntax and semantics of the language notation.  Its only significant procedural advice is to adopt a use-case driven, iterative approach.  Use cases in UML illustrate ways in which the required information system may be used and hence are useful in requirements analysis.  However, because of their focus on behavioral modeling, they can only go so far in helping the modeler arrive at a data model.

In practice, the move from a use case to a data model is often little more than a black art.  To seed the data model in a scientific way, we need examples of the kinds of data that the system is expected to manage.  In ORM these examples are traditionally referred to as "information samples familiar to the subject matter expert."  By analogy with the UML term, we call them data use cases.  They can be output reports or input screens or forms, and since they exist at the external level they can present information in many different ways (tables, forms, graphs, etc.).

Whatever the appearance of the data use cases, a subject matter expert familiar with their meaning should be able to verbalize the information contained in them in terms of natural language sentences.  It is the modeler's responsibility to transform that informal verbalization into a formal yet natural verbalization that is clearly understood by the domain expert.

These two verbalizations, one by the domain expert transformed into one by the modeler, comprise steps 0.5 and 1 of ORM's conceptual analysis procedure. Here we use verbalization of populations to arrive at the fact instances that are then abstracted to fact types. Constraints and derivation rules are meta-facts (facts about the object facts), which are then added and themselves validated by verbalization and population.

To get a feeling for how this works in ORM, suppose that our system is required to output reports like that shown in Table 1. We ask the domain expert to read off the information contained in the table and then rephrase this in formal English. For example, the subject matter expert might read off the facts on the top row of the table as follows: "Room 20 at 9 a.m. Monday is used for the activity 'VMC' which has the name 'VisioModeler class'."

Table 1.  A Simple Data Use Case

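| Room | Time        | ActivityCode | ActivityName       |
|------|-------------|--------------|--------------------|
| 20   | Mon. 9 a.m. | VMC          | VisioModeler class |
| 20   | Tue. 2 p.m. | VMC          | VisioModeler class |
| 33   | Mon. 9 a.m. | AQD          | ActiveQuery class  |
| 33   | Fri. 5 p.m. | SP           | Staff party        |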

As modelers, we rephrase this information into two elementary sentences, identifying each object by a definite description: 

  • the Room numbered '20' at the Time with day-hour-code 'Mon. 9 a.m.' is used for the Activity coded 'VMC';
  • the Activity coded 'VMC' has the ActivityName 'VisioModeler class'

Once the domain expert agrees with this verbalization, we proceed to abstract from the fact instances to the fact types.  We might then lay out this structure on an ORM diagram and populate it with sample data as shown in Figure 2.
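The step from report rows to elementary facts can be sketched in a few lines of Python. This is only an illustration, not part of any ORM tool: the function names are hypothetical, and the sentence templates simply copy the two verbalizations given above.

```python
# Rows of Table 1 as they appear in the report (the data use case).
REPORT_ROWS = [
    ("20", "Mon. 9 a.m.", "VMC", "VisioModeler class"),
    ("20", "Tue. 2 p.m.", "VMC", "VisioModeler class"),
    ("33", "Mon. 9 a.m.", "AQD", "ActiveQuery class"),
    ("33", "Fri. 5 p.m.", "SP", "Staff party"),
]


def split_into_elementary_facts(rows):
    """Split each report row into instances of the two elementary fact types."""
    usage = set()   # Room at Time is used for Activity
    naming = set()  # Activity has ActivityName
    for room, time, code, name in rows:
        usage.add((room, time, code))
        naming.add((code, name))
    return usage, naming


def verbalize_usage(fact):
    """Verbalize one instance of the ternary fact type."""
    room, time, code = fact
    return (f"the Room numbered '{room}' at the Time with day-hour-code "
            f"'{time}' is used for the Activity coded '{code}'")


def verbalize_naming(fact):
    """Verbalize one instance of the binary fact type."""
    code, name = fact
    return f"the Activity coded '{code}' has the ActivityName '{name}'"


usage_facts, naming_facts = split_into_elementary_facts(REPORT_ROWS)
```

Because the naming fact is stored once per activity rather than once per report row, the four rows yield four usage facts but only three naming facts.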

Figure 2.  ORM schema for Table 1, with sample population and counter-examples

Entity types are shown as named ellipses and must have a reference scheme, i.e., a way for humans to refer to instances of that type. Simple reference schemes may be shown in parentheses (e.g., "(nr)"), as an abbreviation of the relevant injective association, e.g., Room has RoomNr. Value types need no reference scheme and are shown as named, dashed ellipses (e.g., ActivityName).

An association is shown as a named sequence of one or more role boxes, each connected to the object type that plays it.  Here we have one ternary association, Room at Time is used for Activity, and one binary association Activity has ActivityName.  ORM allows associations of any arity (number of roles).  Each n-ary association (n > 0) may be given n readings, one starting at each role.  For a binary association, forward and reverse predicates may be shown separated by a slash "/".

As in logic, a predicate is a sentence with object holes in it. Mixfix notation is used, so the object terms may be mixed in with the predicate at various positions (as required in many languages such as Japanese). An object placeholder is explicitly indicated by an ellipsis "…". For instance, the ternary predicate shown is "… at … is used for …". For unary postfix predicates (e.g., "… smokes") or binary infix predicates (e.g., "… has …") the ellipses may be elided.
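As a rough illustration of mixfix predicates, the following Python sketch fills the holes of a predicate with object terms in role order. The function names are hypothetical, and the treatment of elided ellipses follows only the two cases mentioned above (unary postfix and binary infix).

```python
HOLE = "…"  # the ellipsis that marks an object placeholder


def expand_elided(predicate, arity):
    """Restore elided ellipses for unary postfix or binary infix predicates."""
    if HOLE in predicate:
        return predicate
    if arity == 1:
        return f"{HOLE} {predicate}"
    if arity == 2:
        return f"{HOLE} {predicate} {HOLE}"
    raise ValueError("ellipses may be elided only for unary or binary predicates")


def fill_predicate(predicate, terms):
    """Insert the object terms, in order, into the holes of a mixfix predicate."""
    parts = predicate.split(HOLE)
    if len(parts) - 1 != len(terms):
        raise ValueError(
            f"predicate has {len(parts) - 1} holes but {len(terms)} terms were given"
        )
    pieces = [parts[0]]
    for term, part in zip(terms, parts[1:]):
        pieces.append(term)
        pieces.append(part)
    # Normalize whitespace left over from the template.
    return " ".join("".join(pieces).split())


ternary = fill_predicate("… at … is used for …", ["Room", "Time", "Activity"])
binary = fill_predicate(expand_elided("has", 2), ["Activity", "ActivityName"])
```

Here `ternary` is "Room at Time is used for Activity" and `binary` is "Activity has ActivityName". The reading is determined only because the terms are supplied in role order.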

For each association (or fact type), a fact table may be added with a sample population to help validate the constraints.  Visio Enterprise allows the display of fact tables to be toggled on/off and can induce constraints from sample data.  Each column in a fact table is associated with one role.  The arrow-tipped bars are internal uniqueness constraints, indicating which roles or role combinations must have unique entries.  ORM schemas can be represented in either diagrammatic or textual form, and tools can automatically transform between the two representations.  Models are validated with subject matter experts in two ways:  verbalization and population.

For example, the uniqueness constraints on the ternary association verbalize as:

  • a Room at a Time is used for at most one Activity;
  • at most one Room at a Time is used for an Activity.

The ternary fact table provides a satisfying population (each Room-Time combination is unique, and each Time-Activity combination is unique).  The uniqueness constraints on the binary verbalize as:

  • each Activity has at most one ActivityName;
  • each ActivityName refers to at most one Activity.

The 1:1 nature of this association is illustrated by the population, where each column is unique. The black dot on Activity is a mandatory role constraint, indicating that each instance in the population of Activity must play that role. This verbalizes as: each Activity has at least one ActivityName. A role that is not mandatory is optional.
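Checking a sample population against uniqueness constraints is mechanical, as the following Python sketch suggests. It is a minimal illustration using the Table 1 data; the function name and the convention of identifying roles by column position are assumptions made for the example.

```python
from collections import defaultdict

# Fact tables populated from Table 1; each column corresponds to one role.
USAGE = [  # Room at Time is used for Activity
    ("20", "Mon. 9 a.m.", "VMC"),
    ("20", "Tue. 2 p.m.", "VMC"),
    ("33", "Mon. 9 a.m.", "AQD"),
    ("33", "Fri. 5 p.m.", "SP"),
]
NAMING = [  # Activity has ActivityName
    ("VMC", "VisioModeler class"),
    ("AQD", "ActiveQuery class"),
    ("SP", "Staff party"),
]


def uniqueness_violations(population, roles):
    """Return the groups of rows that share the same entries for the given roles."""
    groups = defaultdict(list)
    for row in population:
        key = tuple(row[i] for i in roles)
        groups[key].append(row)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


room_time = uniqueness_violations(USAGE, (0, 1))      # Room-Time combination
time_activity = uniqueness_violations(USAGE, (1, 2))  # Time-Activity combination
room_activity = uniqueness_violations(USAGE, (0, 2))  # not a constraint in the schema
code_unique = uniqueness_violations(NAMING, (0,))
name_unique = uniqueness_violations(NAMING, (1,))
```

The population satisfies both constraints on the ternary and both on the binary. The Room-Activity combination, by contrast, repeats (room 20 is used for VMC twice), which is consistent with the schema placing no uniqueness constraint on that pair.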

Since sample data are not always significant, additional data (like Y2K in the name fact type) may be needed to help illustrate some rules.  The optionality of the other role played by Activity is shown by the absence of Y2K in its population.  Since ORM schemas can be specified in unambiguous sentences backed up by illustrative examples, it is not necessary for domain experts to understand the diagram notation at all.  The modeler, however, finds the diagram a major aid to thinking about the universe of discourse.
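The mandatory and optional roles played by Activity can be checked against a population in the same spirit. In this Python sketch the function names are hypothetical, and the name given to the Y2K activity is invented for illustration, since the article does not state it.

```python
USAGE = [  # Room at Time is used for Activity
    ("20", "Mon. 9 a.m.", "VMC"),
    ("20", "Tue. 2 p.m.", "VMC"),
    ("33", "Mon. 9 a.m.", "AQD"),
    ("33", "Fri. 5 p.m.", "SP"),
]
NAMING = [  # Activity has ActivityName
    ("VMC", "VisioModeler class"),
    ("AQD", "ActiveQuery class"),
    ("SP", "Staff party"),
    ("Y2K", "Year 2000 seminar"),  # hypothetical name; only the code Y2K is from the article
]


def role_population(fact_table, column):
    """Return the set of objects that play the role in the given column."""
    return {row[column] for row in fact_table}


def instances_not_playing(type_population, fact_table, column):
    """Return the instances of an object type that do not play the given role."""
    return type_population - role_population(fact_table, column)


# The population of Activity is the union of the populations of its roles.
ACTIVITY = role_population(USAGE, 2) | role_population(NAMING, 0)

missing_name = instances_not_playing(ACTIVITY, NAMING, 0)  # mandatory role
unused = instances_not_playing(ACTIVITY, USAGE, 2)         # optional role
```

An empty `missing_name` is what the mandatory role constraint requires, while `unused` containing Y2K shows that the other role is optional.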

To double-check a constraint, a counter-example to the constraint being investigated may be presented.  The counter-rows marked "?" below the fact tables test the uniqueness constraints.  For instance, the first row and counter-row of the ternary indicate that room 20 at 9 a.m. Monday is used for both the VMC and AQD activities.

Giving concrete examples makes it easier for many domain experts to decide whether something really is a rule.  The first row and second counter-row indicate that both room 20 and room 33 are used at 9 a.m. Monday for the VMC activity.  Is this kind of thing possible?  If it is (and for some application domains it would be) then this constraint is not a rule — in which case, the constraint should be dropped and the counter-row added to the sample data.  On the other hand, if our business does not allow two rooms to be used at the same time for the same activity, then the constraint is validated and the counter-example is rejected (though it can be retained merely as an illustrative counter-example).
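The decision procedure just described can be summarized in a short Python sketch. The function name, its return values, and the boolean standing in for the domain expert's answer are all assumptions made for illustration.

```python
USAGE = [  # Room at Time is used for Activity (Table 1)
    ("20", "Mon. 9 a.m.", "VMC"),
    ("20", "Tue. 2 p.m.", "VMC"),
    ("33", "Mon. 9 a.m.", "AQD"),
    ("33", "Fri. 5 p.m.", "SP"),
]


def review_counter_example(population, roles, counter_row, expert_says_possible):
    """Test a uniqueness constraint over `roles` with a counter-row.

    Returns a verdict and the (possibly extended) sample population.
    """
    clashes = [
        row for row in population
        if row != counter_row and all(row[i] == counter_row[i] for i in roles)
    ]
    if not clashes:
        # The row does not actually challenge the constraint.
        return "not a counter-example", population
    if expert_says_possible:
        # The constraint is not a rule: drop it and keep the row as sample data.
        return "constraint dropped", population + [counter_row]
    # The constraint is a rule: reject the row.
    return "constraint validated", population


# Two rooms at the same time for the same activity (Time-Activity uniqueness).
verdict, population = review_counter_example(
    USAGE, (1, 2), ("33", "Mon. 9 a.m.", "VMC"), expert_says_possible=False
)
```

With the expert answering "no", the verdict is "constraint validated" and the population is unchanged. Had the answer been "yes", the counter-row would join the sample data and the constraint would be removed from the schema.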

A UML class diagram equivalent to the ORM conceptual schema is shown in Figure 3.  While a ternary association is used, an activityName attribute is used instead of a binary association.  Because of its object-oriented focus, UML does not require conceptual identification schemes for its classes (instances are assumed to be identified by oids) and does not even include a standard notation for saying that attribute values must be unique for their class.  However, UML does allow user-defined constraints to be added in braces or notes in whatever language users wish.  So, I've added {P} to denote primary uniqueness and {U1} for an alternate uniqueness.  The uniqueness constraints on the ternary are captured by the two 0..1 (at most one) multiplicity constraints.  The "*" means "0 or more".

Figure 3.  UML class diagram for Table 1

If the ORM or UML schema is mapped to a relational schema, the ternary association will map to a separate table, and a choice is needed as to which of Room-Time or Activity-Time will form the basis of the primary key. Clearly, this choice is an implementation decision and has nothing to do with conceptual analysis (although binary-only versions of ER force this upon us). It is possible to extend UML to depict the relational schema itself (e.g., introduce table as a stereotype of class and the foreign key relationship as a special kind of dependency), but the issue at hand is whether UML is well suited for the conceptual model, not the logical model.
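One possible relational mapping is sketched below using Python's built-in sqlite3 module. The table and column names are assumptions, and choosing Room-Time as the primary key (with Time-Activity as an alternate key) is exactly the kind of arbitrary implementation decision described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Activity (
    activityCode TEXT PRIMARY KEY,
    activityName TEXT NOT NULL UNIQUE      -- mandatory and 1:1 with the code
);
CREATE TABLE Usage (
    roomNr       TEXT NOT NULL,
    timeCode     TEXT NOT NULL,
    activityCode TEXT NOT NULL REFERENCES Activity (activityCode),
    PRIMARY KEY (roomNr, timeCode),        -- Room-Time chosen as primary key
    UNIQUE (timeCode, activityCode)        -- Time-Activity kept as alternate key
);
""")

with conn:
    conn.executemany(
        "INSERT INTO Activity VALUES (?, ?)",
        [("VMC", "VisioModeler class"),
         ("AQD", "ActiveQuery class"),
         ("SP", "Staff party")],
    )
    conn.executemany(
        "INSERT INTO Usage VALUES (?, ?, ?)",
        [("20", "Mon. 9 a.m.", "VMC"),
         ("20", "Tue. 2 p.m.", "VMC"),
         ("33", "Mon. 9 a.m.", "AQD"),
         ("33", "Fri. 5 p.m.", "SP")],
    )


def try_insert_usage(row):
    """Attempt to add a usage row; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Usage VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `try_insert_usage(("20", "Mon. 9 a.m.", "AQD"))` returns False, because room 20 is already in use at that time. Swapping the roles of the PRIMARY KEY and UNIQUE clauses would enforce the same rules, which is why the choice carries no conceptual significance.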

The UML model in Figure 3 seems to be the best we can do to model the application conceptually.  But how well does it support validation of the model with the domain expert, who should not be expected to be familiar with the graphical notation?

Let's start with verbalization.  Although often less than ideal, implicit use of "has" could be used to form binary sentences from the attributes.  But what about the ternary?  About the best we can do is something like "Usage involves Room and Time and Activity" — which is pretty useless.  What if we replaced the association name with a mixfix predicate, as we did in ORM, e.g., "… at … is used for …"?

This is no use, because UML association roles (or association ends as they are now called) are not ordered.  So, formally we can't know if we should read the sentence type as "Room at Time is used for Activity" or "Activity at Time is used for Room," etc.  This gets worse if the same class plays more than one role in the association (e.g., Person introduced Person to Person).  UML does allow association roles to have names (ORM allows this also, although the role names are not normally displayed on the diagram), but this doesn't help either because role names don't form sentences, which are always ordered in natural language.
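The ambiguity can be made concrete with a small Python sketch. It treats the association ends as an unordered set and lists every sentence type that a mixfix predicate could then yield; the function name is hypothetical and the code models nothing of UML beyond the lack of ordering.

```python
from itertools import permutations

HOLE = "…"  # the ellipsis that marks an object placeholder


def candidate_readings(predicate, ends):
    """List every reading of a mixfix predicate over an unordered set of ends."""
    parts = predicate.split(HOLE)
    if len(parts) - 1 != len(ends):
        raise ValueError("number of holes must equal number of association ends")
    readings = set()
    for ordering in permutations(sorted(ends)):
        pieces = [parts[0]]
        for term, part in zip(ordering, parts[1:]):
            pieces.append(term)
            pieces.append(part)
        readings.add(" ".join("".join(pieces).split()))
    return sorted(readings)


# With no order on the ends, nothing singles out the intended reading.
readings = candidate_readings("… at … is used for …", {"Room", "Time", "Activity"})
```

For the ternary association this produces six candidate readings, including both "Room at Time is used for Activity" and "Activity at Time is used for Room". An ordered role sequence, as in ORM, picks out exactly one.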

UML's weakness with regard to verbalization of facts carries over into its verbalization of constraints and derivation rules.  It does suggest that OCL (Object Constraint Language[18]) be used for formal expression of such rules, but despite its claims, OCL is simply too mathematical in nature to be used for validation by non-technical domain experts.

Since verbalization in UML has inadequate support, let's try validation with sample populations.  Not much luck here either.

To begin with, attribute-based notations are almost useless for multiple instantiation, and they introduce null values into base populations, along with all their confusing properties. UML does provide object diagrams that enable you to talk about attributed single instances of classes, but that doesn't help with multiple instantiation. For example, the 1:1 nature of the association between activity codes and names is transparent in the ORM fact table (see Figure 2) but is much harder to see by scanning several activity objects.

In principle, we could introduce fact tables to multiply instantiate binary associations in UML, but this wouldn't work for ternary and longer associations.  Why not?  Because UML associations are not ordered.  So, there is no obvious connection between a role and a fact column as there is in ORM.  Even if we added role names as headers to the fact table, the visual connection to the class diagram would often be awkward because of the non-linear layout of the association roles.  The higher the arity of the association, the worse it gets.  So, sample populations are only of limited use in UML.

Conclusion

Unlike UML (and ER for that matter), ORM was built from a linguistic basis, and its graphic notation was carefully chosen to exploit the potential of sample populations.  If you want to reap the benefits of verbalization and population for communication with (and validation by) domain experts, you are better off using a language that was designed with this in mind.  The ORM notation is simple and can be learned in a fraction of the time it takes to master just a fragment of UML.  Of course, you can always map the ORM model to a UML or ER model at a later stage.

In a future article, I will show that, in spite of its simplicity, the ORM notation is significantly richer than UML in its capacity to express constraints in a conceptual data model, as well as being far more orthogonal and less impacted by change.  Some background to justify these claims can be found in [10] [11] [12].  As discussed elsewhere, many of the benefits of ORM's conceptual modeling language also carry over to its conceptual query language [1] [2] [7].

References

[1]  Bloesch, A. & Halpin, T. 1996, "ConQuer: a conceptual query language," Proc. 15th International Conference on Conceptual Modeling ER'96 (Cottbus, Germany), B. Thalheim ed., Springer LNCS 1157 (Oct.) 121-133.*

[2]  Bloesch, A. & Halpin, T. 1997, "Conceptual queries using ConQuer-II," Proc. 16th Int. Conf. on Conceptual Modeling ER'97 (Los Angeles), D. Embley, R. Goldstein eds, Springer LNCS 1331 (Nov.) 113-126.*

[3]  Embley, D. 1998, Object Database Management, Addison-Wesley.

[4]  Halpin, T. 1995, Conceptual Schema and Relational Database Design, 2nd edn (revised 1999), WytLytPub, Bellevue WA, USA.

[5]  Halpin, T.A. 1995, "Subtyping: Conceptual and Logical Issues," Database Newsletter, ed. R.G. Ross, Database Research Group Inc., vol. 23, no. 6, pp. 3-9.*

[6]  Halpin, T.A. 1996, "Business Rules and Object-Role Modeling," Database Programming and Design, vol. 9, no. 10 (Oct. 1996), pp. 66-72.*

[7]  Halpin, T.A. 1998, "Conceptual Queries," Database Newsletter, vol. 26, no. 2, ed. R.G. Ross, Database Research Group, Inc., Boston MA (March/April 1998).*

[8]  Halpin, T. 1998, "Object Role Modeling (ORM/NIAM)," Handbook on Architectures of Information Systems, P. Bernus, K. Mertins & G. Schmidt eds, Springer-Verlag, Berlin, pp. 81-101.*

[9]  Halpin, T. 1998, Object Role Modeling: An Overview.*

[10]  Halpin, T.A. 1998-9, "UML data models from an ORM perspective: Parts 1-10," Journal of Conceptual Modeling, InConcept, Minneapolis USA.*

[11]  Halpin, T.A. 1999, "Data Modeling in UML and ORM revisited," Proc. EMMSAD'99: 4th IFIP WG8.1 Int. Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, Heidelberg, Germany (June).*

[12]  Halpin, T.A. & Bloesch, A.C. 1999, "Data modeling in UML and ORM: a comparison," Journal of Database Management, vol. 10, no. 4, Idea Group Publishing Company, Hershey, USA, pp. 4-13.*

[13]  Hay, D.C. 1999, "There is No Object-Oriented Analysis," DataToKnowledge Newsletter, Vol. 27, No. 1, Business Rule Solutions, Inc., Houston TX, USA.

[14]  Hay, D.C. 1999, "Object Orientation and Information Engineering: UML," The Data Administration Newsletter, no. 9, (June 1999), ed. R.S. Reiner, available online as article 5242 at www.tdan.com.

[15]  OMG UML Revision Task Force, OMG Unified Modeling Language Specification, version 1.3, available online from http://uml.shl.com/artifacts.htm

[16]  Ross, R.G. 1998, Business Rule Concepts, Business Rule Solutions, Inc., Houston TX, USA.

[17]  Rumbaugh, J., Jacobson, I. & Booch, G. 1999, The Unified Modeling Language Reference Manual, Addison-Wesley, Reading MA, USA.

[18]  Warmer, J. & Kleppe, A. 1999, The Object Constraint Language: Precise Modeling with UML, Addison-Wesley, Reading MA, USA.

* accessible online at www.orm.net

# # #

Standard citation for this article:


Terry Halpin, "Fact-Orientation before Object-Orientation (Part 1): The Case for Data Use Cases" (Nov./Dec. 1999)
URL: http://www.brcommunity.com/a1999/a430.html

About our Contributor:


Dr. Terry Halpin, BSc, DipEd, BA, MLitStud, PhD, is a Professor of Computer Science at INTI International University, Malaysia, and a data modeling consultant. His prior industrial background includes many years of research and development of data modeling technology at Asymetrix Corporation, InfoModelers Inc., Visio Corporation, Microsoft Corporation, and LogicBlox. His previous academic background includes many years teaching computer science at the University of Queensland (Australia) and Neumont University (USA). His current research focuses on conceptual modeling and conceptual query technology. His doctoral thesis formalized Object-Role Modeling (ORM/NIAM), and his publications include over 200 technical papers and seven books, including Information Modeling and Relational Databases, 2nd Edition (2008: Morgan Kaufmann). Dr. Halpin may be reached directly at t.halpin@live.com.
