SBVR and MDA
The Object Management Group (OMG) recently approved the Semantics of Business Vocabulary and Business Rules (SBVR) as a final adopted specification. SBVR is a landmark for the OMG: it is the first OMG specification to incorporate the formal use of natural language in modeling and the first to explicitly provide a model of formal logic. Based on a fusion of linguistics, logic, and computer science, and two years in preparation, SBVR provides a way to capture specifications in natural language and represent them in formal logic so they can be machine-processed. SBVR is expected to become an integral part of the OMG's Model-Driven Architecture® (MDA®). This article introduces SBVR as part of MDA by explaining MDA in terms of SBVR.
SBVR contains a vocabulary for conceptual modeling, which is a vocabulary to formally specify representations of concepts, definitions, instances, and rules of a knowledge domain in natural language, and to capture these as formal logic structures. These features make SBVR well suited for describing business domains and requirements for information systems to implement business models. There is a pressing need for formal methods to capture system requirements in natural language, where requirements are best authored and validated by domain experts, for subsequent incorporation into the system development automation stream.
Terms found throughout this article in boldface have definitions in the Semantics of Business Vocabulary and Business Rules specification, OMG document bei/05-08-01, often recited herein. The use of singular construction in the article is deliberate and is also characteristic of SBVR expressions.
Conceptualization and Representation
"MDA is an approach to system development ... [that] ... provides a means for using models to direct the course of understanding, design, construction, deployment, operation, maintenance and modification."
MDA modeling is a formal approach for carefully describing a part of the world and systematically translating the description from English or other natural language through a succession of representations in various MDA modeling languages and programming languages to a representation that is held in an information system. To understand MDA, it is instructive to begin by examining how language works as we think and communicate about the world.
Consider first the concept 'concept':
concept : unit of knowledge created by a unique combination of characteristics
A concept exists only in the mind of a person who is thinking about it. A particular concept corresponds to a thing or things in the world. By 'thing' is meant anything perceivable or conceivable. A thing is anything we can perceive through our senses (or convert to sensations with instruments) or anything we can think about. Which things correspond to a particular concept is determined by the definition of the concept.
We often give a name to a concept by associating a symbol with the concept, such as a written or spoken word or phrase or icon. We use this word or phrase or icon as a representation of the concept when we communicate about the concept through speech or writing. For example, we see a thing that has the characteristics of being a road vehicle with four wheels, powered by an internal-combustion engine, and able to carry a small number of people, and we conceptualize it as a certain kind of thing. We use the symbol 'automobile' to represent the concept (in English), and we say, "that is an automobile." Thus, for each concept there is a trinity of 1) the concept in our minds, 2) the real-world things conceptualized by the concept, and 3) a representation of the concept that we can use to think and communicate about the concept. Note that a concept may have many different representations, in the same language or different languages.
People with whom we want to communicate about this concept must also associate the symbol 'automobile' with the same concept we have in mind, or there will be a miscommunication. Communicating parties must share the concept, in their minds, and use a shared representation in their communication, in order to understand and to be understood.
Machines are often used as intermediaries in communication or surrogates for people in interactions with other people. Machines do not hold concepts, because a machine does not have a mind. However, communicating machines must hold semantically-equivalent representations of the concepts and must manipulate each representation in relation to the representations of other concepts. The machine representations and manipulations must be consistent with how people understand the concepts and how they understand that the things conceptualized by the concepts are manipulated in the real world.
IT development involves defining machine representations for concepts and defining interactions of the representations that parallel the real world. Specifying machine representations and their interactions that correctly correspond to the world is where modeling and MDA come in.
SBVR provides a natural starting point for the MDA process, by using the natural language that is the basis for thought and person-to-person communications for modeling what people have in mind. With SBVR, the language does not get in the way of domain experts modeling their domain; they are not required to think about implementations of their ideas. With SBVR, they get to focus on expressing their ideas, as complex as they may be.
Extensions and Instances
Some concept may have only one thing in the world that corresponds to it (e.g., The Statue of Liberty). Some other concept may have many things that correspond to it (e.g., automobile). The set of all things that are conceptualized by a particular concept is called the extension of the concept. For example, the extension of the concept 'automobile' is the set of all automobiles. In SBVR, this set can be specified to include only automobiles that are explicitly declared in the domain model, or it can be left open to include any automobile at all, depending on the intended meaning in a particular model.
An individual thing that is in the extension of a concept is called an instance of the concept. In object-oriented analysis and programming, an 'object' is a representation of an instance. Sometimes the term 'instance' is used to refer to a representation of an instance, which can cause confusion.
The instances of some concepts are individual things, such as an automobile, which is an instance of the concept 'automobile'. The instances of other concepts are themselves concepts, such as the concept 'type of vehicle', of which the concept 'automobile' is an instance.
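The relationship between a concept, its extension, and its instances can be sketched in a few lines of Python. This is a minimal illustration, not SBVR's own formalism: the names `Thing`, `Concept`, and `world` are hypothetical, and the sketch assumes that the definition of a concept can be reduced to a boolean test on the characteristics of a thing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Thing:
    """A representation of an individual thing (hypothetical structure)."""
    name: str
    wheels: int
    engine: str


@dataclass(frozen=True)
class Concept:
    """A term plus a definition that decides which things correspond to it."""
    term: str
    definition: Callable[[Thing], bool]

    def extension(self, world):
        """All things in the given world that satisfy the definition."""
        return {t for t in world if self.definition(t)}


automobile = Concept(
    "automobile",
    lambda t: t.wheels == 4 and t.engine == "internal-combustion",
)

# A closed world: only explicitly declared things are considered.
world = [
    Thing("ABC-123", 4, "internal-combustion"),
    Thing("XYZ-789", 4, "internal-combustion"),
    Thing("bicycle-1", 2, "none"),
]

ext = automobile.extension(world)
print(sorted(t.name for t in ext))  # ['ABC-123', 'XYZ-789']
```

Each element of `ext` stands for an instance of the concept. Note that the Python objects are representations of instances, not the instances themselves, which is exactly the source of confusion mentioned above.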
Concepts whose instances are individuals are known as 'first order concepts'. Concepts whose instances are concepts are known as 'higher order concepts'. There is no theoretical limit as to the number of conceptual levels that a concept can be removed from an individual. For example, the taxonomy of zoology has seven conceptual levels: kingdom, phylum, class, order, family, genus, and species. An individual creature is an instance of a species, e.g., a person is an instance of the species Homo sapiens; we say that a person is a Homo sapiens. A person is not an instance of any genus, but rather the concept that is the species 'Homo sapiens' is an instance of the concept that is the genus 'Homo', and we say 'Homo sapiens is a Homo'.
In MDA, the number of conceptual levels that a concept is removed from an individual is called the 'meta level' of the concept or its representation, often designated M-0 (individual), M-1, M-2, or M-3. As we shall see below, the distinction between first-order and higher-order plays an important role in the use of a model. Beyond the first-order/higher-order distinction, however, the absolute conceptual level of a concept is relatively unimportant.
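Counting conceptual levels can be made concrete with a small sketch. The `has_instance` table and the function names are hypothetical, and the sketch assumes that every concept listed has at least one declared instance (a concept with no declared instances would be indistinguishable from an individual here).

```python
# Hypothetical 'concept has instance' facts: concept -> its instances.
has_instance = {
    "Homo": ["Homo sapiens"],
    "Homo sapiens": ["a particular person"],
}


def meta_level(thing):
    """Number of conceptual levels that `thing` is removed from an individual."""
    instances = has_instance.get(thing, [])
    if not instances:
        return 0  # treated as an individual: no declared instances
    return 1 + max(meta_level(i) for i in instances)


def is_first_order(concept):
    """A first-order concept is exactly one level above individuals."""
    return meta_level(concept) == 1


print(meta_level("a particular person"))  # 0
print(meta_level("Homo sapiens"))         # 1
print(meta_level("Homo"))                 # 2
print(is_first_order("Homo"))             # False
```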
Conceptual Models, Conceptual Schemas, and Possible Worlds
A conceptual model is a representation of an instance that describes the instance in terms of concepts and facts. A conceptual model is thus a definition of an instance, which can be an arbitrarily complex thing, such as a business. Because of the assumed complexity of a modeled instance, the instance that is the subject of a conceptual model is often called a possible world. A conceptual model is a combination of a conceptual schema and a set of facts. The conceptual schema is a combination of concepts and facts of what is possible, necessary, permissible, and obligatory in each possible world.
The set of facts describes the individuals that exist in a possible world, and their characteristics. Each individual is an instance of some concept of the conceptual schema. Each fact involves concepts of the conceptual schema. That is, the set of facts defines what is being modeled in terms of the conceptual schema.
The extension of a conceptual schema is the set of all possible worlds that are consistent with the conceptual schema. The set of facts specifies one possible world from this set. The actual world is a special case of a possible world wherein each fact in the set of facts is an actuality -- there is an instance in the actual world that corresponds to each fact.
Each necessity in the conceptual schema is satisfied by the conceptual model, but the obligations are not necessarily satisfied. 'Necessity' and 'obligation' are categories of 'rule'.
Necessities define structural relationships in each possible world, and are definitional in nature. Obligations define duties of a certain party or parties to do a certain act. Since people (and machines) do not always do what they are supposed to do, the obligations are not necessarily satisfied in a particular world (possibly including the actual world).
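The different treatment of necessities and obligations can be sketched as follows. Everything here is an illustrative assumption: facts are encoded as plain tuples, rules as Python predicates, and the insurance obligation is a hypothetical rule invented for the example. SBVR itself represents rules as semantic formulations, not as code.

```python
# Set of facts; each fact is (fact type, role filler, ...).
facts = {
    ("automobile exists", "ABC-123"),
    ("person exists", "Stan Hendryx"),
    ("automobile is owned by person", "ABC-123", "Stan Hendryx"),
}


def owners_exist(facts):
    """Necessity: every owner in an ownership fact is a declared person."""
    persons = {f[1] for f in facts if f[0] == "person exists"}
    return all(
        f[2] in persons
        for f in facts
        if f[0] == "automobile is owned by person"
    )


def every_automobile_is_insured(facts):
    """Obligation (hypothetical): each automobile ought to be insured."""
    insured = {f[1] for f in facts if f[0] == "automobile is insured"}
    autos = {f[1] for f in facts if f[0] == "automobile exists"}
    return autos <= insured


# The conceptual schema, reduced to its two categories of rule.
necessities = [owners_exist]
obligations = [every_automobile_is_insured]


def is_possible_world(facts):
    """A set of facts describes a possible world only if all necessities hold."""
    return all(rule(facts) for rule in necessities)


def violated_obligations(facts):
    """Obligations may be violated without invalidating the model."""
    return [rule.__name__ for rule in obligations if not rule(facts)]


print(is_possible_world(facts))     # True
print(violated_obligations(facts))  # ['every_automobile_is_insured']
```

The set of facts is accepted as a possible world even though an obligation is unmet; a violated necessity, by contrast, would mean the facts do not describe any world consistent with the schema.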
Facts of a Conceptual Model
There are two kinds of facts in the set of facts of a conceptual model. An existential fact is used simply to assert the existence of something, e.g., "There is an automobile that has license number 'ABC-123'." An elementary fact is a declaration that an instance has a property, e.g., "The automobile having license number 'ABC-123' is blue," or that one or more instances participate in a relationship, e.g., "The automobile having license number 'ABC-123' is owned by the person named 'Stan Hendryx'." An elementary fact cannot be split into simpler facts with the same instances without information loss. An elementary fact may be treated as an instantiation of an irreducible relation, called a fact type, which is a concept whose instances are all facts.
Each fact has at least one role. A role is a concept (other than a fact type) that corresponds to things based on their playing a part, assuming a function or being used in some situation being described by a fact. The number of roles in a fact type is called the 'arity' of the fact type. SBVR supports fact types of any arity, including unary (1 role), binary (2), ternary (3), or n-ary (n roles). Most elementary fact types have 3 or fewer roles.
A fact type reading is a representation of a fact type. Multiple fact type readings are typical. Different fact type readings use different verbs and possibly differ in their ordering of the individuals, but are considered to express the same fact if they mean the same. Thus, 'automobile is owned by person' and 'person owns automobile' are two readings of the same fact type. These nominal restrictive forms that refer to one of the instances or the other are also readings of this same fact type because they refer to the same relation: 'person owning automobile' and 'automobile owned by person'. Such forms are common in rules.
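A fact type with several readings can be sketched as a data structure. The class names, the `{role}` placeholder convention, and the `verbalize` method are assumptions made for this illustration; they are not taken from the SBVR specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    roles: tuple     # ordered role names
    readings: tuple  # alternative verbalizations of the same relation

    @property
    def arity(self):
        """Number of roles in the fact type."""
        return len(self.roles)


@dataclass(frozen=True)
class Fact:
    fact_type: FactType
    fillers: tuple   # one individual per role, in role order

    def verbalize(self, reading_index=0):
        """Express the fact using one of the readings of its fact type."""
        bindings = dict(zip(self.fact_type.roles, self.fillers))
        return self.fact_type.readings[reading_index].format(**bindings)


owns = FactType(
    roles=("automobile", "person"),
    readings=(
        "{automobile} is owned by {person}",
        "{person} owns {automobile}",
    ),
)
fact = Fact(owns, ("automobile 'ABC-123'", "person 'Stan Hendryx'"))

print(owns.arity)         # 2
print(fact.verbalize(0))  # automobile 'ABC-123' is owned by person 'Stan Hendryx'
print(fact.verbalize(1))  # person 'Stan Hendryx' owns automobile 'ABC-123'
```

Both verbalizations come from the same `Fact` object, which mirrors the point that different readings express the same fact.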
Categories of Conceptual Models
It is convenient to have a categorization scheme for conceptual models that is based on the kinds of concepts that appear in the conceptual schema of the conceptual model, whether the concepts are all first-order concepts, or whether there are higher-order concepts, and if so, the range of possible higher-order concepts that is permitted by the conceptual schema.
A conceptual model whose conceptual schema contains only first-order concepts, that is, contains only concepts whose instances are individuals, is called in MDA an 'M-0 model' (or 'implementation' or 'data'). M-0 denotes that the set of facts declares only individuals.
A conceptual model whose conceptual schema contains higher order concepts whose instances are first-order concepts is called in MDA an 'M-1 model' (or simply, 'model'). MDA and MOF also historically defined an M-2 model as a 'metamodel', and an M-3 model as a 'metametamodel', each level having a conceptual schema one conceptual level higher than the previous. In this scheme, the model designation refers to the meta-level of the set of facts of the model. The designations M-0, M-1, M-2, and M-3, and the former restriction to four meta levels, are being deprecated in MDA, based on the realization that there is no inherent limit on the number of levels and no need for MDA to distinguish categorically among higher-order models.
Abstract Languages and Syntax
A language can be defined by a conceptual model. In MDA, a conceptual model that declares concepts in its set of facts is called an 'abstract language'. For example, if a conceptual model declares in its set of facts "There is an automobile" (concept of 'automobile' is introduced) the model would be defining an abstract language that includes 'automobile' in its vocabulary. Since such a model specifies a vocabulary, it is a model of a language, which is a method of communication consisting of the use of structured representations. The language is called 'abstract' because only the structural elements of the language are specified in the language definition model, e.g., 'Class', 'Association', 'Property' (in UML). The set of concepts introduced, including the facts that associate and constrain them, is called the 'abstract syntax' of the language. It is typical in MDA to specify the concrete syntax, or notation, separately from the abstract syntax of a language.
Separating the abstract and concrete syntax specifications of a language leaves open the possibility for alternative concrete syntaxes for the language. Concrete syntax can be a graphical notation convention, as in the case of a graphical language like UML, or a textual grammar, as in the case of a textual language like SBVR. Of course, some concrete syntax is needed to express abstract syntax. In MDA, UML notation is used for this purpose for graphical models. The abstract syntax of SBVR is expressed in the SBVR specification itself using a notation called 'SBVR Structured English', which is a controlled subset of ordinary English grammar. The terms shown in boldface in this document represent concepts in the abstract syntax of SBVR; only a few of the 400+ concepts in the abstract syntax of SBVR are illustrated in this article.
In MDA, one language can be defined more or less in terms of another, depending on the extent to which the language being defined refers to concepts in the language being used. Defining one abstract language in terms of another is called 'meta modeling' in MDA. A language definition is a conceptual model in which the language being used comprises the conceptual schema and the language being defined comprises the set of facts. The conceptual schema of a model that defines a language is the abstract syntax of the language being used. As such, the language being used is the 'meta-language' of the language being defined.
It is very important when discussing abstract languages to keep clearly in mind the point of view of the discussion, whether the definition of the language is being discussed, or whether a use of the language is being discussed. When the definition of a language is discussed, the language is in the set of facts. When a use of a language is being discussed, the language is in the conceptual schema. Much confusion and miscommunication results when these points of view are crossed. This confusion is so common that there is an informal term for it in the OMG: 'meta-model-muddle'.
MOF is the common meta-language among all MDA languages. MOF is also the meta-language of choice for the object-oriented MDA languages, particularly the UML and its derivatives. SBVR is the meta-language of choice for natural language models in MDA. There is a MOF model of SBVR so that SBVR models can be structurally linked with other MDA models based on MOF at the level of individual facts. SBVR models can be semantically linked to other models through vocabulary representation.
What, one might ask, is the meta-language of MOF and of SBVR? It turns out each of these is its own meta-language. MOF and SBVR are each defined in terms of itself. In order for a language to be able to describe itself, it must be a 'reflective language'. That is, there must be some category of 'concept' in the language that appears (is 'reflected') in its own meta-language.
An abstract language that includes the fact type 'concept has instance' (or equivalently, 'instance is of concept') in its abstract syntax allows instances to discover what kind of thing they are. The ability of an instance to discover information about its own structure and context is called 'introspection'. This feature is called 'reflection' in MDA languages, and a language that has this feature is called a 'reflective language'.
Reflection is vital to MDA, since transformations need to be able to determine the concept(s) and characteristics of an instance. Reflection is also important in forming and answering conceptual queries, such as, "What automobile is owned by what person?" A reflective language can be defined in terms of itself. MOF and SBVR are each a reflective language, and therefore each can be used for meta-modeling.
The category of 'concept' involved in MOF is 'Class', and an instance of Class is an Object. The fact type 'Object is of Class' is found in the package MOF::Reflection, in the form of the function 'Object.getMetaClass()'. UML and all other languages that are defined as instances of MOF inherit MOF::Reflection and have the capability for introspection.
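Python is itself reflective in this sense, so it can serve as an analogy. The sketch below uses Python's built-in `type()` where the text mentions MOF's 'Object.getMetaClass()'; it does not use or reproduce any MOF API, and the `Automobile` class is a hypothetical example.

```python
class Automobile:
    def __init__(self, license_number):
        self.license_number = license_number


car = Automobile("ABC-123")

# Introspection: the instance discovers what kind of thing it is.
concept_of_car = type(car)
print(concept_of_car.__name__)      # Automobile

# The concept is itself an instance of a higher-order concept.
concept_of_concept = type(Automobile)
print(concept_of_concept.__name__)  # type

# The top-level concept is an instance of itself, so the language
# can describe its own notion of 'concept'.
print(type(type) is type)           # True

# Introspection also exposes the characteristics of the instance.
print(sorted(vars(car)))            # ['license_number']
```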
Trade-off between Expressivity and Ease of Use
Note that the closer the meaning of a concept of an abstract language is to the meaning of a concept that a modeler wants to include, the fewer facts are needed to complete the model. A 'universal' modeling language needs only 'concept', 'instance', and 'relation' to model anything, but everything about the thing being modeled needs to be in the model. With more specific concepts in the abstract language, the expressivity is more restricted, i.e., the set of possible worlds the language can describe is smaller, but it is easier to describe them.
This fundamental inverse relation between expressivity and ease of use leads people to build special-purpose languages for domains they want to model extensively, in order to reduce the modeling effort and make their models easier to interpret by domain experts. The challenge is then to link the domains for communication.
Reference Schemes
It is often necessary that individuals in a conceptual model be uniquely identified. To accomplish identification, a set of fact types, called a reference scheme, can be specified for a concept such that an individual in the extension of the concept is uniquely identified by the facts asserted in the reference scheme.
In the examples above, the reference scheme for 'automobile' is the fact types 'automobile has license number' and 'license number is issued by state', i.e., an automobile is uniquely identified by its license number and issuing state. A concept can have multiple reference schemes, e.g., vehicle identification number (VIN) for automobile, in addition to license number and issuing state.
Reference schemes should represent stable properties of instances so an instance can continue to be identified even if some of its other characteristics change. Parties must agree on the reference scheme used in communication about individuals so they can each correctly identify the individuals.
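A reference scheme behaves much like a candidate key. The following sketch indexes individuals by a chosen scheme and rejects a scheme that fails to identify them uniquely. The record layout, the VIN values, the state codes, and the `index_by` helper are all hypothetical.

```python
automobiles = [
    {"vin": "VIN-0001", "license_number": "ABC-123", "state": "CA", "color": "blue"},
    {"vin": "VIN-0002", "license_number": "ABC-123", "state": "NY", "color": "red"},
]


def index_by(individuals, scheme):
    """Index individuals by a reference scheme given as a tuple of property names."""
    index = {}
    for individual in individuals:
        key = tuple(individual[name] for name in scheme)
        if key in index:
            raise ValueError(f"scheme {scheme} does not uniquely identify {key}")
        index[key] = individual
    return index


# Two alternative reference schemes for the same concept.
by_plate = index_by(automobiles, ("license_number", "state"))
by_vin = index_by(automobiles, ("vin",))

print(by_plate[("ABC-123", "CA")]["vin"])  # VIN-0001
print(by_vin[("VIN-0002",)]["color"])      # red

# License number alone is not a valid reference scheme for this data.
try:
    index_by(automobiles, ("license_number",))
except ValueError as err:
    print("rejected:", err)
```

Color is deliberately left out of both schemes: it is not a stable property, so it would make a poor basis for identification.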
Transactions Change a Model
An enterprise information system may hold an M-0 model of an enterprise. It is the aim of most enterprise information systems that the model held by the information system is of the actual world. An enterprise often expends considerable resources to maintain its actual world model.
The conceptual schema of the M-0 model is represented by the database schema, the system's software object model, and the system's representations of business rules. The set of facts is represented by the data in the database. The possible world represented by the conceptual model in the information system changes and becomes a different possible world each time a fact is added, negated, or deleted from the model. That is, each transaction of the enterprise produces a new conceptual model, representing a different possible world of the enterprise, and, objectively, the changing actual world.
The conceptual schema itself can change, which changes the set of possible worlds of the enterprise model. A change to the conceptual schema may result in the facts also changing, since the necessities must always be satisfied. Change to the conceptual schema may result in some obligations not previously satisfied being satisfied (if rules were loosened), or vice versa (if rules were tightened).
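The idea that each transaction yields a different possible world can be sketched with immutable sets of facts. The tuple encoding of facts and the two helper functions are assumptions made for this illustration; a real information system would hold the facts in a database.

```python
def assert_fact(world, fact):
    """Return the possible world that results from adding a fact."""
    return world | {fact}


def retract_fact(world, fact):
    """Return the possible world that results from deleting a fact."""
    return world - {fact}


blue = ("automobile 'ABC-123'", "is", "blue")
owned = ("automobile 'ABC-123'", "is owned by", "person 'Stan Hendryx'")

world_0 = frozenset({blue})
world_1 = assert_fact(world_0, owned)   # a transaction: a different world
world_2 = retract_fact(world_1, owned)  # undoing it restores the first one

print(world_1 == world_0)          # False
print(world_2 == world_0)          # True
print(len(world_0), len(world_1))  # 1 2
```

Because `frozenset` is immutable, `world_0` is never modified; every transaction produces a new value, which matches the view that the model after a transaction represents a different possible world.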
MDA General Purpose Modeling Languages
An abstract language whose conceptual schema includes the concept 'concept' is considered a general purpose language because it can be used to declare any concept. The only general purpose abstract language of MDA is SBVR; 'concept' in SBVR is unconstrained.
The conceptual schemas of the Unified Modeling Language™ (UML™) and the Meta Object Facility™ (MOF™) contain the concept 'Class' that is a category of concept that is delimited from 'concept' by the characteristics and constraints specified in the respective language specifications. In these languages, an instance of Class declares, in terms of the specification of Class in the language, the required and permitted structure of corresponding Objects in the possible worlds modeled by these languages. Thus, UML and MOF are quasi-general purpose languages, subject to their object-oriented constraints.
EMOF (Essential MOF) is very simple, having only a few concepts in its conceptual schema, notably 'Class' and 'Property', and a few others. 'Association' is notably absent from EMOF. 'Property' in EMOF can model binary fact types, and only binary fact types, through the contrivance of two opposite Properties on what would be considered to be the ends of a binary association. CMOF (Complete MOF) adds 'Association' and some other concepts that make it easier to model databases and related languages.
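The 'two opposite Properties' contrivance can be illustrated with ordinary classes. This is an analogy in Python, not EMOF itself: the class names, attribute names, and the `set_owner` helper are hypothetical. The binary fact type 'person owns automobile' has no object of its own and exists only as a pair of properties that must be kept consistent.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.automobiles = []  # property; its opposite is Automobile.owner


class Automobile:
    def __init__(self, license_number):
        self.license_number = license_number
        self.owner = None      # property; its opposite is Person.automobiles


def set_owner(automobile, person):
    """Update both opposite properties so they describe the same fact."""
    if automobile.owner is not None:
        automobile.owner.automobiles.remove(automobile)
    automobile.owner = person
    person.automobiles.append(automobile)


stan = Person("Stan Hendryx")
buyer = Person("hypothetical buyer")
car = Automobile("ABC-123")

set_owner(car, stan)
print(car.owner.name, len(stan.automobiles))  # Stan Hendryx 1

set_owner(car, buyer)
print(car.owner.name, len(stan.automobiles))  # hypothetical buyer 0
```

A ternary fact type has no natural home in this pattern, since a pair of opposite properties has only two ends.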
UML adds many other concepts to its conceptual schema, particularly some that make it easier to model behavior. The expressivity of each language, i.e., the possible worlds it can describe, is limited by the concepts it incorporates and the combinations of these concepts that it permits.
MOF is designed to define other modeling languages and to serve as a hub in transformations between languages based on MOF. The two dialects of MOF -- EMOF and CMOF -- are designed to provide a lightweight language for relatively simple applications, and a more robust language for more demanding applications, respectively.
There is a core model that is common to MOF and UML, to facilitate transformations between models based on either UML or MOF. The UML is designed as a general purpose object-oriented modeling language. The UML has provisions for modeling both object structures and behavior in a few different paradigms.
SBVR is for modeling in natural language. Based on linguistics and formal logic, SBVR provides a way to represent statements in controlled natural languages as logic structures called semantic formulations. SBVR is intended for expressing business vocabulary and business rules, and for specifying business requirements for information systems in natural language. SBVR has the greatest expressivity of any OMG modeling language.
The logics supported by SBVR are typed first order predicate logic with equality, restricted higher order logic (Henkin semantics), restricted deontic and alethic modal logic, set theory with bag comprehension, and mathematics. SBVR also includes projections (to support specifying definitions and answers to queries) and questions (for formulating queries). Interpretation of SBVR semantic formulations is based on model theory. SBVR has a MOF model, so SBVR models can also be represented in MOF and be interoperable with other MOF-based models.
MDA Special-Purpose Modeling Languages
The OMG has defined several special-purpose abstract languages in addition to UML, MOF, and SBVR. An OMG language that specializes UML is called a 'UML Profile'. A language that specializes MOF is called a 'metamodel' or a 'MOF model'.
Some of these special purpose languages include the Common Warehouse Metamodel, and UML Profiles for CORBA®; CORBA Component Model (CCM); Enterprise Application Integration (EAI); Enterprise Distributed Object Computing (EDOC); Quality of Service and Fault Tolerance; Schedulability, Performance, and Time; and the UML Testing Profile. Other special purpose languages are works in progress in the OMG. Many of these refer to only parts of the UML metamodel, sometimes only 'Class'. Semantic linking in such cases is based on vocabulary representation.
Transformations
As discussed above, it is possible and convenient to conceptualize many different abstract languages for different purposes. It is also possible and useful to transform one or more conceptual models (the source) into one or more different conceptual models (the target). The target may be based on the same abstract languages as the source or on different ones, so that the target includes a different representation of the source, provided certain conditions on the transformation are met.
The most fundamental criterion for such a transformation is that it is what mathematicians call 'functional'. This means that any representation or particular combination of representations in the source model can have only one representation or combination of representations in the target model. Because of the uniqueness of the function, the meaning of the representation in the target model can be said to be well defined in terms of the source, i.e., each target representation can be said to represent a concept that is a specific function of certain source concepts. Not all representations in a source need be present in a target, and conversely. Information is usually filtered from the source or added in the target.
In a chain of such transformations, the meaning of each mapped representation can be interpreted in terms of an ultimate source conceptual model. If that ultimate source conceptual model is a model of the business domain in English or other natural language, it can be validated to have a well-defined business meaning by business personnel who are domain experts. It is often desirable to design transformations so the function preserves the semantics of the transformed representations individually. Sometimes it is desirable that the function be reversible, bidirectional. A rigorous mathematical account of the theory of model transformations in MDA remains to be developed.
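The 'functional' criterion, and the stronger property of reversibility, can be checked mechanically once a transformation is written down as source/target pairs. The sketch below is a simplified illustration: the pair encoding, the function names, and the sample mappings are all hypothetical, and real MDA transformations operate on model elements rather than strings.

```python
def is_functional(pairs):
    """True if no source representation maps to two different targets."""
    seen = {}
    for source, target in pairs:
        if seen.setdefault(source, target) != target:
            return False
    return True


def is_reversible(pairs):
    """True if the mapping is functional in both directions."""
    pairs = list(pairs)
    return is_functional(pairs) and is_functional((t, s) for s, t in pairs)


mapping = [
    ("concept 'automobile'", "class Automobile"),
    ("concept 'person'", "class Person"),
    ("fact type 'person owns automobile'", "attribute Automobile.owner"),
]

# The same source mapped to two different targets: not functional.
ambiguous = mapping + [("concept 'person'", "table PERSON")]

# Two sources collapsed into one target: functional, but not reversible.
lossy = mapping + [("concept 'car'", "class Automobile")]

print(is_functional(mapping), is_reversible(mapping))  # True True
print(is_functional(ambiguous))                        # False
print(is_functional(lossy), is_reversible(lossy))      # True False
```

The `lossy` case corresponds to filtering information from the source: the target is still well defined in terms of the source, but the source cannot be recovered from it.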
There is a special category of transformation in MDA that is noteworthy. These are the transformations between a conceptual model and a presentation of the model in graphical form. These transformations must be one-to-one. Each notation element on a diagram corresponds to a particular set of facts in the conceptual model. Since there are alternative notations available in some cases, the reverse mapping is not unique. In mathematics, this kind of mapping is called an 'injective morphism'. If the notation is unique, then the mapping is an isomorphism, also known as a 'bijective morphism'.
Conclusion
An overview account of MDA has been provided in terms of SBVR, giving also a demonstration of the expressivity and natural language modeling approach with SBVR. A coherent and logically consistent account of any knowledge domain can be developed in natural language using SBVR and be subjected to MDA transformations. SBVR semantic formulations provide a basis in formal logic for analyzing and processing SBVR models. Availability of interoperable machine-readable representations of SBVR natural language models in MOF/XMI bridges the gap between the world of thought and the technology world. SBVR provides a natural starting point for the MDA process, by using the natural language that is the basis for thought and person-to-person communications for modeling what people have in mind.
Hendryx & Associates led the organization of the team that created SBVR, and is a co-submitter of the specification. To learn more about how you can use SBVR, contact Stan Hendryx, firstname.lastname@example.org.
 A representation is a portrayal of a meaning by an expression, which is something that expresses or communicates, but that is independent of its interpretation, e.g., the sequence of characters 'automobile'.