Natural Language, Semiotics, SBVR, ORM, and CQL
When I utter a word, that word has some meaning for me, and I hope that when you hear it, the word will invoke a corresponding meaning in your mind. The science of semiotics studies the way symbols (such as words) relate to meanings, and yields insight into the nature of meaning itself. Its fundamental insight, shared by Object Role Modeling and the Semantics of Business Vocabulary and Business Rules (SBVR), is that the meaning of any symbol exists only in the relationships of that symbol to other symbols. Our mental representation of elements of the real world is understood to be symbolic rather than necessarily linguistic, though that's a fine distinction on which there isn't complete agreement. We learn to associate words that correspond to those symbols, but that's a different kind of association.
When reading the popular semiotic novel "The Name of the Rose" by Umberto Eco, I came across this curious expression: "Words are symbols that speak to us of symbols that speak to us of things." At first I thought it must have been a typographical error, a duplication, but in fact it contains a central idea of semiotics. Our ideas of the world exist apart from the words that describe them, in the configuration or oscillation of groups of neurons. Each idea has associated words, but the ideas themselves are associated by symbolic linkages that are non-lexical. These symbols are related to, but don't share identity with, the words that invoke them. Whether this picture is correct or not, SBVR, ORM, and other fact-oriented approaches are modeled on this separation of words (as symbols) from the concepts that they describe.
In order for my description of a situation to be meaningful to you, I must first expect that you can mentally represent that situation, in other words, that you are capable of understanding the situation. Next, I must assume that enough words exist in our respective vocabularies for the situation to be described, and finally that when I use a word, you will associate that word with your mental image of the same real-world thing. We share the same concepts and enough of the same words for those concepts.
To unpack these three requirements further, the first one requires that each element (thing or relationship) of the situation is associated with a corresponding representation (a symbol) in your mind. Next, that each such symbol is associated with (has a correspondence to) at least one word or manner of expression in the language I will use. Finally, that you will correctly interpret my words, by applying the verbal correspondences to invoke the correct mental symbols, the symbols that you associate with the same real-world thing that I mean. Once these three requirements are established, it's up to me to choose words that will correctly communicate what I want to say. It is through this tangled web of symbol associations and correspondences that communication can occur.
In natural language, every word is interpreted in the context of the surrounding words. No word can stand alone with an intrinsic meaning. This is what makes natural language so fluid, and so difficult to process or pin down. In SBVR and other fact-oriented approaches, additional restrictions apply: within a defined vocabulary, a noun may have only one meaning. In the fact type 'Person runs Company', the words Person and Company each mean exactly one concept. On the other hand, linking words such as the verb 'runs' may have more than one meaning. It's permitted to state that 'Person runs Race', where the verb has a different meaning than before, because the verb is used in a different expression (in ORM, a different fact type reading).
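To make this concrete, here is how the two fact types might look in a textual fact-oriented notation such as CQL (introduced later in this article). Treat this as an illustrative sketch rather than definitive syntax; the vocabulary name and identification schemes are invented for the example:

```cql
vocabulary Sports;

Name is written as String;
Person is identified by its Name;
Company is identified by its Name;
Race is identified by its Name;

Person runs Company;
Person runs Race;
```

Person, Company, and Race each denote exactly one concept throughout the vocabulary, while 'runs' carries a different meaning in each of the two fact type readings.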
All forms of fact-oriented modeling support nominalisation, where a fact type receives its own designation as a noun. An example is the nominalisation of the fact type 'Person directs Company' as 'Directorship'. Once again, the modeling approaches require that nouns created through such nominalisations have exactly one meaning within a given vocabulary. What has happened is that the fact type has first been objectified (considered as an object type), and then nominalised (the object type has been named).
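In CQL, the objectification and nominalisation can be expressed in a single statement. This is a sketch, assuming Person and Company have already been defined:

```cql
Directorship is where Person directs Company;
```

The 'is where' form both objectifies the fact type and gives the resulting object type its name.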
In Object Role Modeling, all fact types that are ternary or of higher arity are nominalised implicitly if not explicitly, as are all binary fact types that don't embody a functional relationship. So, for example, the binary relationship 'Person has visited Country' is objectified, since a person may visit more than one country and a country may be visited by more than one person. This may be objectified as an 'international visit'. Where there is no explicit nominalisation such as 'Visit', an ORM tool will create a name such as PersonVisitedCountry from the fact type readings. The nominalisation is necessary because a database table will be created having that name, so it's normally preferable to make the nominalisation explicit, for example by naming the table simply 'Visit'.
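An explicit nominalisation of the visiting example might look like this in CQL (a sketch, assuming Person and Country are already defined):

```cql
Visit is where Person has visited Country;
```

Without the 'Visit is where' prefix, a tool would have to fall back on a generated name like PersonVisitedCountry for the corresponding table.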
In natural language, the word 'visit' may carry many meanings, not just the idea of international travel. If during my travels I visit family, my familial visit can be described using the same word. Not so in existing fact-oriented approaches: where nominalisation occurs, the new noun must carry only one meaning. This requires the creation of compound or aggregate nouns, such as InternationalVisit, FamilialVisit, RentalAgreement, and the like. It's not yet clear whether or how the restriction to a single meaning per noun could or should be relaxed, but this is the most significant area where fact-oriented modeling languages differ from natural language. Perhaps the restriction is needed for the rigour that is sought, but it's possible it could be relaxed somewhat.
Object Role Modeling provides some assistance when fact types are implicitly nominalised, in the form of adjectives. Instead of having to say 'Person lived at ResidentialAddress' and 'Person worked at BusinessAddress', I can say 'Person lived at residential- Address'. This avoids the unnecessary creation of two new nouns for the different uses of Address. An explicit nominalisation of this fact type might be 'Habitation', but if objectification is not required, such as where a uniqueness constraint allows us to record only one residential address for each person, ORM tools like NORMA will compound the adjectives to create a meaningful column name such as ResidentialAddress in the Person table. The domain of this column will be Address, not a new domain called ResidentialAddress.
The space after the hyphen is required to reduce ambiguity with words that are normally hyphenated, as in 'Person drove semi-trailer to Location'. Hyphen binding may also occur following a noun, which allows natural support for languages like French where the adjectives follow the noun. In this case the hyphen must occur directly before the adjective. Where more than one adjective occurs, the hyphen attaches to the one furthest from the noun, as in the fact type 'Region has anticipated- demand Amount for Product in Period'. This method for identification of adjectival roles in fact type readings also allows natural verbalisation of constraints, as in 'Region has exactly one anticipated demand Amount for Product in Period'.
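As a sketch, hyphen-bound adjectives in fact type readings would appear like this (assuming all the nouns involved have already been defined):

```cql
Person lived at residential- Address;
Region has anticipated- demand Amount for Product in Period;
```

In the second reading, both 'anticipated' and 'demand' qualify Amount, and the hyphen attaches to the adjective furthest from the noun, as described above.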
In the Constellation Query Language (CQL), which I'm defining as part of the ActiveFacts project, the same general approach as ORM2 is taken. An extended introduction to the Constellation Query Language is here: http://dataconstellation.com/ActiveFacts/CQLIntroduction.pdf. CQL is capable of representing any ORM2 model, and an export tool is available to generate CQL from NORMA.
Because CQL is a purely textual language, some slight differences from ORM are necessary. CQL statements must be parsed, whereas in NORMA, fact type readings contain linking words between special symbols for the concepts that play the fact roles. I elected not to introduce special lexical constructs (like '[Person] lives at [Address]') in order to stay closer to the appearance of natural language. The most obvious implication of that decision is that both Person and Address must already have been defined; forward referencing isn't possible, and turns out not to be necessary. As a result, the space around an adjective's hyphen isn't necessary either, because the noun concept to which the adjective applies will be identifiable anyway.
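The definition-before-use rule simply means that a CQL source file declares its nouns before the fact types that use them. A minimal sketch, with invented identification schemes:

```cql
Name is written as String;
Address is written as String;
Person is identified by its Name;

Person lives at residential-Address;
```

Here the adjective's hyphen needs no surrounding space, because Address is already known to the parser when the fact type reading is encountered.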
CQL also allows a vocabulary to import another vocabulary, declaring correspondences and equivalences between imported and local concepts. This solves the 'I say Pot-a-to, you say Pot-ar-to' problem, and allows the same vocabulary to be represented in many languages. Vocabularies can be constructed of many small parts and, even when the parts are constructed in isolation with conflicting terminology, can be merged into a shared vocabulary with overlaps and aliasing where necessary. Do we do business with Clients, or Customers? It doesn't matter, as long as each noun has a single meaning within the limited vocabulary it spans, and when the vocabularies are used together, any aliasing, conflicts, or correspondences are noted. Correspondences in CQL can only be declared between nouns, which makes it difficult to handle correspondence between fact types that aren't nominalised.
Since CQL, like NORMA, is designed to be used not just for semantic modeling, but also for the generation of executable code, this ability to compose merged vocabularies has significant implications in supporting software re-use.
... except sometimes in the case of defining the identification scheme for a concept. In the CQL meta-model, a Vocabulary contains named Features, and each Feature is identified by a Name and, optionally, by the Vocabulary to which it belongs. Vocabulary is a subtype of Feature (which means that a Vocabulary can contain another Vocabulary), so the identification patterns for Feature and Vocabulary form a loop. This is resolved by allowing forward references in the list of identifying roles: 'Feature is identified by Vocabulary and Name' is legal CQL, even though Vocabulary hasn't yet been defined. This kind of pattern doesn't arise very often in real scenarios, however.
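A sketch of the meta-model fragment described here, built around the quoted identification statement; the subtyping line assumes CQL's 'is a kind of' form:

```cql
Name is written as String;
Feature is identified by Vocabulary and Name;
Vocabulary is a kind of Feature;
```

The reference to Vocabulary in Feature's identification scheme is the permitted forward reference: Vocabulary itself is only defined on the following line.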
(c) Copyright 2008 Clifford Heath.
# # #