Writing Natural Language Rule Statements — a Systematic Approach: Part 1 — Basic Principles

Graham Witt, Consultant / Author

About this series of articles

While my first series of articles on writing natural language rule statements[1] explored a wide variety of issues in a rather organic and hence random manner, this series will take a more holistic and systematic approach and draw on insights gained while writing my recently-published book on the same topic.[2]

Why natural language?

To effectively answer[3] a question of the form "Why x?" — which is, of course, shorthand for "Why use x rather than some alternative?" — one needs to establish:

  1. What are the other alternatives?

  2. What are the advantages and disadvantages of each alternative?

The principal alternatives to natural language are languages designed for human/machine communication, such as:

  1. programming languages, e.g., the following Java statement:
    if (Unit_Count <= 0) {
        System.out.println("Unit count must be positive");
    } else if (Unit_Count > 9999) {
        System.out.println("Unit count must not be more than 9999");
    }


  2. database constraint languages, e.g., the following DDL (data definition language) clause:
    Unit_Count NUMBER(4)
        CONSTRAINT CH_Unit_Count CHECK (Unit_Count > 0 AND Unit_Count <= 9999)


  3. rule engine input notations:  these typically consist of programming language-style statements, tabular arrangements of information, or graphical notations.

One disadvantage shared by all of these alternatives is that they require specialist knowledge to interpret.  Another is that, in many modern systems of any complexity, some rules need to be implemented on multiple platforms, each of which requires a specific language.  For example, a rule governing input data may need to be implemented in an internal user interface, a web form, and/or an XML parser (if data can be input from other systems), as well as in a rule engine and/or database.  Inevitably, in such situations, each rule ends up expressed in each required language, with no single expression marked as the "source of truth."
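
One way to picture a single "source of truth" is to hold a rule's limits in one place and derive each platform's expression from them.  The following Java sketch is a minimal illustration only, not a recommended architecture:  the class RangeRule and its methods are hypothetical names, Unit_Count is assumed to be a whole number, and 9999 is assumed to be an allowed value.

```java
import java.util.Optional;

// Hypothetical single definition of the unit count rule.
// RangeRule, violation and toCheckClause are illustrative names only.
final class RangeRule {
    private final String itemName;
    private final int minInclusive;
    private final int maxInclusive;

    RangeRule(String itemName, int minInclusive, int maxInclusive) {
        this.itemName = itemName;
        this.minInclusive = minInclusive;
        this.maxInclusive = maxInclusive;
    }

    // Returns an error message if the value breaks the rule, otherwise empty.
    Optional<String> violation(int value) {
        if (value < minInclusive || value > maxInclusive) {
            return Optional.of(itemName + " must be between " + minInclusive
                    + " and " + maxInclusive);
        }
        return Optional.empty();
    }

    // Derives a database CHECK condition from the same limits.
    String toCheckClause() {
        return "CHECK (" + itemName + " >= " + minInclusive
                + " AND " + itemName + " <= " + maxInclusive + ")";
    }

    public static void main(String[] args) {
        RangeRule unitCount = new RangeRule("Unit_Count", 1, 9999);
        System.out.println(unitCount.violation(0).orElse("OK"));
        System.out.println(unitCount.violation(9999).orElse("OK"));
        System.out.println(unitCount.toCheckClause());
    }
}
```

Even with such a definition, stakeholders who cannot read Java still need a statement of the rule that they can understand, which is where natural language comes in.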

It is also quite common to encounter, in specifications designed for human rather than machine consumption, pseudo-code,[4] such as
            if unit count <= 0 or > 9999 then display error message.
This tends to reflect the overall syntax of programming language statements but relaxes punctuation requirements and/or the requirement that data item names contain no spaces.  However, there are no particular standards for pseudo-code and, while programmers will have no trouble understanding such statements, other stakeholders may find them harder to understand.

Remember that, for effective rule management, the rule statements expressing an organisation's rules need to be understood by a variety of stakeholders:

  1. the organisation's management, responsible for compliance with legislation and regulations, minimisation of exposure to risk, cost reduction, revenue protection, maintenance of market share, etc., and hence for framing the rules that help the organisation achieve those objectives;

  2. developers of systems in which those rules are to be implemented;

  3. authors of training and help materials for those systems;

  4. employees, customers, suppliers, and members of the public governed by those rules;

  5. auditors and regulatory bodies responsible for checking compliance with legislation and regulations.

Why a constrained natural language?

Actually, every natural language is constrained, in that:

  1. not every pronounceable combination of letters is a word in that language, thus leaving plenty of scope for humourists, innovators and the branding industry to come up with words like 'jabberwocky', 'iPad', and 'Prius';

  2. there are rules about which words and forms of words can be associated with each other and in which order, such as the rules violated by "*a aviation are grew significant inside the quarter previous";[5]

  3. utterances can be syntactically correct but semantically nonsensical, e.g., "Colorless green ideas sleep furiously", composed by Noam Chomsky in 1957 as an example of such an utterance.

Be that as it may, when we use the term constrained natural language, we mean that:

  1. it uses only words from a specified, relatively-limited set or sets of words in some defined natural language;

  2. it uses only specified forms of phrase and clause, again from a relatively-limited set;

  3. it uses them only in specified combinations.

The specific constraints governing the constrained natural language that I use for rule statements — and the implications of those constraints — will become clear as you read these articles.

You may be familiar with the language used for dialogue between flight crews and air traffic controllers, sometimes referred to as 'Aviation English'.  This uses a very limited subset of English words and a very limited subset of sentence forms.  However, unlike the constrained natural language that I use for rule statements, it also uses various words and phrases with meanings that differ from the meanings of those words and phrases in general use, such as 'affirmative', 'over', 'Roger', and even phrases not encountered in general use, such as 'pan pan'.

Various authors have developed different approaches to expressing rules in natural language; each author has recognised that different types of rule require rule statements with different sentence patterns:

  1. Terry Halpin has produced a considerable body of work on natural language description of ORM models (including constraints) since the early 1990s.[6]

  2. At the 1999 Entity-Relationship conference in Paris, France, I delivered a paper describing the technique by which business rules were modelled as natural language statements in a project to develop a school administration system.[7]

  3. Ron Ross developed version 1.0 of the language RuleSpeak[8] in 2001; this has been subsequently updated.

  4. In 2002, Tony Morgan published a set of syntactic templates for the construction of rule statements.[9]

  5. In 2004, I described an approach to describing a data model and its associated rules using natural language assertions.[10] [11]

  6. In January 2008 the Object Management Group released version 1.0 of the Semantics of Business Vocabulary and Business Rules,[12] a comprehensive analysis of the linguistic and logical concepts underlying natural language rule statements.

As far as I have been able to establish, in the latest version of the language specified in each of these resources, all allowable sentences:

  1. are grammatically valid sentences in US or UK English;

  2. have the same meanings as in general discourse in English (unlike, as previously observed, various terms in 'Aviation English').

To illustrate the value of using a constrained natural language rather than the natural language on which it is based (e.g., US or UK English), consider an example of one of the most common types of rule, one that requires that a value be provided for a particular data item in a particular transaction type.  Such rules can be stated in many different ways; here are just a few:

RS1. †The Birth Date of each Passenger must be specified in a Travel Insurance Application.[13]
RS2. †It is obligatory that the Birth Date of each Passenger be specified in a Travel Insurance Application.
RS3. †It is obligatory that a Travel Insurance Application specify the Birth Date of each Passenger.
RS4. †A Travel Insurance Application is obliged to specify the Birth Date of each Passenger.
RS5. †A Travel Insurance Application must specify the Birth Date of each Passenger.
RS6. †For each Passenger a Travel Insurance Application must specify the Birth Date.
RS7. †A Travel Insurance Application must for each Passenger specify the Birth Date.
RS8. †A Travel Insurance Application must specify each Passenger's Birth Date.
RS9. †Each Travel Insurance Application must specify the Birth Date of each Passenger.
RS10. †A Travel Insurance Application must specify exactly one Birth Date for each Passenger.
RS11. †Travel Insurance Applications must specify the Birth Date of each Passenger.

When you also consider that alternative terms can be used, e.g., 'Date of Birth' instead of 'Birth Date' or 'Customer' instead of 'Passenger', and 'stated' or 'filled in' can be used instead of the verb 'specify', this one rule can be expressed in even more ways.

Why should this matter?  In a typical organisation, there will be many such rules:  those governing other data items in this transaction, and those governing data items in other transactions.  If the various rules of this type are expressed in different ways, it becomes much more difficult and error-prone to establish whether any two rules duplicate each other or (worse) conflict with each other.  Furthermore, translation of natural language rule statements into statements in the appropriate programming language, database constraint language, and/or rule engine input language, whether automatic or manual, also becomes much more difficult and error-prone.

To make these tasks easier, an organisation should use a constrained natural language, i.e.:

  1. agree on the terms to be used for all entity types, event types, transaction types, data items, etc.;
  2. agree on the verbs to be used between particular pairs of terms;
  3. agree on standard formulations for phrases, clauses and sentences for each type of rule statement.

While decisions in the first two categories are relatively easy for business stakeholders to make and agree on, decisions in the third category require rather more thought, particularly as some formulations are better than others.  This series of articles will focus mainly on which phrases and clauses are most appropriate in each situation and how best to assemble those phrases and clauses into clear unambiguous rule statements, although choice of terms and verbs will also be discussed.
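
The effect of agreeing on terms and on a standard formulation can be sketched in code.  The following Java example is an illustration under stated assumptions:  the record MandatoryDataRule, its fields, and the data item 'Name' are hypothetical, and the single sentence pattern shown is just one possible agreed formulation.  Once every rule of a given type shares one structure, spotting duplicates reduces to a simple equality test.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical structured form of one type of rule (a mandatory data rule).
// The record name and its fields are illustrative, not taken from any standard.
record MandatoryDataRule(String subject, String dataItem, String owner) {

    // Renders the rule using a single agreed sentence pattern.
    String toStatement() {
        return "Each " + subject + " must specify exactly one " + dataItem
                + " for each " + owner + ".";
    }

    // Because every rule shares one structure, duplicates reduce to record equality.
    static Set<MandatoryDataRule> withoutDuplicates(List<MandatoryDataRule> rules) {
        return new LinkedHashSet<>(rules);
    }

    public static void main(String[] args) {
        List<MandatoryDataRule> rules = List.of(
                new MandatoryDataRule("Travel Insurance Application", "Birth Date", "Passenger"),
                new MandatoryDataRule("Travel Insurance Application", "Birth Date", "Passenger"),
                // 'Name' is a made-up data item used only for this example.
                new MandatoryDataRule("Travel Insurance Application", "Name", "Passenger"));
        for (MandatoryDataRule rule : withoutDuplicates(rules)) {
            System.out.println(rule.toStatement());
        }
    }
}
```

If the same three rules had been written freely, in any of the many possible wordings, no such mechanical comparison would be possible.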

Principles of natural language rule statement construction

The choices of particular formulations for a rule statement and the phrases and clauses that make it up are governed by a number of principles, some of which are now discussed.

The subject of a rule statement

It is important that each rule statement makes clear which set of objects is to be tested by the rule.  In the case of the rule under consideration in this article, that is the set of Travel Insurance Applications.  Travel Insurance Applications that specify a Birth Date for each Passenger comply with the rule, whereas Travel Insurance Applications that do not specify a Birth Date for each Passenger do not comply with the rule.  Passenger Birth Dates are not being tested by this rule:  those that occur in some other transaction type, document, or other context are outside the scope of this rule, whereas those that occur in Travel Insurance Applications ensure that those Travel Insurance Applications comply with the rule.

The set of objects to be tested by the rule will be clear to readers of the rule statement if the term signifying that set is the subject of (and thus the initial noun phrase in) the main clause (or only clause) of the rule statement expressing that rule.

If we look at rule statements RS1 to RS3 in the list above, we see that none of them has the correct subject 'Travel Insurance Application'.  RS1 has 'Birth Date' as the subject while RS2 and RS3 have 'It' as the subject of the main clause (with 'Birth Date' and 'Travel Insurance Application' respectively as the subject of the subordinate clause).  Each of the other rule statements in the list above has 'Travel Insurance Application' as the subject of the only clause.

Succinctness

While RS4 is an improvement on RS3 in that it has 'Travel Insurance Application' rather than 'It' as the subject of the main or only clause, RS5 — in which 'is obliged to' is replaced by 'must' — is shorter than RS4 by two words.  Provided there is no loss of meaning, it is generally good practice to use fewer words.

Placement of the prepositional phrase

Rule statements RS6 and RS7 differ from RS5 only in:

  1. placement of the prepositional phrase 'for each Passenger';

  2. the fact that 'of each Passenger' is an allowable alternative only when the prepositional phrase follows 'Birth Date'.

Of these three, RS5 is to be preferred since:

  1. the initial placement in RS6 obscures the fact that 'Travel Insurance Application' is the subject of the rule statement (in English the subject generally precedes the verb in a clause and so usually appears at the start of a sentence);

  2. the placement in RS7 separates 'must' and the verb it qualifies ('specify'), obscuring what it is that the members of the subject set (i.e., Travel Insurance Applications) are obligated to comply with.

A problem with the possessive form

Rule statement RS8 uses the possessive form "each Passenger's" rather than a prepositional phrase.  However, use of this form can lead to ambiguity.

Consider an employee timesheet in which employees record hours worked on various projects and optionally record expenses incurred while working on those projects.  While RS12 makes clear that Expenses are optional, this is not so clear in RS13.  For this reason, even though the possessive form saves two words, it is not recommended.

RS12. An Employee Timesheet must specify the Amount of each Expense (if any).
RS13. †An Employee Timesheet must specify each Expense's Amount (if any).

Use of determiners

Rule statement RS9 uses a different determiner[14] ('Each' rather than 'A') in front of 'Travel Insurance Application'.  This makes clear that the rule applies to every Travel Insurance Application rather than just one in particular.

Rule statement RS10 uses the determiner 'exactly one' rather than 'the' in front of 'Birth Date'.  This makes clear that this rule would be violated not only by Travel Insurance Applications that omit a Birth Date for one or more Passengers but also by those that specify more than one Birth Date for at least one Passenger.

Singular or plural?

I often encounter rule statements with plural subjects, such as RS11 and RS14.  The main problem with these is that, if the predicate[15] contains a cardinal number[16] such as 'one' (as does RS14), it is not always clear what number applies to an individual object to be tested, nor whether there is some constraint across the entire set of objects to be tested.  For example, I have observed stakeholders inferring from a rule statement such as RS14 that all tested objects (in this case Travel Insurance Applications) must specify the same Insurance Company (rather than exactly one each).

RS14. †Travel Insurance Applications must specify exactly one Insurance Company.
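
The two readings of RS14 lead to different tests, as the following Java sketch suggests.  It is an illustration only:  the class Rs14Readings, the record Application, and the insurer names are hypothetical, and an application is assumed simply to list the insurance companies it specifies.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class Rs14Readings {

    // Hypothetical model: an application lists the insurance companies it specifies.
    record Application(List<String> insuranceCompanies) {}

    // Reading 1: each application, taken individually, specifies exactly one company.
    static boolean eachSpecifiesExactlyOne(List<Application> applications) {
        return applications.stream()
                .allMatch(a -> a.insuranceCompanies().size() == 1);
    }

    // Reading 2: in addition, all applications specify the same single company.
    static boolean allSpecifyTheSameOne(List<Application> applications) {
        Set<String> distinct = applications.stream()
                .flatMap(a -> a.insuranceCompanies().stream())
                .collect(Collectors.toSet());
        return eachSpecifiesExactlyOne(applications) && distinct.size() <= 1;
    }

    public static void main(String[] args) {
        List<Application> applications = List.of(
                new Application(List.of("Insurer A")),
                new Application(List.of("Insurer B")));
        System.out.println(eachSpecifiesExactlyOne(applications)); // prints true
        System.out.println(allSpecifyTheSameOne(applications));    // prints false
    }
}
```

The same pair of applications complies under the first reading and fails under the second, so a developer working from RS14 alone could implement either test.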

The ideal form

So what is the ideal form for a rule statement to express the rule that RS1 to RS11 attempted to express?  RS15 takes into account all the principles discussed above.

RS15. Each Travel Insurance Application must specify exactly one Birth Date for each Passenger.

The font and colour conventions used in RS15 reflect those in the SBVR, namely:  underlined teal for terms, italic blue for verb phrases, orange for keywords, and double-underlined green for names and other literals.  Note that, for clarity, these conventions are not used for rule statements that exhibit one or more non-recommended characteristics.
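
To show how directly RS15 maps onto a test, here is a minimal Java sketch.  It is one possible reading under stated assumptions rather than a definitive implementation:  the class Rs15Check, the record TravelInsuranceApplication, and the passenger names are hypothetical, and an application is assumed to hold, for each Passenger, the list of Birth Dates specified for that Passenger.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;

final class Rs15Check {

    // Hypothetical model: the birth dates an application specifies, keyed by passenger.
    record TravelInsuranceApplication(Map<String, List<LocalDate>> birthDatesByPassenger) {}

    // The objects tested are applications: one complies only if every
    // passenger has exactly one birth date specified.
    static boolean compliesWithRs15(TravelInsuranceApplication application) {
        return application.birthDatesByPassenger().values().stream()
                .allMatch(dates -> dates.size() == 1);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(1980, 1, 15);
        TravelInsuranceApplication complete = new TravelInsuranceApplication(
                Map.of("Passenger 1", List.of(date), "Passenger 2", List.of(date)));
        TravelInsuranceApplication missing = new TravelInsuranceApplication(
                Map.of("Passenger 1", List.of(date), "Passenger 2", List.of()));
        System.out.println(compliesWithRs15(complete)); // prints true
        System.out.println(compliesWithRs15(missing));  // prints false
    }
}
```

Note that the method takes an application, not a Birth Date, as its argument, mirroring the choice of 'Travel Insurance Application' as the subject of the rule statement.  The test for exactly one date per Passenger mirrors the determiner 'exactly one'.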

To be continued...
The next article in this series will discuss mandatory data rules[17] (such as RS15) in more detail, looking at the different formulations that are required in various situations.

References

[1]  The first of which is:  Graham Witt, "A Practical Method of Developing Natural Language Rule Statements (Part 1)," Business Rules Journal, Vol. 10, No. 2 (Feb. 2009), URL:  http://www.BRCommunity.com/a2009/b461.html

[2]  Graham Witt, Writing Effective Business Rules.  Morgan Kaufmann (2012).

[3]  I am aware that I have here violated the "rule" prohibiting "split infinitives" but this "rule" is frequently violated these days without loss of meaning or precision.

[4]  A description of the computational logic to be used in a system that uses one or more conventions of a programming language, but is intended for comprehension by a human being rather than a system.

[5]  Syntactically incorrect constructions are conventionally indicated by way of an initial asterisk.

[6]  These include:

  • Halpin and Harding, "Automated support for verbalization of conceptual schemas," Proceedings of the 4th Workshop on Next Generation CASE Tools.  Paris, France:  Twente Memoranda Informatica (1993);
  • Terry Halpin, Information Modeling and Relational Databases.  Morgan Kaufmann (2001);
  • a series of articles, the first of which is:  Terry Halpin, "Verbalizing Business Rules (Part 1)," Business Rules Journal, Vol. 4, No. 4 (April, 2003). URL:  http://www.BRCommunity.com/a2003/b138.html.

[7]  Graham Witt, "Modelling Business Rules for School Student Administration:  a Case Study," ER99.  Paris, France (1999).

[8]  Ronald G. Ross and Gladys S. Lam, RuleSpeak Sentence Templates — Developing Rule Statements Using Sentence Patterns.  BRSolutions LLC (2001).

[9]  Tony Morgan, Business Rules and Information Systems.  Indianapolis, USA:  Addison-Wesley (2002).

[10]  A statement about a particular artefact in a data model.

[11]  Graeme Simsion and Graham Witt, Data Modeling Essentials, Third Edition.  Morgan Kaufmann (2004).

[12]  Semantics of Business Vocabulary and Business Rules (SBVR), v1.0.  Object Management Group (Jan. 2008).  Available at http://www.omg.org/spec/SBVR/1.0/PDF.

[13]  Each rule statement example that does not comply with the constrained natural language described in this series of articles is marked with an initial dagger.

[14]  A word or phrase used before a noun to provide some information as to which instance (or instances) of the noun's concept are being referred to, such as 'the', 'my', 'his'.

[15]  A phrase expressing a state or condition that may or may not be true of each associated subject, e.g., "is a citizen of the US" is true of "Ron Ross" but not true of "Graham Witt".

[16]  Any of the numbers 'one', 'two', 'three', etc.

[17]  A rule that mandates the presence of data (i.e., requires that data be entered in a transaction form or present in a message) or that a persistent data record include a value for certain data.

# # #

Standard citation for this article:


Graham Witt, "Writing Natural Language Rule Statements — a Systematic Approach: Part 1 — Basic Principles," Business Rules Journal, Vol. 13, No. 7 (Jul. 2012)
URL: http://www.brcommunity.com/a2012/b660.html

About our Contributor:


Graham Witt has over 30 years of experience in assisting organisations to acquire relevant and effective IT solutions. NSW clients include the Department of Lands, Sydney Water, and WorkCover while Victorian clients include the Departments of Sustainability & Environment, Education & Early Childhood Development, and Human Services. Graham previously headed the information management and business rules practice in Ajilon's Sydney (Australia) office.

Graham has developed specialist expertise in business requirements, architectures, information management, user interface design, data modelling, relational database design, data quality, business rules, and the use of metadata repositories & CASE tools. He has also provided data modelling, database design, and business rules training to various clients including NAB, Telstra, British Columbia Government, and ASIC and in the form of public courses run by Simsion Bowles and Associates (Australia) and DebTech (USA).

He is the co-author, with Graeme Simsion, of the widely-used textbook "Data Modeling Essentials" and is the author of the newly published book, "Writing Effective Business Rules" (published by Elsevier). Graham has presented at conferences in Australia, the US, the UK, and France. Contact him at gwitt@pacific.net.au.
