The Four Dimensions of Semantic Quality: A Business View of Data and Data Quality

Ronald G. Ross
Co-Founder & Principal, Business Rule Solutions, LLC; Executive Editor, Business Rules Journal; Co-Chair, Building Business Capability (BBC)

Extracted from Business Knowledge Blueprints: Enabling Your Data to Speak the Language of the Business, by Ronald G. Ross, 2020, 288 pp, http://brsolutions.com/business-knowledge-blueprints.html

The central flaw in the long-running discussion over data quality is literally its focus on 'data'. Stored data is merely the system or database residue of things that have already happened in the business, a memory of past events.

Truly fixing 'data quality' problems requires a business perspective: a shift in focus from data design or data cleansing to what occurs in the business itself. Our sights should be trained squarely on the business activity that results in the data.

Creating Data: More To It Than You Think

Consider what workers are doing when they create a piece of data (whether through some app or otherwise). In the business world, of course, they're probably just doing a bit of work. Look more closely, however, and in some ways what they're doing is actually quite profound. Think about it this way: they are sending a message to people in the future.

Recipients of the message might be only just milliseconds away — but they also might be weeks, months or even years away. Data isn't just data; it's an effort to communicate.

Normally we think of communication as either direct conversation or (in the spirit of the times) a flurry of text messages exchanged more or less in real time with people we know. In either case there's usually a shared context within which the meaning of the messages can be interpreted, as well as more or less real-time exchange of clarifications.

What's distinct about creating data is that you're almost certainly not going to be face-to-face with the recipients of the message or connected live with them via an interactive network. That fact rules out body language (e.g., raised eyebrows or emoticons 😊 ) and dialog (including grunts and groans — or more emoticons 😉 ) to clarify what you mean. In that sense the communication is blind, as illustrated in Figure 1.

Figure 1. The Act of Creating Data as a Blind Business Communication to People in the Future.

As a consequence, the data a worker creates literally needs to speak for itself. The emphasis needs to be on the effectiveness of communication — that is, on semantic quality.

Semantic quality focuses on whether the meaning of a message is clear. Just formatting data correctly doesn't get you there. If the meaning isn't clear, a business communication won't be properly understood. In other words, you need clarity for the concepts communicated, not just data quality.

The Role of Data/System Architectures and What Data Quality is Really About

Because of the time delay in delivering blind communications to everyone in the future who might need them, a secure, well-organized holding area is needed. IT professionals, hopefully guided by knowledgeable data architects, create data/system architectures for that purpose.

Unfortunately, typical data quality measures in current use focus on the health of the content of the data/system architecture rather than on the semantic quality of the original business messages. That focus serves a purpose for data management but misses the mark almost entirely in clarifying what practices produce good business communications in the first place. Typical data quality dimensions (e.g., completeness, uniqueness, timeliness, etc.) are:

  • Retroactive rather than proactive
  • Quantitative rather than qualitative
  • Systemic rather than semantic

Worst of all, typical data quality dimensions implicitly lift responsibility off the shoulders of those who create the data.

The quality of data in a data/system architecture can never be any better than the quality of the business communications that produced it. A systematic means to manage data at rest simply does not guarantee the vitality (the semantic health) of the business communications it supports. Sometimes IT professionals focus so intently on software development that the importance of the point escapes them. (Many data professionals do understand the point, but do not know quite how to articulate it or feel powerless to do much about it.)

To make the point differently, it is entirely possible to assess your data quality as outstanding even though the business communications that produced the data were confusing, contradictory, unintelligible, or otherwise ineffective. Rating data quality high when communication is poor is nonsense!
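A minimal Python sketch can make the point concrete. Everything in it is an assumption made for illustration: the records, the field names (`record_id`, `site`), and the two metric functions, which are simplified stand-ins for typical completeness and uniqueness measures rather than any standard definition.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"record_id": 1, "site": "left deltoid"},
    {"record_id": 2, "site": "Main Street Clinic"},
    {"record_id": 3, "site": "right thigh"},
]


def completeness(rows, field):
    """Fraction of rows in which the field is populated (simplified measure)."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Ratio of distinct values to rows for the field (simplified measure)."""
    values = [row[field] for row in rows]
    return len(set(values)) / len(values)


site_completeness = completeness(records, "site")
id_uniqueness = uniqueness(records, "record_id")

print(site_completeness, id_uniqueness)
```

Both scores come out perfect, yet neither reveals that the `site` field mixes two different concepts: an anatomical location in some records and a facility in another. The measures describe the health of the stored content, not the clarity of the message.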

Forming High-Quality Business Communications

Rather than retroactively focusing on data already formed, business people and professionals need proactive measures to form high-quality messages in the first place, whether the message takes the form of structured data or written business communication ('unstructured data').

What should the recipients of blind messages expect? They have the right to expect:

  1. High-quality evidence about what the content means.
  2. No need for any significant assumptions, whether unconscious or deliberate, to supplement that evidence.
  3. The content representing exactly the reality the evidence suggests.

What form does evidence available to recipients take?

  • names, codes, and words
  • definitions
  • business vocabulary
  • business rules

The four dimensions of semantic quality presented in Table 1 arise directly from these four kinds of evidence, respectively. They provide the context for blind communications. They apply equally to structured data and to written business communication ('unstructured data').

Table 1. The Four Dimensions of Semantic Quality

The four dimensions of semantic quality are discussed individually below with examples. They might seem largely self-evident, but there is more to them than initially meets the eye.

Readability

A readable message is one that is not unintentionally encoded or cryptic; that is, one whose meaning is not obscured by the choice of signifiers (names, codes, or words). If a message is encrypted (as security of course usually demands these days), the encryption should be applied on top of the message, not be an accidental by-product of forming the message (data) itself.

Cryptic names and codes are rampant in IT systems; they are encouraged by programming languages, software platforms, and legacy computer tradecraft. Some typical examples:

  1. PID-RAD2-TYPE. Who but programmers might know what that name represents?

  2. A coding scheme for the values of a field where '0' stands for 'no' and '1' stands for 'yes'. Why?!

  3. The abbreviation 'PT'. Without adequate evidence, this abbreviation could stand for many things, including the following:[1]
    • PT Emp → Part-time employee
    • PTCRSR → PT Cruiser (Personal Transportation Cruiser)
    • Blk pt chassis → Black platinum chassis
    • 24pt bk → Manual published in 24-point type
    • 2 pt asbl → Two-part assembly
    • 1 pt → One pint
    • LIS PT → Lisbon, Portugal

How you name things should always be based on natural-language ways of communicating about the things. Inadequate or misleading names, or ones that could easily be misconstrued, should be carefully avoided.
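In software, one way to act on this advice is to work with natural-language values wherever people create or read data, and to translate cryptic stored codes only at the edge. The sketch below is a minimal illustration, assuming a legacy field that stores '0' and '1' as in the second example above; the names `Answer`, `LEGACY_CODES`, and `decode_legacy_answer` are hypothetical.

```python
from enum import Enum


class Answer(Enum):
    """Readable values for a yes/no field (hypothetical example)."""
    NO = "no"
    YES = "yes"


# Assumed legacy coding scheme: '0' stands for no, '1' stands for yes.
LEGACY_CODES = {"0": Answer.NO, "1": Answer.YES}


def decode_legacy_answer(code: str) -> Answer:
    """Translate a cryptic stored code into a readable value.

    Raises ValueError for any code outside the known scheme instead of guessing.
    """
    try:
        return LEGACY_CODES[code]
    except KeyError:
        raise ValueError(f"Unknown legacy answer code: {code!r}") from None


print(decode_legacy_answer("1").value)  # prints "yes"
```

Refusing unknown codes, rather than defaulting them to 'no', keeps the translation from silently inventing meaning the original message never carried.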

In subject matter of any complexity — which is to say virtually all business subject matter — word choice can make a huge difference in the ultimate effectiveness of a communication. There is simply no name like exactly the right name.

Understandability

An understandable message uses only terms with solid business definitions.

Suppose in immunology someone calls something a 'site'. A definition is missing. Does 'site' refer to a location where a vaccination took place (e.g., a doctor's office), or to an anatomical location where a vaccine was injected?

Miscommunication can easily result where definitions for terms are absent, unclear, imprecise, incomplete, and/or un-business-like. Defining things accurately is a central skill for designing concepts.

Precision

A precise message is one that uses shared terms from a business vocabulary correctly.

Sometimes the choice of word for some concept in a message is simply wrong. Such usage can be highly misleading. For example:

Using 'extension' to mean an offering of a product given to a prospect when the prospect clicks on an ad, rather than the concept model's meaning: an additional period of time given to a prospect to accept an offer. (Yes, that's a real example from a large organization arising from social-media marketing vs. traditional marketing.)

Perhaps even worse is being inconsistent in usage — e.g., sometimes a word means one thing, and sometimes another. Such cases are called homonyms (one word or word phrase, but multiple meanings).

Other times a word can span a broad gray band of meaning. For example:

Using 'customer' to mean anything from active customer to any party that has ever expressed even the slightest interest in the company's products or services.

Terms (including synonyms) should always refer to only a single concept in a given context. For that you need a solid business vocabulary, which in turn requires a robust concept model (business ontology).
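A simple check over a glossary can surface homonyms before they reach a message. The sketch below is illustrative only: the glossary structure, the 'marketing' context label, the 'prospect' definition, and the function `find_homonyms` are assumptions, and the two 'extension' definitions paraphrase the example above. It is no substitute for a concept model; it only flags terms that carry more than one definition within a single context.

```python
from collections import defaultdict

# Hypothetical glossary entries: (context, term, definition).
glossary = [
    ("marketing", "extension",
     "an additional period of time given to a prospect to accept an offer"),
    ("marketing", "extension",
     "an offering of a product given to a prospect when the prospect clicks on an ad"),
    ("marketing", "prospect",
     "a party that has expressed interest in a product"),
]


def find_homonyms(entries):
    """Return {(context, term): sorted definitions} for terms with more than one meaning in a context."""
    meanings = defaultdict(set)
    for context, term, definition in entries:
        meanings[(context, term.lower())].add(definition)
    return {key: sorted(defs) for key, defs in meanings.items() if len(defs) > 1}


homonyms = find_homonyms(glossary)
for (context, term), definitions in homonyms.items():
    print(f"'{term}' has {len(definitions)} meanings in context '{context}'")
```

Keying on the pair of context and term reflects the guideline that a term should refer to a single concept in a given context; the same word defined differently in two separate contexts would not be flagged.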

Reliability

A reliable message is one that complies with all relevant business rules.

Much confusion arises over business rules. Professionals who work with data/system architectures often have a technical view of them. That's off-target. Business rules are not data rules or system rules. A true business rule is a criterion for running the business. Business rules are about business knowledge and business activity, not data — at least not directly.

I recently read the following statement about data quality: "Business rules capture accurate data content values." No. Business rules are about running the business correctly.

If the business is run correctly, its business communications should be formed correctly. If its business communications are formed correctly, then the content of its data/system architecture should be correct. So yes, business rules result in correct data, but more importantly correct data arises because business activity is conducted correctly in the first place.

In other words, data quality isn't really about the quality of your data; it's more about the quality of your business rules.

Here are some relatively simple examples to illustrate true business rules. Each example is first expressed by a clear textual business statement[2], then as a corresponding data constraint. The alternative expressions illustrate the fundamental difference between communicating in business terms vs. communicating in data-speak. And remember, many or most business rules are much more complicated than these examples.

  1. Business rule: A customer must have an assigned agent if the customer has placed an order.

Expressed as a corresponding data constraint: A valid agent id is required in the assigned-agent field of a customer record if any order records are listed for that customer record.

  2. Business rule: The payee of a claim payment for a claim must be a party who made the claim.

Expressed as a corresponding data constraint: The payee number, if any, listed in the payee field of a claim-payment record must be for one of the parties listed as having made the claim.
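For readers who think in code, the two data constraints might be checked along the following lines. This is a sketch under stated assumptions, not a reference implementation: the classes `Customer`, `Claim`, and `ClaimPayment`, their fields, and the two functions are hypothetical, and the check on whether an agent id is itself valid is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Customer:
    name: str
    assigned_agent_id: Optional[str] = None
    order_ids: List[str] = field(default_factory=list)


@dataclass
class Claim:
    claimant_ids: List[str]


@dataclass
class ClaimPayment:
    claim: Claim
    payee_id: Optional[str] = None


def violates_agent_rule(customer: Customer) -> bool:
    """First rule: a customer must have an assigned agent if the customer has placed an order."""
    return bool(customer.order_ids) and customer.assigned_agent_id is None


def violates_payee_rule(payment: ClaimPayment) -> bool:
    """Second rule: the payee of a claim payment, if any, must be a party who made the claim."""
    return (payment.payee_id is not None
            and payment.payee_id not in payment.claim.claimant_ids)


customer = Customer(name="Acme", order_ids=["order-1"])
payment = ClaimPayment(claim=Claim(claimant_ids=["party-7"]), payee_id="party-9")

print(violates_agent_rule(customer))  # True: orders placed, no agent assigned
print(violates_payee_rule(payment))   # True: payee did not make the claim
```

Note that the code is closer to the data constraint than to the business rule. The business statement remains the thing business people can read, discuss, and change; the check is only one possible rendering of it.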

Unfortunately, trivial examples are almost always used to illustrate problems with data quality arising from failure to comply with business rules. Here are samples:

  • Data in a field is invalid because it violates some definitional business rule(s) — for example, social security numbers are found in a field for a person's surname.

  • Data in a field is invalid because it violates some minimum or maximum threshold — for example, a number greater than 99 is found in a percentile field.

Obviously, you do need rules like these, but don't be fooled! They barely scratch the surface. They just happen to be easy to talk about because they involve values of only a single field. Sad to say, most discussions of data quality have been complicit in a vast oversimplification.

Bottom Line

The four dimensions of semantic quality get to the root causes of 'data quality' problems, as well as of miscommunication in written or other business communications. Communicating about difficult subject matter is hard to begin with. Blind communication to people you can't converse or interact with directly is the hardest of all. It requires an order of magnitude more sophistication in the techniques used to form the messages.

References

[1] From "Six Myths about Data Quality," by Steven Sarsfield, January 28, 2017.
https://www.ewsolutions.com/six-myths-data-quality/

[2] Using RuleSpeak®, free on www.RuleSpeak.com


Standard citation for this article:


Ronald G. Ross, "The Four Dimensions of Semantic Quality: A Business View of Data and Data Quality," Business Rules Journal, Vol. 25, No. 4 (Apr. 2024)
URL: http://www.brcommunity.com/a2024/c140.html

About our Contributor:



Ronald G. Ross is Principal and Co-Founder of Business Rule Solutions, LLC, where he actively develops and applies the BRS Methodology including RuleSpeak®, DecisionSpeak and TableSpeak.

Ron is recognized internationally as the "father of business rules." He is the author of ten professional books, including the groundbreaking first book on business rules, The Business Rule Book, in 1994.


Ron serves as Executive Editor of BRCommunity.com and its flagship publication, Business Rules Journal. He is a sought-after speaker at conferences world-wide. More than 50,000 people have heard him speak; many more have attended his seminars and read his books.

Ron has served as Chair of the annual International Business Rules & Decisions Forum conference since 1997, now part of the Building Business Capability (BBC) conference where he serves as Co-Chair. He was a charter member of the Business Rules Group (BRG) in the 1980s, and an editor of its Business Motivation Model (BMM) standard and the Business Rules Manifesto. He is active in OMG standards development, with core involvement in SBVR.

Ron holds a BA from Rice University and an MS in information science from Illinois Institute of Technology. Find Ron's blog on http://www.brsolutions.com/category/blog/. For more information about Ron visit www.RonRoss.info. Tweets: @Ronald_G_Ross
