What Is Data Science, Really?

Ronald G. Ross, Co-Founder & Principal, Business Rule Solutions, LLC; Executive Editor, Business Rules Journal; and Co-Chair, Building Business Capability (BBC)

Not too long ago I attended a conference on data analytics and machine learning. I listened to one innovative and exciting session after another. The term 'data science' was sprinkled generously throughout. But were they all talking about the same thing? Or was it simply a code word offering membership in an imagined community?

Indeed, 'data science' could simply be a term of convenience for a broad and enticing new marketing space. The industry loves that sort of thing. And broad indeed! Consider the following statement from Wikipedia (and pardon a nit about 'business' coming last):

"Data science … incorporates skills from computer science, statistics, information science, mathematics, information visualization, data integration, graphic design, complex systems, communication and business."

I asked several times at the conference for a definition (which, I admit, is a habit of mine — good or bad, depending on your point of view). I was consistently disappointed. Perhaps 'data science' is a discipline that doesn't admit semantics(!?). That's a very interesting question, but I'll not digress.

The best response I got was something like, "It's more than statistics. More than business analytics. More than machine learning." To which was added, "You can't get an MBA to become a data scientist. Or get a degree in math. Or computer science."

Not an adequate definition at all! Not even a definition(!). Shared understanding does not arise from saying what something is not, or explaining how you can't become one. Shared understanding arises from expressing what something is. And no, you're not allowed to say, "I know it when I see it." I might look at the very same thing and not see what you see.

Wikipedia does give lots of good insights but, in my view, falls short of a solid definition. It says:

"Data science … uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured."

Some observations:

  • It seems rather obvious that something called a 'science' would have to use 'scientific methods'. What's the real point there?

  • The phrase "uses processes, algorithms and systems to extract knowledge and insights from data in various forms" could also be applied to 'data analytics' or 'statistical analysis'. What's the differentiation? It could also probably be applied to 'learning' (as in machine learning). What characteristic specifically makes data science a new, different thing of its own that can be clearly differentiated from similar and/or pre-existing things?

Wikipedia also says:

"The field encompasses preparing data for analysis, formulating data science problems, analyzing data, developing data-driven solutions, and presenting findings to inform high-level decisions in a broad range of application domains."

It's certainly true that we are seeing all kinds of new 'formulations' these days to develop 'data-driven solutions'. But listing activities like these is cherry-picking; it describes what practitioners do, not what the thing is. What's the essence of the concept?

On social media, William Brooks suggested this definition:

data science: the application of the scientific method and experimental design to the statistical analysis of data

Much better! He went on to say:

  • "This definition differentiates data science from most of the data analysis that goes on in business today." (Yes, true, but correctly not part of the definition.)

  • "Much in the same way that an Erlenmeyer flask might be used for scientific inquiry — or as a convenient vessel for a beer — machine learning uses tools of statistical analysis that may be used in data science or in other ways."

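That second point provides an excellent insight: the tools of analysis are neutral, and what makes their use 'science' is the method applied to them. A good 'essence' definition might therefore be: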

data science: the application of the scientific method in using the tools of data analysis

That leaves one additional question. Can you really have a science of data? Has the world now become so digital that you can have a science of data when the data itself can literally be about anything?

I suppose the answer depends on your definition of science. I hate to say it, but definitions provided by standard dictionaries support you either way. So, in the end (as always), the meaning is whatever the community says it is. I just wish the community would say it more clearly.

# # #

Standard citation for this article:

Ronald G. Ross, "What Is Data Science, Really?" Business Rules Journal, Vol. 22, No. 8, (Aug. 2021)
URL: http://www.brcommunity.com/a2021/c073.html

About our Contributor:

Ronald G. Ross is Principal and Co-Founder of Business Rule Solutions, LLC, where he actively develops and applies the BRS Methodology including RuleSpeak®, DecisionSpeak and TableSpeak.

Ron is recognized internationally as the "father of business rules." He is the author of ten professional books, including the groundbreaking first book on business rules, The Business Rule Book, in 1994.

Ron serves as Executive Editor of BRCommunity.com and its flagship publication, Business Rules Journal. He is a sought-after speaker at conferences world-wide. More than 50,000 people have heard him speak; many more have attended his seminars and read his books.

Ron has served as Chair of the annual International Business Rules & Decisions Forum conference since 1997, now part of the Building Business Capability (BBC) conference where he serves as Co-Chair. He was a charter member of the Business Rules Group (BRG) in the 1980s, and an editor of its Business Motivation Model (BMM) standard and the Business Rules Manifesto. He is active in OMG standards development, with core involvement in SBVR.

Ron holds a BA from Rice University and an MS in information science from Illinois Institute of Technology. Find Ron's blog at http://www.brsolutions.com/category/blog/. For more information about Ron, visit www.RonRoss.info. Tweets: @Ronald_G_Ross
