How to Improve Data Related Processes

Summary: A lot of people think that just because they are not officially members of a data team (or some team specifically charged with cleaning and analyzing data), it is not their place to get involved with data improvements. That could not be further from the truth! Data is involved in many different processes around an organization. In this article, Vanessa examines the ways you could be involved in a data related process without even knowing it.

Vanessa Lam Director, BRSdata

A lot of people think that just because they are not officially members of a data team (or some team specifically charged with cleaning and analyzing data), it is not their place to get involved with data improvements. That could not be further from the truth! Data is involved in many different processes around an organization, and you could be involved in a data related process without even knowing it.

Is this article for me?

Yes! Everyone can help improve data related processes. But why does the data team need your help fixing them? Data teams usually don't have influence over these processes. They are business processes that produce, store, or handle data in some way. Data may not be the main focus of the process, so some people may ignore or neglect the data step; however, the data produced by the process is essential for further analysis. Data teams are then asked to do the impossible: fix data that has already been input into the system incorrectly. With the help of people who work in or can influence these data related processes, data can be of higher quality, and data teams can spend less time cleaning data and more time bringing valuable insights to the organization.

What are Data Related Processes?

Data Related Processes are processes that somehow interact with data. These could be data projects in their entirety, such as a pipelining process or an analytics process. However, they can also be processes that involve creating data, storing data, or processing data in some way that the data team does not own. Data related work streams can be owned by people across the organization, no matter their level of data experience.

If this sounds like most processes, it is! If you look hard enough, most processes in organizations these days interact with data in some way. Because of this, I will refer to these "data related processes" as "processes" for the remainder of this article.

An example of a process that may not involve any data people (but still involves data) is inputting finalized sales data. The process may look something like this:

1. Salesperson closes the sale

2. Contract with final sale numbers is submitted to the legal team

3. Salesperson submits final sale numbers on Salesforce

A data person is not involved in this process at all; however, the process is very much related to data. The final sale numbers in Salesforce will feed many data processes, potentially affecting analyses and even incentives in the future.

Data people are often charged with ensuring data is clean on the back end, but if the data has been input incorrectly in step 3 of this process, it is almost impossible to produce correct analyses. In these cases, data people often do not have the reach or influence to change the process directly and need businesspeople to help them make these changes.
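
One way to catch a problem at step 3 early is to compare what the contracts say with what was entered in Salesforce. The sketch below is only an illustration under stated assumptions: it supposes both sets of figures can be exported as simple mappings from a deal ID to an amount, and the function name `reconcile`, the deal IDs, and the amounts are all hypothetical. It does not use any Salesforce API.

```python
def reconcile(contract_amounts, salesforce_amounts):
    """Compare contract totals with Salesforce entries, keyed by deal ID.

    Returns (missing, mismatched): deal IDs absent from Salesforce, and
    deal IDs whose Salesforce amount differs from the contract amount.
    """
    missing = sorted(d for d in contract_amounts if d not in salesforce_amounts)
    mismatched = sorted(
        d for d, amount in contract_amounts.items()
        if d in salesforce_amounts and salesforce_amounts[d] != amount
    )
    return missing, mismatched


# Hypothetical data: contract records versus what was keyed into Salesforce.
contract_amounts = {"D-101": 500_000, "D-102": 120_000, "D-103": 75_000}
salesforce_amounts = {"D-101": 50_000, "D-102": 120_000}

missing, mismatched = reconcile(contract_amounts, salesforce_amounts)
print("Not entered in Salesforce:", missing)
print("Amount differs from contract:", mismatched)
```

A list like this gives the business owner of the process something concrete to follow up on, rather than leaving the data team to guess which numbers are wrong.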

What Processes Need Improving?

How do you know what processes need improving? If you're not constantly using data, then how will you find these processes? Here are some suggestions to get you started:

See which data doesn't align with your understanding of the business

You probably have a sense of what is going on in your department, business-wise. Even without the data or numbers, you know if sales are doing really well this quarter or if manufacturing issues have caused production to slow. So, if the data is telling you something completely different, it might be time to poke around and find out why! Trace the questionable number back to its source (potentially with the help of the data team) to see if there was an input issue. This is not to say that all surprising findings are wrong, but sometimes it may be worth double checking to see if someone input $50,000 instead of $500,000.
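
This kind of double check can be made routine with a simple magnitude screen. The sketch below is an illustration, not a prescribed method: it assumes deal amounts are available as (deal ID, amount) pairs, and the function name `flag_magnitude_outliers`, the threshold factor of 5, and the sample figures are hypothetical. It flags amounts that are several times larger or smaller than the median, which is roughly what a dropped or extra zero looks like.

```python
from statistics import median


def flag_magnitude_outliers(deals, factor=5.0):
    """Return deals whose amount is at least `factor` times above or below the median.

    `deals` is a list of (deal_id, amount) pairs with positive amounts.
    The threshold `factor` is an illustrative assumption, not a standard.
    """
    typical = median(amount for _, amount in deals)
    flagged = []
    for deal_id, amount in deals:
        ratio = amount / typical
        if ratio >= factor or ratio <= 1 / factor:
            flagged.append((deal_id, amount))
    return flagged


# Hypothetical sample: D-104 may have been meant as 500,000.
sample_deals = [
    ("D-101", 480_000),
    ("D-102", 510_000),
    ("D-103", 495_000),
    ("D-104", 50_000),
    ("D-105", 525_000),
]

suspicious = flag_magnitude_outliers(sample_deals)
print(suspicious)
```

A flagged value is only a prompt to trace the number back to its source; as noted above, a surprising figure is not necessarily a wrong one.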

Keep your ear to the ground of data input complaints

Sometimes data input is just hard. It can seem unimportant in the grand scheme of things, especially if the data doesn't directly tie to the data inputter's role. As in the example above, once the salesperson closes the deal and sends the signed contract off to legal, the job feels done. Unfortunately, that means the final step of inputting the data is often forgotten. Complaints that data entry is tedious or pointless are a signal that this step of the process may need attention.

Look out for "offline datasets"

Do you know of any datasets or analyses that are not being done within the regular data pipeline? These are what we call "offline datasets." They are usually created for good reasons: wanting to collect data that is not currently being collected, doing an analysis quickly so that the data team doesn't have to handle it, or creating small automations that help you do your job more easily. However, these datasets can grow into a data ecosystem of their own, competing with (and sometimes conflicting with) the official channels. This can lead to multiple calculations of the same metric, confusion about the official source of truth, and inefficiencies when analyzing data.

Watch how others use data assets

Especially if you are a data power user or are very familiar with certain data assets (e.g., dashboards, analyses, datasets), keep an eye on how your teammates are using them. Data assets can often be twisted to calculate all sorts of metrics, even if that's not what they were created for. For example, our Gross Sales should be calculated as the sum of all Salesforce final sale numbers. However, some analysts may not have access to this number, so they take the sum of all costs for the quarter and multiply by 1.4, as the company averages a 40% markup on costs. While the logic makes sense, this is clearly a misuse of the cost dataset, and it overlooks the Salesforce dataset.
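
To see how far the shortcut can drift from the real figure, consider a small worked example. The numbers below are invented purely for illustration and the variable names are hypothetical; the only things taken from the text are the two calculation methods (summing final sale numbers versus multiplying total costs by 1.4).

```python
# Hypothetical final sale numbers as recorded in Salesforce for one quarter.
final_sale_numbers = [120_000, 85_000, 240_000, 55_000]

# Hypothetical costs for the same quarter.
quarterly_costs = [90_000, 70_000, 130_000, 40_000]

# Intended definition: Gross Sales is the sum of all final sale numbers.
gross_sales = sum(final_sale_numbers)

# Shortcut described above: total costs multiplied by 1.4.
estimated_gross_sales = sum(quarterly_costs) * 1.4

difference = estimated_gross_sales - gross_sales
print(f"Gross Sales from Salesforce: {gross_sales:,}")
print(f"Estimate from costs:         {estimated_gross_sales:,.0f}")
print(f"Difference:                  {difference:,.0f}")
```

With these made-up figures the shortcut understates Gross Sales by about 38,000. An average ratio holds only on average, so any quarter with an unusual mix of deals will produce a different "Gross Sales" depending on which method was used.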

Listen to how people use terms

Have you ever noticed that people use different words when they mean the same thing? For example, some people may call the difference between Revenues and Costs the "Profit" and others may call it the "Margin." Similarly, sometimes people use the same term but mean different things by it. For example, according to the sales team "Revenue" may be the sum of all sales; however, according to the operations team "Revenue" is the sum of all sales subtracting all returns. While these differences seem minor to an observer, they can compound and result in very different data results. These assumptions may be embedded deep into the code, making it difficult to surface these after the code has been written.
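
A small sketch can make the effect concrete. It assumes, purely for illustration, that each transaction record carries a sale amount and a returned amount; the field names, the sample figures, and both function names are hypothetical. The two functions encode the sales team's and the operations team's definitions of "Revenue" as described above.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    sale_amount: float      # value of the sale
    returned_amount: float  # value later returned by the customer (0 if none)


def revenue_sales_team(transactions):
    """Sales team definition: the sum of all sales."""
    return sum(t.sale_amount for t in transactions)


def revenue_operations_team(transactions):
    """Operations team definition: the sum of all sales minus all returns."""
    return sum(t.sale_amount - t.returned_amount for t in transactions)


# Hypothetical transactions for one period.
transactions = [
    Transaction(sale_amount=1_000, returned_amount=0),
    Transaction(sale_amount=2_500, returned_amount=500),
    Transaction(sale_amount=4_000, returned_amount=1_000),
]

sales_view = revenue_sales_team(transactions)
operations_view = revenue_operations_team(transactions)
print(sales_view, operations_view)
```

Both results would be labeled "Revenue" on a dashboard, yet they differ by the full value of the returns. Agreeing on a single definition before the code is written avoids having to dig the assumption out later.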

Talk to the data team

When things are going wrong in the data, the first people contacted are the data team. They often spend a lot of time combing through code or digging through issues trying to find a bug when they are actually dealing with an input issue. Help them sense-check some of the numbers with your business intuition! Data people might have numbers in their heads, but that only allows them to compare against other data they have. If a data input issue is systemic, or is one that doesn't raise a red flag, it is very difficult for them to catch.

How do I start improving these processes?

Now that you have found a process that needs improvement, how do you start making improvements?

Talk to the data team: Make sure they are aware of what you want to do. They may have already thought about this and be able to provide guidance and partnership! Oftentimes data teams want to make process changes but don't have the influence to do so, so they would love to work with you on improving processes to increase data quality and efficiency.

Understand the issue: Interview the users of the data process. Why do they input or use the data in this way? What part of the current process is not working for them? Then write up a document stack-ranking all the reasons that the data process is not working. Often 2–3 reasons will bubble to the top, which can then inform a possible process fix.

Re-design the process: Now that you know the issue, partner with the leadership of the department and the data team to come up with a solution. Usually, this solution involves either making data entry easier or creating an incentive to perform the task. Taking our original example, one possible solution is to make all contracts electronic and use their inputs to automate entry into Salesforce, taking the burden off the salespeople. Another possibility is to tie incentives to Salesforce input. For example, salespeople's commissions could be based on the Salesforce sums, verified by the contracts, with a 1% deduction per week if the Salesforce numbers are not input in a timely manner. While these examples may have more nuances in real life, they illustrate some possible process improvements.

Define terms and rules: Using a data dictionary or concept model, work with the data team and key stakeholders to define the terms used in your data area. Ensure that these terms are aligned on both the data side and the business side. This will help everyone communicate more effectively. Additionally, ensure that data quality business rules are surfaced and agreed upon by all parties (data team, business team, and key stakeholders). Examples of data quality rules include ensuring that all IDs are unique and determining which fields are mandatory.
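
Once rules like these are agreed upon, they are straightforward to express as automated checks. The following is a minimal sketch rather than a definitive implementation: the record layout, the field names (`id`, `customer`, `final_sale`), and the function `check_records` are all assumptions made for illustration. It checks the two example rules mentioned above, unique IDs and mandatory fields.

```python
from collections import Counter

# Hypothetical list of mandatory fields agreed with the business.
MANDATORY_FIELDS = ("id", "customer", "final_sale")


def check_records(records, mandatory_fields=MANDATORY_FIELDS):
    """Return a list of human-readable rule violations for a list of dict records."""
    violations = []

    # Rule 1: every ID must be unique.
    id_counts = Counter(r.get("id") for r in records if r.get("id") is not None)
    for record_id, count in id_counts.items():
        if count > 1:
            violations.append(f"duplicate id {record_id!r} appears {count} times")

    # Rule 2: every mandatory field must be present and non-empty.
    for position, record in enumerate(records):
        for field in mandatory_fields:
            if record.get(field) in (None, ""):
                violations.append(
                    f"record {position} is missing mandatory field {field!r}"
                )

    return violations


# Hypothetical sample records.
records = [
    {"id": "S-1", "customer": "Acme", "final_sale": 500_000},
    {"id": "S-2", "customer": "", "final_sale": 120_000},
    {"id": "S-1", "customer": "Globex", "final_sale": 75_000},
]

for problem in check_records(records):
    print(problem)
```

Writing the violations as plain sentences is a deliberate choice here: the people who can fix the process are often not the people who read code, so the output should be something a business owner can act on directly.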

Data is not always in the data team's hands! It takes a culture of data awareness and improvement across the organization to ensure high data quality. If you are noticing poor quality data, keep your eyes peeled for ways that you can make a difference.

# # #

Standard citation for this article:

Vanessa Lam, "How to Improve Data Related Processes" Business Rules Journal, Vol. 24, No. 12, (Dec. 2023)
URL: http://www.brcommunity.com/a2023/c133.html

About our Contributor:

Vanessa Lam Director, BRSdata

As Director of BRSdata, Vanessa Lam created a division within Business Rule Solutions (BRS) to focus on data culture, data accountability, and business intelligence.

Vanessa specializes in building data-driven organizations where operations and leadership understand and trust the data, treat data as a resource, and use data to make day-to-day and strategic decisions.

Previously, she worked at Mastercard as a Manager of Business Insight and Productivity and Optoro as a Manager of Business Intelligence. In these roles, she created educational tools for the organization, fostered data communities, and ensured people used data responsibly.

Ms. Lam's popular articles on data quality and data culture are published regularly in the Business Rules Journal and TDAN (The Data Administration Newsletter). She is a frequent speaker at Enterprise Data World, Building Business Capability Conference, Tableau User Group, and Predictive Analytics World.

She has a Masters of Computer and Information Technology from the University of Pennsylvania and a BSc in Economics from the Wharton School. She was recently named a top influencer to watch in 2021 by IIBA. You can follow her at @Big_On_Data on Twitter.
