A solution for the data visibility challenge in modern data systems

Mary Loubele, MYYL, PhD

Version published online on 2/1/2024
Peer-reviewed version published online on 2/9/2024

Feedback is highly valued and can be sent to mary@myyl.tech

TL;DR

This is the first blog post in a series that describes modern data systems, the data producers, and the business and regulatory data consumers. We describe how all of these parties interact with the data and explain the data visibility challenges that currently exist. We then propose a smart system with a different type of consent management, based on a visibility matrix, to facilitate collaboration between all the parties involved.



We are all data producers!

In the modern world, we as users of various online data systems are all data producers in the data economy. This data is consumed by various consumers: the people we are connected with in our network, business data consumers and regulatory data consumers. The best-known business data consumers are the social media companies and the search engine companies. To ensure that we can interact in a safe and fun environment, governments set regulations for online data systems covering legal, privacy, security, government and finance matters. The regulatory data consumers make sure that these regulations are followed.


How data producers are interacting with their data systems

When we as data producers interact with our data systems, we mostly interact with websites or mobile apps, which are called the front-ends. Depending on the application, we are either trying to have fun or getting our work done: we post content, view content, or work on our documents. This looks like the picture below.

Each time we sign up for a service, we review a long list of rules and agree to behave while using the system. After that, we can have as much fun or get as much work done as we want, as long as we stick to the rules of the system. It is also important to note that once you have access to the systems, you are free to do with the data as you wish, as long as you keep obeying the various rules. This means, for example, that you are allowed to combine in your head information about connections that you have on both LinkedIn and Facebook.

How data flows from the data producers to the various business data consumers

The front-ends with which we interact as data producers are only the tip of the iceberg of the data systems. On one side we have the business consumers, who make it possible to have an application that we as data producers want to use. Think of the various types of software developers, designers, data scientists, data engineers, product managers and executives. On the other side we have the regulatory data consumers, which at various companies are called the Trust department. Trust consists of privacy and security. The role of privacy is to ensure that the business consumers only get limited access to the data from the producers. The role of the security data consumers, in turn, is to get just enough data from the data producers so that, when an account is hacked, they can correctly verify that the rightful data producer is asking for access and restore that access as fast as possible. This is really important because otherwise the data producers lose trust in the system and might abandon it altogether.

To understand how the various data consumers work together at data companies, we provide an example data architecture below.

The main goal of this data architecture is that each data consumer, either business or regulatory, has the data they need to get their job done and make correct decisions, but no more data than they need. Therefore the data is made available as queries on tables, through APIs, or visualized in reports. Because each data consumer, business or regulatory, will at some point need access to data for debugging and monitoring purposes, we will focus mostly on queries in our future blog posts.



What is the challenge for business data consumers and regulatory data consumers?

The data producer only needs to review a data contract when they buy or sign up for an app. The business data consumer, by contrast, needs to review and sign a contract for each piece of data that they need and for each new situation that they end up in. Evaluating these contracts takes a lot of time, is error-prone, and is often ambiguous. On top of this, the burden of the decision making is put on the data consumer, which causes a lot of stress and anxiety for the business data consumers. It is also important to note that mistakes in this decision-making process can have a huge impact on the finances and the reputation of the business.

Proposal for a smart system that has a central visibility matrix

To facilitate the decision-making process and to reduce the stress on the data consumers, we propose to build a smart system that has all the background knowledge about the various types of data sources, roles and data governance. Each time the querier needs to work on a new use case, the smart system should then already know which access the data consumer should have.


The central part of this system will be a visibility matrix that provides the correct access level to the data for the various consumers. The granularity level of the system will be a combination of data consumer, data producer and metrics. The correct decisions will be made by combining this granularity with extra inputs, such as the role of the data consumer and the governance rules known at that moment in time.
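To make the idea of a visibility matrix more concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions, not the design of the smart system: the access levels (NONE, AGGREGATED, FULL), the class name VisibilityMatrix, the rule that a governance cap always wins, and all consumer, producer and metric names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from least to most visible."""
    NONE = 0
    AGGREGATED = 1
    FULL = 2


@dataclass
class VisibilityMatrix:
    # Explicit cells, keyed by (consumer, producer, metric).
    entries: dict = field(default_factory=dict)
    # Default access per role, used when no explicit cell exists.
    role_defaults: dict = field(default_factory=dict)
    # Governance caps per metric: no decision may exceed the cap.
    governance_caps: dict = field(default_factory=dict)

    def decide(self, consumer, role, producer, metric):
        """Return the access level for one (consumer, producer, metric) cell."""
        base = self.entries.get(
            (consumer, producer, metric),
            self.role_defaults.get(role, Access.NONE),  # unknown roles see nothing
        )
        cap = self.governance_caps.get(metric, Access.FULL)
        return min(base, cap)  # assumption: governance always wins


# Example data; every name below is made up for illustration.
matrix = VisibilityMatrix(
    entries={("alice", "user_1", "login_location"): Access.FULL},
    role_defaults={"security": Access.AGGREGATED, "data_scientist": Access.AGGREGATED},
    governance_caps={"payment_details": Access.NONE},
)

print(matrix.decide("alice", "security", "user_1", "login_location").name)
print(matrix.decide("bob", "data_scientist", "user_1", "login_location").name)
print(matrix.decide("alice", "security", "user_1", "payment_details").name)
```

In this sketch the explicit cell is looked up first, the role supplies a fallback, and the governance rule caps the result. A real system would also need to track when each governance rule is valid, which we leave out here.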

The responsibility for correctness will lie with the smart system. When the querier, that is the data consumer, does not get complete access, a clear message will be provided. When the querier doesn't get the access they were expecting, there will be a combined approach to resolve the issue. When no data is provided to the data consumer and the consumer reasonably believes that they should have access, this should be filed as a severe bug. Within a reasonable amount of time, the consumer should then either get access to the data or receive a clear reason why they can't get their job done in this way.

In case the consumer only gets limited access to the data, which will be most of the cases, it is also important that the consumer fully understands how many records are missing. This is really important to be able to provide correct reporting and a great experience for the data producers.
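The behavior described above, a clear message plus a count of the missing records, can be sketched in Python as follows. This is a simplified illustration under our own assumptions: the names QueryResult and run_query, the wording of the messages, and the example records are hypothetical, and the visibility decision is passed in as a plain function.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    rows: list      # records the consumer is allowed to see
    total: int      # records matching the query before filtering
    withheld: int   # records hidden by the visibility rules
    message: str    # explanation shown to the querier


def run_query(records, is_visible):
    """Filter records with a visibility predicate and report what was withheld."""
    visible = [r for r in records if is_visible(r)]
    total = len(records)
    withheld = total - len(visible)
    if withheld == 0:
        message = "Complete access: all matching records returned."
    elif visible:
        message = f"Limited access: {withheld} of {total} records withheld."
    else:
        message = (
            f"No access: all {total} records withheld. "
            "File a bug if you expected access."
        )
    return QueryResult(visible, total, withheld, message)


# Example data; the producers and the rule are made up for illustration.
records = [
    {"producer": "user_1", "country": "BE"},
    {"producer": "user_2", "country": "US"},
    {"producer": "user_3", "country": "BE"},
]
result = run_query(records, lambda r: r["country"] == "BE")
print(result.message)
```

Because the result carries both the total and the withheld count, a report built on top of it can state how complete it is instead of silently presenting partial data as the full picture.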

In future blog posts

We have provided definitions for the various actors in the current data economy and have explained the challenges of getting access. In one of the next blog posts we will explain how we can model the smart system using a visibility matrix. We will also work on two comparative studies and provide guidance on best practices related to Better Data Engineering. In the first comparative study we will evaluate how these concepts of data visibility are modeled in the banking world and in the medical world. In the second comparative study we will start modeling the various types of business models in the current data world. This last modeling exercise will give us a clear understanding of all the data privacy challenges that exist. If you are looking for other topics, feel free to send them to the email address above. If you are interested in submitting a blog post yourself, feel free to start writing, but please hold off on submitting for now. We are first figuring out our peer review process. Once we have done so, we will provide a way to submit your blog posts.