There is no business without data, and there is no data without business
Now is the time to make the investment
By 2025, it is estimated that 463 exabytes of data will be created each day globally. To put that in perspective, it is the equivalent of over 212 million DVDs per day!
Data could be described as a modern currency and it could be used to drive economic wealth and contribute to solving global issues. It can also be manipulated either intentionally or unintentionally and this can negatively affect the results or conclusions we draw from the data.
If we are creating that much data and it is such a valuable commodity, your organisation must be incredibly wealthy, correct? For most, sadly not. We are only able to store a small percentage of the data we create. Of what we store, an even smaller amount is immediately usable. And even the usable data only has value if we can understand it and trust it.
Being told to make investments in the current climate may not be your first instinct, but in the words of a recent Gartner report, 'data and analytics have proved indispensable during this pandemic, organizations must invest in data and analytics for resilience and business growth' link
That is past. Data is everywhere and organisations understand they need to be more data-driven. Many have already implemented strategies to help them make sense of their data and in turn gain more understanding of their customers' needs.
Finding your data
While data is everywhere, it is not limited to numbers in spreadsheets. Treating data as knowledge should guide your thinking about what data is, the various guises it can take and where it can be found. Additionally, your knowledge may be greatly enhanced with the addition of data from outside of your business or organisation. For example, the weather report will often influence the decisions we make but you will likely have to rely on the Met Office for that data.
Your organisation may store data in these types of places:
Relational Databases – Typical repository for business applications such as Customer Relationship Management, Human Resources and Finance systems
E-mail and Collaboration Systems (like Slack and Teams)
Log files – did you know that there are twice as many devices connected to the internet as there are people on the planet, with each connected device generating data every second?
Documents – such as spreadsheets or reports
Images – from pictures of products or services, through to CCTV and facial recognition systems
Working with data
Just finding your data does not mean that you are now wealthy. It may be true that some data can have a market value, just ask Facebook or Google. However, most of the real value is not in the data itself but in the business information, knowledge, insights, wisdom, and the resulting impact that you can achieve as a direct result of understanding your data. You can see this flow from data to impact in this graphic.
In addition, new useful data can be generated by combining or consolidating the data you already have.
You will be aware of the adage ‘garbage in, garbage out’ – just because your data came from a corporate system does not mean the data is correct or useful. Applications are not always well designed, and the quality of data entered can often be at the mercy of the people entering it. Even where a dropdown list is provided to validate data entry, the list may force the person to choose an option that does not really fit, which can result in unwanted or unexpected values.
Therefore, we need to find simple ways of transitioning data to business impact.
How can we realise value from our data?
There are many available frameworks and models to realise value from data. Rather than take a complex approach, let us look at four simple steps to realising the value that can be hidden in your data.
Step 1 - Understanding
This step is about business and data understanding. From a business perspective ask, ‘What is the problem you are looking to solve?’ or ‘What is the challenge you are looking to address?’ and ‘What would a successful outcome look like?’
Ideally the answers to these questions should be specific and directly measurable; for example, identify the number of customers purchasing online vs in-store.
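As a minimal sketch of what "specific and directly measurable" can mean in practice, the Python snippet below counts distinct customers per purchase channel. The records, customer IDs and channel names are invented for illustration and do not come from any real system.

```python
from collections import Counter

# Hypothetical purchase records: (customer_id, channel). Illustrative data only.
purchases = [
    ("C001", "online"),
    ("C002", "in-store"),
    ("C001", "online"),   # repeat purchase, same customer and channel
    ("C003", "online"),
    ("C002", "online"),   # the same customer can appear in both channels
    ("C004", "in-store"),
]

def customers_per_channel(records):
    """Count distinct customers who purchased through each channel."""
    distinct_pairs = set(records)  # drops repeat purchases in the same channel
    return Counter(channel for _, channel in distinct_pairs)

counts = customers_per_channel(purchases)
print("online:", counts["online"], "in-store:", counts["in-store"])
```

Note that even this small example forces a decision, namely whether a customer who buys through both channels is counted in each. That is exactly the kind of choice worth writing down.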
If the answers are not specific and directly measurable, subjective decisions will need to be made. It should be clear how the outcome was reached, and by whom, because the traceability and understanding of subjective decisions are important to ensure openness and auditability.
The business understanding phase is incredibly important as you could end up putting a great deal of effort into producing the right answers to the wrong questions!
Data understanding is about understanding the data that you will need in support of your business problem or challenge - what data you need, where it is stored and how you can get access to it.
Once you have gained understanding, your data will likely need sorting.
Step 2 - Sorting
At this point you might be asking how you access all the sources of data you have identified.
A central repository can be helpful in this case. Not only does this central data repository give your organisation a means of control and access, but it also means you can give business owners access and simple-to-use tools to assist with the time-consuming process of sorting.
In the graphic to the right you can see that data sorting (or cleansing) can account for a quarter of the process to get value from your data. link
When combining multiple sources of data you may have duplicates (the same information exists in more than one data source) and so you need to decide which is your primary source. While you may trust an authoritative source, it is important that you do not trust it without question.
In your data set you may have gaps or small numbers of missing values. You might not consider these to be too important but remember that gaps and omissions are often amplified as you work with your data to reach your desired business results. In the context of your business, this may directly impact the reliability of the results you have obtained or the accuracy of your predictions.
Sorting data is important before progressing further. Commonly available tools to correct duplications or missing data items can save you time. Time is always well spent getting your data ducks in a row and the future credibility of your results relies on not skimping on this step.
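The duplicate and missing-value handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source lists, the field names and the choice of "email" as the matching key are all hypothetical, and a real project would more likely use a dedicated data preparation tool.

```python
# Hypothetical customer records from two systems; all names and values are invented.
crm = [  # assumed to be the primary (more trusted) source
    {"email": "ann@example.com", "name": "Ann", "postcode": "AB1 2CD"},
    {"email": "bob@example.com", "name": "Bob", "postcode": None},
]
finance = [
    {"email": "bob@example.com", "name": "Robert", "postcode": "EF3 4GH"},
    {"email": "cat@example.com", "name": "Cat", "postcode": None},
]

def merge_sources(primary, secondary, key="email"):
    """Merge two record lists on `key`. The primary source wins for duplicates,
    but its missing (None) values are filled from the secondary source."""
    merged = {rec[key]: dict(rec) for rec in secondary}
    for rec in primary:
        existing = merged.get(rec[key], {})
        combined = dict(rec)
        for field, value in existing.items():
            if combined.get(field) is None:
                combined[field] = value
        merged[rec[key]] = combined
    return list(merged.values())

def missing_report(records):
    """Count missing (None) values per field so gaps are visible, not silent."""
    report = {}
    for rec in records:
        for field, value in rec.items():
            report[field] = report.get(field, 0) + (value is None)
    return report

clean = merge_sources(crm, finance)
print(len(clean), "records;", "missing values per field:", missing_report(clean))
```

Here the primary source is trusted for the name, but not "without question": where it has a gap, the secondary value is used, and any remaining gaps are reported rather than hidden.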
When gathering data, you may be tempted to grab large data sets that contain information about everything, but this approach is not advised. Working with large data sets not only slows the process down and increases the risk of including data not relevant to the business problem you are exploring, but also potentially impacts the reliability of your results. Working with irrelevant data also increases the risk of breaching your business or government data policies (such as GDPR).
Equally, ensure that you have enough data to be a good representation of the problem you are trying to solve, as this reduces the potential for bias.
Step 3 - Exploring
So far, we have defined the business question we are looking to answer, we have understood what data we have available to us and that for this to be useful it needs to be sorted. When we start exploring our data to look for those valuable insights, there is an important consideration which is the potential for bias in our data.
What is bias and is it limited to the data? Here are a few common examples of areas where bias can affect the results you are looking to achieve. (These areas of bias will also be expanded upon in future articles).
Sample bias is when the data that you select for analysis is selected subjectively and is not a good reflection of your overall data set. This can lead to a skewing of your results.
An example of sample bias could be projecting next year’s expenses for your whole organisation from a data sample taken solely from the sales department, whose staff are more likely to have higher expense claims than any other department. The overall projection would then be higher than if you had worked from a sample, even a smaller one, taken from every department. To avoid false interpretation, make sure the sample you use is appropriate for your entire organisation.
Sample bias can be unintentional or intentional. An example of intentional sample bias could be in an opinion poll when you deliberately choose data to support your views (see also Confirmation Bias below).
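To make the expenses example concrete, here is a small Python sketch comparing a projection based only on the sales department with one based on every department. The departments, claim amounts and the simple "average claim times headcount" projection are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical monthly expense claims per employee, by department (invented figures).
claims = {
    "sales":   [900, 1100, 1000, 1200],
    "finance": [200, 300, 250, 250],
    "support": [150, 250, 200, 200],
}

def projected_annual_expenses(sample, headcount):
    """Project annual expenses as: average monthly claim x 12 months x headcount."""
    return mean(sample) * 12 * headcount

headcount = sum(len(dept_claims) for dept_claims in claims.values())

# Biased sample: sales only.
biased = projected_annual_expenses(claims["sales"], headcount)

# More representative sample: claims drawn from every department.
all_claims = [c for dept_claims in claims.values() for c in dept_claims]
representative = projected_annual_expenses(all_claims, headcount)

print("sales-only projection:", biased)
print("all-department projection:", representative)
```

With these invented figures the sales-only projection is more than double the all-department one, which is the skew that sample bias introduces.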
Confirmation bias is when we look for information that confirms our own beliefs. It could also be that you are looking to reinforce your organisation's views, and you keep looking in the data until those assumptions can be proven. This can often occur when the person analysing the data has been briefed in advance that they need to support a conclusion or outcome. For many reasons there may be pressure to get a certain answer before even starting the process, rather than just looking to see what the data is saying.
To avoid confirmation bias it is advisable not to set out to prove a predefined conclusion, but rather to test your assumptions in a targeted way.
Confounding Variable Bias (or mixing of effects)
This might sound more complicated than it is. Here is a great example of confounding bias to help explain.
Say you concluded from research data that when more ice creams are sold, more people drown. In this example the confounding variable is the temperature. When the weather is hotter, we will eat more ice cream and we are also more likely to go swimming; and we are likely to do both of these things more than on a cold day. Although there appears to be a correlation between the two pieces of data, these are not cause and effect but could be mistakenly interpreted that way.
To avoid Confounding Variable Bias we need to understand the relationships between our data. It would also be advisable to test your conclusions using a controlled A/B test (for details see here link ).
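One way to explore such relationships is to compare the raw correlation between two variables with their partial correlation once the suspected confounder is controlled for. The Python sketch below does this for the ice cream example using synthetic data. The numbers, the random seed and the linear relationships are assumptions made purely for illustration, not real sales or drowning figures.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Synthetic data: both series depend on temperature, not on each other.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
temperature = [rng.uniform(0, 30) for _ in range(365)]
ice_creams = [10 * t + rng.gauss(0, 30) for t in temperature]
drowning_index = [0.2 * t + rng.gauss(0, 1) for t in temperature]  # illustrative index

raw = pearson(ice_creams, drowning_index)
controlled = partial_correlation(ice_creams, drowning_index, temperature)
print("raw correlation:", round(raw, 2))
print("controlling for temperature:", round(controlled, 2))
```

The raw correlation comes out strongly positive, while the partial correlation is close to zero, suggesting that temperature rather than ice cream explains the apparent link. This only works when the confounder has been identified and measured, which is why understanding your data comes first.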
In 1914 Arthur Conan Doyle published The Valley of Fear, in which Sherlock Holmes says “The temptation to form premature theories upon insufficient data is the bane of our profession”, which is as true today as it was then. Explore your data consciously and responsibly, and remember that, whether consciously or unconsciously, bias always starts with us.
Step 4 – Insights
We have moved from defining the business problem that we would like to solve to sorting and cleaning our data, then onto understanding the biases that can be there as we explore our data. Now we need to derive the insights that can lead to real impact in our business.
‘A business insight combines data and analysis to find meaning in and increase understanding of a situation, resulting in some competitive advantage for your business. This provides more than low-level understanding of an issue, giving you deeper insight into major mechanics related to your particular business’ ( link )
With all the available data at our disposal, businesses are demanding actionable insights but herein lies the problem. Data itself does not deliver value; value only comes from acting on the insights that the data can bring. And, to turn data into actionable insights, you need to democratise it – break down organisational silos to ensure that the right information reaches the right people throughout the business.
During the present global challenges, one use of artificial intelligence has been to discover new knowledge about the pandemic and potential treatments. A lot of valuable data has been created and shared, which has accelerated insights.
This demonstrates that while data has great value, that value can only truly be realised when the data is considered thoughtfully, paying special attention to its accuracy, its relevance to the hypothesis it is being applied to and, ultimately, how trustworthy it is.
Understand your data within the context in which it was created, spend time sorting it and be prepared to question the bias that exists within.
From time to time, IBM partners with industry thought leaders to share their opinions and insights on current technology trends. The opinions in this blog are my own and do not necessarily reflect the views of IBM.