You're Relying on Data Too Much

Data can often make decisions worse, not better. This blog post gives an example of one such situation as a metaphor.

When you’re running a business, your job can be boiled down to making a sequence of important decisions: should new products be launched? Should a department be disbanded? Should a marketing campaign be run? Every decision has risk and reward ' will you make piles of cash or burn through them? It’s easy to believe that there is one lone right choice among a universe of disaster, and the job of the executives is to fine that one choice.

Every top selling management magazine will tell you that the gleaming savior of the 21st century is data. Sufficient data will keep you from making any bad choices. Store data on your customers, your product, and your marketing. Then hire some smart data scientists, and you soon will be freed from mistakes. Google “data-driven decision making” and you will find hundreds of management articles about how you aren’t collecting enough data, monitoring enough KPIs, or having enough data scientists running enough analyses.

Scientific debt, a term coined by David Robinson, urges businesses to decide which analyses are important to focus on immediately and which areas of intrigue can be postponed until later. Here, your debt is knowledge. This concept allows for companies to discuss their data science priorities up front. Every company should be actively considering their scientific debt.

In my years working with many businesses, I have indeed seen some companies that fell into the situation of not using data enough. However, these occurrences paled in comparison to the number of times I have seen the the reverse issue: companies with an over-reliance on data to the point that it was detrimental. The idea that data is needed to make a good decision is a destructive one.

The idea that more data and analyses makes for better decisions is fundamentally untrue.

“Is this AI?” meme

Consider a recent story of how Coca Cola used AI to create new soda flavors. After pressing through the PR, it’s clear Coke used their freestyle flavor machine data to see people were often selecting Sprite with the Cherry add-in. With this knowledge Coke launched Cherry Sprite as its own soda line in stores. Ignoring the fact that “we aggregated data and Cherry Sprite was popular so we launched it” is the most egregious example of claiming something as AI ever, this is also also a great example of how data can hurt decision-making.

Consider that while Cherry Sprite is a new flavor that has never existed before, the extremely similar flavor *Cherry 7UP *had existed for thirty years! “AI,” combined with data from millions of customers, created a flavor that could have been thought up by a Coke executive walking down the grocery store aisle and thinking “we should copy that.” Unlike investing millions in data infrastructure, stealing ideas is free.

If a Coke executive could have made this thirst quencher happen any time over the past thirty years, why didn’t Cherry Sprite hit the market until big data was involved? From what I’ve seen, these situations happen when people implicitly know an idea is good, but they can’t get others on board. Over the years there have probably been hundreds of Coca Cola executives who thought Cherry Sprite was a good idea, but at no point did they have the critical mass needed to launch the product. Only once someone got freestyle data was that person able to convince the company that the flavor was sound.

Cherry 7up can This cold boy was quenching thirsts before AI was even trendy.

That isn’t a sign that data is important to make a decision, it’s a sign that if you only make decisions with data then you limit your opportunities. How much more money would Coke have made if they launched Cherry Sprite a decade earlier? How many soda flavors are out there that people would love, but won’t ever be launched because they don’t exist in a freestyle machine? By limiting yourself to only decisions where you have sufficient data, you dramatically lower the set of acceptable solutions for your company. Often the most innovation changes that companies could make are ones where there isn’t any possible data beforehand.

But here’s the real kicker: even if you had good data your analysis always includes assumptions. If those assumptions are wrong your analysis can give a bad result. These bad results can lead the organization to do something even worse than if they didn’t use data. If a company is trying to launch a new product, they may analyze similar existing products to predict the return on investment. But the assumption of which products are analogous can dramatically alter the analysis, and there is no way of knowing which ones will behave similarly. It is possible the new product would behave unlike any existing one at all, and by doing the analysis you misdirect the company on what would happen. As data scientists we should hope that on average we are directionally right, but it’s impossible to know. Data cannot replace human intuition and it cannot remove risk.

When data is used in conjunction with other factors to make a decision, flawed analyses aren’t a problem. The other factors compensate and the organization will make strong decisions. When data is necessary for decisions, flawed analyses are a huge problem. Executives let bad assumptions pass because they’re the only way for their ideas to launch. This creates really negative corporate cultures around the positive-sounding idea “we back all of our decisions with data.”

The takeaway is not that data is bad and you shouldn’t want to use it. Rather, data and analyses should be used in care with other decision making tools like market research, previous experience, and proof-of-concept trials. By treating analyses as something that only makes decisions better, you run the risk of creating environments where you are limited in what is possible. Don’t treat data as the sole source of truth, otherwise you’ll be waiting decades to try another new soda flavor.

While your data science projects won’t always work, your data science infrastructure should! I’m the Chief Product Officer here at Saturn Cloud–a data science platform that’s great for the analyses and model training you have to do as part of new data science projects. If you’re looking for a way to spend less time setting up Linux libraries and hardware and more time iterating through ideas check us out.

About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.