When the land is beset by a terrible curse, Sovay sets off to hunt the mysterious ways lost to her people. She forsakes the comfort of her local homesteads and heads toward the deep wood.

OTT & the Curse of Big Data

The Curse of Big Data was first detected in the early 2000s.  If you collect billions of rows of data, you can’t determine if what you’re seeing in the data is a result of bias:  You can find a million rows (and ways) to confirm any bizarre hunch you might have about your business.  You can easily find “correlations” that are either coincidence or just bad data.  Remember, it’s safe to assume you’re the first person that has ever looked at this particular set of data and what it is telling you might be wrong.

It’s not possible for an analytics team to review data effectively when it comes in such large volumes, so they must turn to technology like Artificial Intelligence to make sense of it.

Take THAT, Science!

When the data gets big, we tend to get driven into two camps. One side of analysis is the engineering-led AI efforts.  This team has put tremendous energy and capital into collecting, parsing, and saving a huge amount of data.  They want to get value out of every row.  It’s a perfectly reasonable stance for your engineers to demand evidence-based decision making.  In fact, that’s at the heart of science.

But slogans like “Data or It Didn’t Happen”

expose a key shortcoming: If you have data about it, you already know the answer.  You can describe things that happened to you in the past and make predictions about how the same things may happen to you again. 

You are reduced to asking “Is there a cat in this picture?”

If you think data is important to analyze, then how about a crazy speculation or rather “hypothesis”.  This interplay of what we know and what we can barely imagine is canonized in Cognitive Psychology as the “explore/exploit dilemma”.  Knowing that the worst decision is no decision, we are forced to answer questions like “Will there be an earthquake tomorrow?”

So where are the rigorous theoretical frameworks that can bridge the gap between the Python and Excel camps?  In fact, these frameworks have been in development over the last 70 years of research in AI, yet because our analysts and engineers use either excel or python to come to their answers – they have missed the opportunity to leverage this work . This research has supported multiple Nobel Prizes in Physics, Economics and even Literature.  More importantly, these mathematical frameworks are available to solve long term business problems.

The Land Beyond Excel and Python

For streaming companies, opening a new market or pricing a new product means you just have to guess about some things. Excel is an excellent tool for this early stage ideation.  It contains familiar tools like averages and linear regressions and with these, you can envision and make several predictions about the future.  

Once the data starts coming in, you’ll be eager to see if your hunches are paying off.  With OTT, this is likely to be a bit overwhelming. Not all data collected will be useful, and a lot of useful data will not be collected.  This has the unfortunate effect of eliminating many families of AI for analysis, including some of the modern and popular techniques using Neural Nets and Deep Learning.

Over the next few posts, Sovay will continue her journey into exploring the land beyond python and excel, where she will discover examples of how frameworks from psychology, physics, econometrics, and mathematics can leverage both domain knowledge and data to create defensible, interpretable, and scalable analysis.