One of the attributes of really good technology is that it hides the complexity of what goes on in the background, while still being useful. Depending on how old you are, you may remember having to set up a TV by selecting UHF or VHF with a little switch and then slowly turning a dial until a picture appeared out of the fuzz. Then you’d adjust the “bunny ears” aerial, and twiddle the tuner again to see if you could make the image even better.
Modern TVs do all of this automatically. You choose an option in the setup menu, and every nearby TV station is picked up and assigned to its own channel. If you have never tuned a TV yourself, you may not know that a small piece of technology is doing this work for you. It happens under the hood. If something goes wrong, all you know is that every channel is snow.
Reducing complexity is a key selling point for technologies, especially for dull or repetitive tasks. When a new product or service is positioned, a central message is that its underlying technology makes the user’s task easy, with no learning curve. Today, we expect this of our technology: “Make this hard thing I want to do easy! Don’t make me understand what’s going on under the hood.”
Vendors love to exploit this expectation in developing markets. Most mid-sized and large companies in Africa already struggle to find specialist technical skills, never mind specialist skills in a field as new as Data Science.
For the most part I’m all for reducing complexity, and for bringing hi-tech systems to bear on business problems where expert resources are scarce. But… the growing range of new Big Data technologies that promise to make analytics easy is very dangerous. Big Data is unusual in this respect: a small error can be magnified enormously by using the wrong technique. You have to understand exactly what is going on under the hood if you are to get where you want to be.
In Big Data, if you don’t know what a useful or insightful outcome should look like, hiding the complexity of what the analytics process is doing is extremely dangerous. Promising a useful output from just any input is even worse.
Right now, Big Data and its more attractive cousin, Data Analytics, are very hot in business circles. Vendors are pushing them hard, promising incredible insights and business triumphs.
The promise from vendors of click-click-result is “data in, insight out”. The reality may be “data in, nonsense out”.
African companies have an incredible opportunity to adopt Big Data technologies, because the technology is rapidly developing towards Cloud-based, pay-as-you-go models rather than the massive on-site, capex-heavy implementations of its early days (i.e. a year or two ago). There is a great deal of interest in this tech. To meet this latent demand, we built the “Enterprise Data Factory” system at Internet Solutions, and I recently presented its technical implementation at the Mammoth BI conference in Cape Town and the BI Summit in Johannesburg.
However, in many of the Big Data systems being marketed, the complexity of what has to be done is hidden from the user, and this is even more true of hosted, Cloud-based systems than of on-site ones.
There is a fundamental problem with this. Unlike the television set, where you either find a channel that’s there or you don’t, each result of a complex analytics system will be specific to the circumstances of that particular user and to the analysis techniques they use. Factors within the underlying complex process can heavily influence the outcome, and cause the output to be misleading or even plain wrong.
The danger is that the output from a Big Data tool will be precise to ten decimal places after crunching through terabytes of information on a ten-million-Rand platform. It will produce an array of dazzling graphs and charts … but it may still be wrong.
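One well-known way a result can be precise and still wrong is Simpson’s paradox, where a pooled comparison points one way while every subgroup points the other, because a hidden factor is unevenly spread between the groups. The minimal Python sketch below uses hypothetical counts for two made-up treatments, A and B, split into “mild” and “severe” cases; the names and numbers are assumptions for illustration, not real data.

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
# Treatment A was mostly applied to severe cases; B mostly to mild ones.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials


def overall_rate(groups):
    """Pool all subgroups before dividing, as a naive one-click report might."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return rate(successes, trials)


for name, groups in counts.items():
    by_group = {g: round(rate(*st), 3) for g, st in groups.items()}
    print(name, "pooled:", round(overall_rate(groups), 3), "by group:", by_group)
```

With these counts, A has the higher success rate within both the mild and the severe group, yet B has the higher pooled rate, because B was mostly applied to the easier, mild cases. A tool that only reports the pooled figure will present the wrong conclusion with full decimal precision.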
Analytics used on a Big Data information source is an incredibly powerful tool – but in the wrong hands, it’s a weapon of mass distraction from common sense and experience.
An analogy I often use is carving a statue out of a block of marble. The tools are simple: a hammer and a chisel. Carving a statue is conceptually easy. You take the tools and knock off bits of stone. The bigger your hammer, the more stone you can knock off. In reality, it takes years of practice and skill to sculpt a work of art rather than end up with a pile of rubble. Good analytics comes from a combination of skills and years of experience: knowing what a good end result should look like, and how to apply the tools to get it.
Big Data platforms and analytics tools are the hammer and chisel. They are not the artist. To get value from Big Data projects, you need to find and nurture the artists.
Data analytics artists bring several specific sets of skills:
There is the statistical knowledge to know the best way to summarise a particular data set, which approximations to a particular result are valid, and whether you can sub-sample your data.
There is the programming and computer systems knowledge to know whether the way you are going to process your data will take ten hours or ten weeks, and whether a particular problem is best solved with Hadoop or with SQL.
To show off the result effectively, you need data visualisation expertise. Sometimes (most of the time, really) a pie chart is ineffective. Knowing when you can use one, and when it will obscure vital information, is a genuine skill.
And finally, it’s the art of telling the story. Even with an obvious result and a well-designed visualisation, the relevance of the insight to the person receiving the information will get lost if there is no story to capture their attention.
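To make the sub-sampling point from the statistics skill concrete, here is a minimal Python sketch on a synthetic data set. It assumes a hypothetical file of customers stored in join order, where longer-standing customers spend more, and compares the common shortcut of taking the first rows against a simple random sample of the same size. All numbers are invented for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Synthetic data (an assumption for illustration): 100,000 customers stored
# in join order. Monthly spend falls from about 200 for the earliest
# customers to about 50 for the newest, plus random noise.
n = 100_000
spend = [200 - 150 * (i / n) + random.gauss(0, 20) for i in range(n)]
sample_size = n // 100  # a 1% sample

true_mean = statistics.fmean(spend)

# Shortcut: the first 1% of rows, i.e. only the oldest customers.
head_sample = spend[:sample_size]

# Alternative: a simple random sample (without replacement) of the same size.
random_sample = random.sample(spend, sample_size)

print(f"full-data mean:   {true_mean:.1f}")
print(f"first-rows mean:  {statistics.fmean(head_sample):.1f}")
print(f"random-rows mean: {statistics.fmean(random_sample):.1f}")
```

Both samples are the same size and equally cheap to compute, but the first-rows sample only sees the oldest customers, so its mean lands far from the full-data mean, while the random sample lands close to it. Whether a shortcut like this is valid depends on how the data was stored, which is exactly the kind of detail a one-click tool hides.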
These areas of competence rarely exist in one person; they are usually spread across a group of people, each with their own skills and expertise. But the most important part of getting value from Big Data is the hands-on, real-life expert: technical and analytics experts working with someone who has business domain knowledge.
This is particularly important in African markets, which develop differently to the markets in which the main vendors are based. Market structures are different, how data is gathered is different, business and consumer cultures are different.
It is critical to have someone on board in a Big Data project who knows enough about the data being analysed, and enough about the subject being studied, to look at part of a result and say “Yes, that seems right”. Or “Yes, it’s right, but it’s not really useful”. Or give the most important feedback you need: “That’s definitely not right!”
As the old saying goes, “To err is human, to genuinely stuff up requires a computer.” This is especially true if you don’t understand what is going on deep inside the Big Data black box. We could extend that old saying: “…and for a business catastrophe you need Big Data”.
Good analytics on large data sets is iterative. You need to experiment, test and validate along the way to make sure you are coming up with a useful insight. This is a very different skill from simply reporting on KPIs or similar clearly defined numbers that help businesses make crucial strategic decisions.
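The need to validate along the way can be illustrated with a minimal Python sketch, under stated assumptions: a target and 500 candidate “signals” that are all random noise, a plain Pearson correlation as the measure of an “insight”, and an arbitrary split of 100 rows for exploration and 900 held-out rows for validation. Searching enough candidates on the exploration rows will surface something that looks predictive; checking it on the held-out rows shows whether it holds up.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

n_rows, n_signals, n_explore = 1000, 500, 100

# Target and candidate signals are all independent random noise,
# so no signal has any real relationship with the target.
target = [random.gauss(0, 1) for _ in range(n_rows)]
signals = [[random.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_signals)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Exploration: pick the signal with the strongest correlation on the first rows.
best = max(
    range(n_signals),
    key=lambda k: abs(pearson(signals[k][:n_explore], target[:n_explore])),
)
in_sample = pearson(signals[best][:n_explore], target[:n_explore])

# Validation: measure the same signal on rows it was never selected on.
out_of_sample = pearson(signals[best][n_explore:], target[n_explore:])

print(f"best of {n_signals} noise signals, exploration rows: r = {in_sample:+.2f}")
print(f"same signal, held-out rows:                 r = {out_of_sample:+.2f}")
```

The best-looking signal is chosen because it happens to fit the exploration rows, so its correlation there is inflated; on the held-out rows it typically falls back towards zero, as it should for noise. Skipping that second step is one way “data in, insight out” quietly becomes “data in, nonsense out”.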
The temptation to go with a ‘one click, data in, insight out’ solution will always be there. This kind of solution hides the complexity and promises dazzling results. The team of experienced people will work with the same source data as the quick-fix analytics tool. Both give you a result you can use to make decisions with.
But only one works all the way through the underlying complexity to make sure you are getting useful insights that are true, both empirically and for your local market.
Powerful Big Data tools are dangerous in untrained hands. At IS we have already learned a lot from applying Big Data techniques to our own business information on the Hadoop platform we built. Believe me when I say that understanding the underlying detail makes all the difference.
Find an expert in data – in statistics, in analytics, in the science of numbers. Someone who knows that the truth is not always intuitive. Find an expert in data processing technology – one good decision here can save you millions in wasted IT spend. Find a visualisation specialist and a good storyteller. Put them all together with the person who understands the industry and the market you’re looking at.
Big Data technology works best when it’s combined with Big Human intuition. Find your local artists … and give them powerful tools.
By Jeff Fletcher, an engineer in the Research and Innovation group at Internet Solutions