Predictive analytics is a powerful discipline that can add substantial value to most organisations, and enterprises are taking an increasing interest in it.
While the theory behind analytics is taught at universities and institutions around the world, courses often neglect practical application. Typically, students are given an assignment with a narrow scope and a limited sample of data, and there are only so many ways to interrogate that sample. In the real world, however, with the exponential increase in data volumes and multiple data sources, there are practically endless ways to manipulate and analyse the data. The result is that the practitioner’s mind is inundated with new questions about how to extract insight.
In our experience, there are key practical measures to help the practitioner find and analyse data in a way that delivers business value.
Begin with a goal in mind
Don’t try to answer a question that doesn’t exist or is of no use to the business. There is no point in going down the rabbit hole and investigating the data to the nth degree if there is no burning business issue to solve. Rather, spend the time upfront identifying where you should focus your efforts. Understanding what data is available, and how good it is, will be critical in guiding these discussions.
Never assume data is perfect
The data underlying the project is what determines how powerful or precise a predictive model can be. People often make the mistake of waiting for perfect data – it rarely exists in practice. The effort and time required to get a consolidated view of the data is not always highest on the list of priorities for IT.
That doesn’t mean the data shouldn’t be as good as it can be. Methodologies exist for working around incomplete or messy data, such as imputing missing values, so imperfect data need not hold up a project.
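As one illustration of working around incomplete data, the sketch below fills missing values with the median of the observed values and records which rows were filled. It is a minimal example rather than a recommended method: the `impute_median` helper, the `income` field and the customer records are all hypothetical, and median imputation is only one of several approaches that may suit a given dataset.

```python
from statistics import median


def impute_median(rows, field):
    """Return copies of rows with missing `field` values filled by the median.

    A value of None is treated as missing. A companion flag records which
    rows were filled, so the fact that a value was missing is not lost.
    """
    observed = [r[field] for r in rows if r[field] is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}")
    fill = median(observed)

    result = []
    for r in rows:
        filled = dict(r)  # copy, so the original records stay untouched
        filled[field + "_was_missing"] = r[field] is None
        if r[field] is None:
            filled[field] = fill
        result.append(filled)
    return result


# Hypothetical customer records; `income` is missing for one customer.
customers = [
    {"id": 1, "income": 30000},
    {"id": 2, "income": None},
    {"id": 3, "income": 50000},
]
cleaned = impute_median(customers, "income")
```

Keeping the `_was_missing` flag is a design choice: in some datasets the absence of a value is itself informative, and the flag lets a model use that.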
Garbage in, garbage out
Following on from the point above, some data problems can’t be fixed with statistical methodologies. Quality assurance of your data is extremely important, particularly when it is captured manually into a system or gathered from outside sources (e.g. interview data). Poor data quality will lead to poor conclusions.
Use enough variables, whether from the source or hand-crafted
Don’t be afraid to throw the kitchen sink at your model, especially in the initial pass. Many predictive modelling techniques can discard variables that add little, so consider anything you have available that could plausibly be predictive and makes business sense to include.
The flip side is that more variables mean more complicated models, which is not always the objective. In certain industries, models that can be easily explained are far more attractive, and consequently quicker and easier to implement.
Drawing on our understanding of the business objective and on past experience, we often find that the variables we create ourselves are more predictive than the raw data available. One important mistake to avoid is using multiple fields that are highly correlated with one another, since they carry largely the same information.
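The warning about highly correlated fields can be made concrete with a simple filter. The sketch below computes the Pearson correlation between candidate fields and keeps a field only if it is not strongly correlated with one already kept. The field names, the sample values and the 0.9 threshold are assumptions chosen for illustration; a sensible cut-off depends on the data and the modelling technique.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep each field unless it correlates strongly with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical fields: age_months is just age_years * 12, so it adds nothing.
columns = {
    "age_years": [25, 32, 47, 51, 62],
    "age_months": [300, 384, 564, 612, 744],
    "monthly_spend": [220, 180, 260, 150, 240],
}
selected = drop_correlated(columns)
```

Note that the order of the fields decides which member of a correlated pair survives, so it is worth listing first the fields that are easiest to explain to the business.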
Always draw on subject matter know-how
It is a common misconception that predictive modelling simply entails taking data and plugging it into an analytical software solution, and that accurate predictive models will come out the other side. People with this mentality will find, more often than not, that their model is flawed.
It is important to talk to the experts who know the business and have an in-depth understanding of the product or project you are working on. This can highlight problems in the data that would otherwise not have been spotted, give you better insight into the end goal, and lead to better predictive variables and, ultimately, better models.
Always define the model’s objectives in a business context
Another restrictive practice in university teaching is that students are often taught to focus only on the accuracy of their models. In reality, the model has to fit business constraints, and its objective needs to fulfil a real-life business need. Even the best models are of no use without a proper deployment plan. This goes back to the business understanding phase of the predictive analytics journey.
Theunis Jansen van Rensburg, Analyst at BITanium