The Three W’s of Predictive Analytics

November 13, 2015 | NCCD

Predictive analytics refers to a set of tools that apply mathematical and statistical algorithms to data in order to find patterns that predict the future likelihood of some event happening for a given individual.   

In most situations, these tools are “trained” using historical data. The historical data are run through an algorithm, which is then told what the “correct” answer is for each individual. The algorithm then identifies patterns in the data that best predict these correct answers. When used in the field with new data, the algorithm looks for these same patterns and returns a likelihood based on the best match.  
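The training process described above can be sketched in a few lines of Python. This is only an illustration, not any particular production algorithm: the "pattern" here is simply the exact combination of field values, and the field names (`prior_events`, `age_group`) and the data are made up for the example.

```python
from collections import defaultdict

def train(records, outcomes):
    """Learn, for each pattern of field values, how often the outcome occurred."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [times outcome occurred, total seen]
    for fields, outcome in zip(records, outcomes):
        pattern = tuple(sorted(fields.items()))
        counts[pattern][0] += int(outcome)
        counts[pattern][1] += 1
    return dict(counts)

def predict(model, fields, default=None):
    """Return the historical likelihood for a matching pattern, or default if unseen."""
    pattern = tuple(sorted(fields.items()))
    if pattern not in model:
        return default
    occurred, total = model[pattern]
    return occurred / total

# Hypothetical historical data, with the "correct" answer for each individual.
history = [
    {"prior_events": "yes", "age_group": "young"},
    {"prior_events": "yes", "age_group": "young"},
    {"prior_events": "no", "age_group": "young"},
    {"prior_events": "no", "age_group": "old"},
]
outcomes = [True, False, False, False]

model = train(history, outcomes)

# In the field: a new individual is matched against the learned patterns.
likelihood = predict(model, {"prior_events": "yes", "age_group": "young"})
```

Real tools use far more flexible pattern matching than an exact lookup, but the shape is the same: the historical answers go in during training, and a likelihood comes out for each new individual.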

Predictive tools can be powerful, but because of the way they are trained, they require careful thought and planning in three areas: who, what, and when.

Who: Predictive analytics tools make their predictions about a specific, defined set of individuals. This can be Netflix subscribers, drivers in rush hour, or children in foster care. The key is to be able to explicitly identify the individuals you are making predictions about. A model trained on one population is unlikely to perform well when applied to another.
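One way to keep the "who" explicit is to write the population definition down once and use it both when selecting training data and when scoring new individuals. The sketch below assumes a made-up definition (`in_population`) and a stand-in model; the field names are hypothetical.

```python
def in_population(person):
    """Hypothetical population definition: children currently in foster care."""
    return person["age"] < 18 and person["in_foster_care"]

def select_training_set(people):
    """Train only on individuals who meet the population definition."""
    return [p for p in people if in_population(p)]

def score(model, person):
    """Refuse to score anyone outside the population the model was trained on."""
    if not in_population(person):
        raise ValueError("person is outside the population the model was trained on")
    return model(person)

# Stand-in for a trained model; it returns a fixed likelihood for illustration only.
def dummy_model(person):
    return 0.25

people = [
    {"age": 9, "in_foster_care": True},
    {"age": 34, "in_foster_care": False},
]
training_set = select_training_set(people)
child_score = score(dummy_model, people[0])
```

Sharing a single definition makes it harder for the training population and the field population to drift apart unnoticed.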

What: These tools identify the likelihood of a specific event or outcome occurring. They predict that event/outcome and no other. As a result, it is very important that the “correct” answer used to train the model is the same outcome you want to predict in the field. Otherwise you can end up with a tool that predicts something other than what you expected.

When: Every piece of data a predictive algorithm uses must already exist at the moment the algorithm is run and the prediction is made. Because of the way the training process works, it is possible to build an algorithm using data that won’t be available at the time that algorithm needs to be run. Such models will look great in development and testing, but will fail spectacularly in the field because you are asking the algorithm to make predictions using data that don’t exist yet.
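A simple guard against this problem is to record when each piece of data became available and to drop anything that did not yet exist at prediction time, in training as well as in the field. The sketch below assumes each observation is stored with a timestamp; the field names and dates are invented for illustration.

```python
from datetime import datetime

def features_available_at(observations, prediction_time):
    """Keep only observations recorded strictly before prediction_time."""
    return {
        name: value
        for name, (value, recorded_at) in observations.items()
        if recorded_at < prediction_time
    }

# Hypothetical record: each field maps to (value, time it was recorded).
observations = {
    "prior_referrals": (2, datetime(2015, 3, 1)),
    "intake_assessment_score": (7, datetime(2015, 6, 1)),
    "case_closure_reason": ("resolved", datetime(2015, 9, 15)),
}

# The prediction is made shortly after intake, long before the case closes.
prediction_time = datetime(2015, 6, 2)
usable = features_available_at(observations, prediction_time)
```

Here `case_closure_reason` is excluded because it is recorded months after the prediction is needed. A model trained with that field would look accurate in testing and then have nothing to work with in the field.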

While this may seem obvious, many predictive tools fail because one or more of these elements was not fully defined or understood. The algorithm can only make the predictions it is trained to make. If we want effective tools, we have to make sure we train them properly.