Predictive Analytics: Considering Neural Networks

May 18, 2012 | Chris Baird, Board Member, NCCD

Over the past 20 years, significant advancements have been made in both the presentation of predictive research results and the field’s understanding of how these tools can be effectively used to improve decision making in child welfare work. The National Research Council and one of the country’s premier statisticians, Steven Banks, essentially shifted the debate from “predictive accuracy” to probability and introduced the concept of “classification potency.” This concept asserts that the power of a classification system should be judged by both the distance between the outcome rate for each classification category and the base rate for the entire sample, and the relative proportion of cases in each classification category. This direction should be maintained to ensure that analytics are used effectively in human service decision making.
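As an illustration, consider a simple sketch of this idea: score each risk category by how far its outcome rate sits from the base rate, weighted by the share of cases it holds. The weighted-distance formula and the case counts below are my own illustrative assumptions, not Banks’s published metric.

```python
# Illustrative sketch of "classification potency": a system is more potent
# when its categories (a) have outcome rates far from the overall base rate
# and (b) hold meaningful shares of the caseload. The weighted-distance
# formula here is an assumption for illustration only.

def classification_potency(categories):
    """categories: list of (n_cases, n_outcomes) tuples, one per risk level."""
    total_cases = sum(n for n, _ in categories)
    total_outcomes = sum(k for _, k in categories)
    base_rate = total_outcomes / total_cases
    potency = 0.0
    for n, k in categories:
        rate = k / n                    # outcome rate within this category
        weight = n / total_cases        # proportion of cases in this category
        potency += weight * abs(rate - base_rate)
    return base_rate, potency

# Hypothetical three-level system: low (500 cases, 20 outcomes),
# medium (300, 45), high (200, 60)
base, pot = classification_potency([(500, 20), (300, 45), (200, 60)])
print(round(base, 3), round(pot, 3))  # base rate 0.125, potency 0.085
```

A system whose category rates all hover near the base rate, or whose extreme categories hold almost no cases, scores near zero on such a measure, regardless of its headline “accuracy.”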

When child welfare directors hear that outcomes can be predicted with 90% accuracy, they are quite naturally impressed and interested. However, the results achieved to date with neural networks need to be put in the proper context before the approach is touted as a better way to predict. Much of the literature on neural networks indicates they may not offer any benefit over other analytical approaches used to develop risk assessment instruments.

For example, a study using data from the third National Incidence Study (NIS III) attempted to predict which cases would meet the “harm standard” when a report of child maltreatment was received. Results from the validation study indicated that predictions could be made with 90% accuracy. Further, the rate of “false positives” was reported as 0.6%. However, when all of this is put in context, it shows that little, if anything, of value was produced for the following reasons:

  1. Establishing that harm occurred is not really a place where prediction can play a role. By the time enough data are collected to populate a predictive model, evidence of harm should be apparent.
  2. The harm standard was met in less than 8% of the cases in the entire sample (the rate of harm for the validation sample was not reported, but since this sub-sample was randomly selected, it can be assumed to be close to 8%). Knowing this, we could “predict” that no case would meet the harm standard and be right 92% of the time.
  3. When the rate of the behavior being predicted is low, analysis of false positives and false negatives can be quite misleading.
  4. We know only that four “false positives” were produced. This is presented as a false positive rate of 0.6%, a rate that was calculated using the number of cases in the validation sample as the denominator. The denominator should be the number of cases predicted to be positives. If only four cases were predicted to meet the harm standard, the rate of false positives was 100%.
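The arithmetic behind points 2 and 4 is easy to reproduce. The validation-sample size below is illustrative, chosen so that four false positives come out to the reported 0.6% when they are (mis)divided by the whole sample:

```python
# Checking the arithmetic in points 2 and 4. The sample size is an
# illustrative assumption: ~667 cases makes 4 false positives equal
# the reported 0.6% when computed over the whole validation sample.

n_sample = 667          # illustrative validation-sample size
base_rate = 0.08        # ~8% of cases meet the harm standard

# Point 2: the trivial model that predicts NO case meets the harm standard
trivial_accuracy = 1 - base_rate
print(f"trivial accuracy: {trivial_accuracy:.0%}")   # 92%

# Point 4: four false positives, two ways of computing the rate
false_positives = 4
predicted_positive = 4   # suppose all cases predicted positive were wrong

rate_over_sample = false_positives / n_sample            # ~0.6% (misleading)
rate_over_predicted = false_positives / predicted_positive  # 100%
print(f"{rate_over_sample:.1%} vs {rate_over_predicted:.0%}")
```

The same four errors read as 0.6% or 100% depending solely on the choice of denominator, which is why the denominator must always be the number of cases predicted positive.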

Turning attention to the juvenile justice study conducted on cases from Philadelphia brings up a separate issue. In this study, the outcome is a future event (arrest), the base rate is nearly 30%, and the number of cases predicted to be re-arrested is clearly identified, all of which are major improvements over the NIS III study. However, the reported results are well beyond anything produced in other studies—basically that a two-group classification correctly predicted outcomes for 97% of the sample. If true, this is not simply an improvement; it is a quantum leap forward. What is unclear is how such results are possible. The explanation given is that neural network models constantly learn and improve. The questions are: How do NN models learn? Is the source of this “knowledge” additional data, additional cases entering the system? If so, are these models of real use for “gateway” decisions or are they better suited for reassessment and identifying “pathways” that change the risk of failure? In this context, they may be quite useful.
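To see why a 97% figure on a roughly 30% base rate would be such a leap, consider the error budget it implies. The case count below is an illustrative assumption:

```python
# Why 97% accuracy against a ~30% re-arrest base rate is extraordinary.
# With 1,000 illustrative cases, the trivial "predict no re-arrest" model
# is already right 70% of the time, and 97% accuracy leaves room for only
# 30 total errors across 300 actual re-arrests.

n = 1000
base_rate = 0.30
positives = int(n * base_rate)           # 300 actual re-arrests

trivial_accuracy = 1 - base_rate         # 0.70: predict "no re-arrest" for all
max_errors_at_97 = int(n * (1 - 0.97))   # 30 misclassified cases in total

# Even in the worst case where every error is a missed re-arrest,
# recall would still be at least 90%
worst_case_recall = (positives - max_errors_at_97) / positives
print(trivial_accuracy, max_errors_at_97, round(worst_case_recall, 2))
```

A two-group classifier that keeps recall above 90% while making only 30 errors per 1,000 cases would be far beyond anything reported elsewhere in human services research, which is precisely what makes the claim demand scrutiny.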

While the results are intriguing, there is good reason for skepticism. I know of no field where such results have been produced, let alone human services where data are notoriously unreliable and results need to be predicted well into the future.

There is real danger at the policy level here as well. If decision makers believe future behavior can be predicted at this level of accuracy, they will be under tremendous pressure to make important decisions (e.g., whom to incarcerate, whom to place in foster care) based on these predictions. This suggests the need for a very serious review of NN accuracy.

This is not at all meant to deter continued exploration, but much more is needed before anyone should conclude that neural networks produce better results.