Case Example: Using Predictive Analytics to Answer a Child Protection Question

September 10, 2015 | Colleen Kerwin, Researcher

Predictive analytics is fast and powerful. It can comb through large amounts of data to find the most important relationships and factors. At NCCD, we take advantage of that strength by using predictive analytics as a first step to explore the data and begin answering our questions.

Recently, an agency asked us to determine whether a child protection screening assessment, used to decide which families should be investigated, was working properly. One way we approached this question was to compare the rate of subsequent investigation within 30 days between two groups: families who were investigated and families who were not.

We found that families who were not investigated had a recurrence rate (a subsequent investigation within 30 days) four times higher than families who were investigated. This led to the question: is the screening assessment missing a group of families who should be investigated?
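
The comparison above comes down to a ratio of two rates. A minimal sketch in Python, using made-up counts (not the agency's data) chosen only so that the ratio works out to four; the function and variable names are illustrative:

```python
def recurrence_rate(reinvestigated: int, total: int) -> float:
    """Share of families with a subsequent investigation within 30 days."""
    if total <= 0:
        raise ValueError("total must be positive")
    return reinvestigated / total


# Hypothetical counts for illustration only; these are NOT the agency's figures.
not_investigated_rate = recurrence_rate(reinvestigated=80, total=1000)
investigated_rate = recurrence_rate(reinvestigated=20, total=1000)

# How many times higher the rate is for families who were not investigated.
rate_ratio = not_investigated_rate / investigated_rate
print(f"Not investigated: {not_investigated_rate:.1%}")
print(f"Investigated:     {investigated_rate:.1%}")
print(f"Rate ratio:       {rate_ratio:.1f}")
```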

To explore this question we used predictive analytics: specifically, recursive partitioning. Recursive partitioning is a type of tree analysis (the output is a tree, or rather an upside-down tree) that predicts outcomes for individuals by repeatedly splitting the data into subgroups based on their characteristics. To arrive at the end of a branch (a prediction of the outcome) you may have to pass through multiple splits, each of which is a characteristic or factor with a relationship to the outcome. Recursive partitioning gives an idea of how different characteristics and factors work together to increase or decrease the likelihood of experiencing an outcome.
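
To make the idea of splits and branches concrete, here is a minimal sketch of recursive partitioning on yes/no characteristics. It is written in Python for illustration only (the analysis described here was run in R), it uses Gini impurity to choose splits (a common criterion, but an assumption, since this post does not say which one was used), and the sixteen toy records and the field names `investigated` and `unrelated_adult` are invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0.0 means all outcomes agree)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features, min_gain=1e-9):
    """Return the yes/no feature with the largest impurity reduction, or None."""
    parent = gini(labels)
    best_feature, best_gain = None, min_gain
    for f in features:
        no = [y for r, y in zip(rows, labels) if not r[f]]
        yes = [y for r, y in zip(rows, labels) if r[f]]
        if not no or not yes:
            continue  # this feature does not separate the group at all
        weighted = (len(no) * gini(no) + len(yes) * gini(yes)) / len(labels)
        gain = parent - weighted
        if gain > best_gain:
            best_feature, best_gain = f, gain
    return best_feature


def build_tree(rows, labels, features, depth=0, max_depth=2):
    """Split recursively; each leaf stores the outcome rate of its subgroup."""
    feature = best_split(rows, labels, features) if depth < max_depth else None
    if feature is None:
        return {"leaf": True, "rate": sum(labels) / len(labels), "n": len(labels)}
    remaining = [f for f in features if f != feature]
    branches = {}
    for name, keep in (("no", False), ("yes", True)):
        pairs = [(r, y) for r, y in zip(rows, labels) if bool(r[feature]) == keep]
        branches[name] = build_tree([r for r, _ in pairs], [y for _, y in pairs],
                                    remaining, depth + 1, max_depth)
    return {"leaf": False, "feature": feature, **branches}


def predict(tree, row):
    """Follow the splits down to a leaf and return that leaf's outcome rate."""
    while not tree["leaf"]:
        tree = tree["yes"] if row[tree["feature"]] else tree["no"]
    return tree["rate"]


# Toy data for illustration only: (investigated, unrelated_adult, outcomes).
groups = [
    (True, False, [0, 0, 0, 1]),
    (True, True, [0, 0, 0, 1]),
    (False, False, [0, 0, 1, 1]),
    (False, True, [1, 1, 1, 1]),
]
rows, labels = [], []
for inv, ua, outcomes in groups:
    for y in outcomes:
        rows.append({"investigated": inv, "unrelated_adult": ua})
        labels.append(y)

tree = build_tree(rows, labels, ["investigated", "unrelated_adult"])
print(tree["feature"])  # the first split chosen for the whole data set
print(predict(tree, {"investigated": False, "unrelated_adult": True}))
```

In this toy data the first split falls on `investigated`, and only the "not investigated" branch splits again on `unrelated_adult`. That outcome is built into the invented records and says nothing about real families; the point is only to show how a prediction is reached by passing through one or more splits.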

In our analysis we used the 30-day subsequent investigation as our outcome. Using R, a statistical software package, we ran the recursive partitioning model on all the information we had collected from the agency on the two groups of families, including all items captured on the assessment, the worker’s screening decision, the family’s prior history, the makeup of the household, etc.

Our first split, which is the factor that has the largest relationship to the outcome for the entire data set, was the worker’s screening decision: should the family be investigated or not? This step validated our previous finding of the large difference in recurrence rates between the two groups. For the families who went down the “no investigation” branch, the other two splits of the model were having an unrelated adult in the household and the number of males in the household. These two household/family characteristics did not show up on the “investigated” branch of our model. This told us that when considering families who were not investigated, the characteristics with the strongest relationship to our outcome (whether the family was re-investigated within 30 days) had to do with the household/family.  

The predictive analytics modeling we used to answer this question gave us a great starting point for further exploration of the data, with input from the agency. It also reaffirmed the importance of good child protection practice. A hotline worker taking a phone call about suspected child abuse or neglect should be asking the caller questions about the household: how many children live there, how many adults live there, are there unrelated adults living there, etc. Our findings show that this information is important to know both about families who are investigated and about those who are not. Predictive analytics helped show us that if we want to make sure we are investigating all families that should be investigated, we must also pay attention to information about the households of families who would normally not be investigated.