Does “Evidence-Based” Mean It Works?

December 27, 2018 | Elizabeth Harris, PhD

I was excited to learn that under the Family First Prevention Services Act, a greater number of states, tribes, and counties will soon be able to use federal funds to support evidence-based prevention of child maltreatment.

At NCCD and the Children’s Research Center, we are dedicated to researching, evaluating, and supporting practices that are proven to reduce child abuse and prevent family dissolution. Consequently, I was delighted to see a federally funded endorsement of prevention as part of the mandate of child welfare.

However, when the Department of Health and Human Services (HHS) released its standards for determining whether a practice is supported by evidence and research, it used a flawed set of criteria. According to the HHS Initial Practice Criteria and First List of Services and Programs Selected for Review as Part of the Title IV-E Prevention Services Clearinghouse, to be considered “evidence based,” a study must:

  1. Appear in a peer-reviewed journal or government or foundation publication;
  2. Use quantitative methods, such as a randomized controlled trial, a quasi-experimental design, or a non-experimental design that utilizes an appropriate control group;
  3. Examine the impact of the service or program on one of the prevention target areas; and
  4. Be written in English.

The notice continues that studies “will be rated based on whether they demonstrate at least one meaningful favorable effect (i.e., positive significance).”

This definition of what constitutes proof of effect is likely both to exclude programs that may be beneficial and to include programs that do not discernibly improve the lives of children and families.

Excluding non-English studies is obviously problematic: HHS may miss important opportunities to fund meaningful social interventions. Social scientific evaluations are time intensive and often expensive to implement. All too many promising interventions are never tested because adequate funds are not available to study them. To exclude studies conducted in languages other than English further narrows the pool of available, rigorous research to no apparent purpose. There is no reason to believe that either peer-reviewed journals or government reports written in English are more reliable or reputable than those in other languages. Moreover, in a system such as child welfare, where many of the service recipients and practitioners speak a language other than English, such privileging of English over other languages is inappropriate.

Second, by limiting evidence to quantitative evidence (that which is produced with statistical models), HHS ignores important information that may be collected through qualitative means (observational research, interviews, focus groups, and content analysis).

This second oversight may stem from the fact that HHS appears to be adopting a medical research model. As an example, consider HHS’s goal of funding in-home parenting programs. If tasked with measuring the impact of an in-home parenting program, I would likely want to measure the quantitative relationship between participation in the program and future incidents of child abuse. I would seriously consider a design such as a randomized controlled trial, even though I know many in social work find randomized control groups morally uncomfortable. My study, however, would immediately look different from a medical study (which measures the effectiveness of an intervention as compared to a placebo) because there is no such thing as a placebo in-home parenting program.

Unmeasured or unknown differences may exist between the characteristics of parents who complete the program and those who do not; for example, parents who have a car may be more likely to complete a parenting program than those who do not have a car, but the researchers may not have any data about car ownership. This is an example of “selection effect.”

Selection effect simply means there are unknown or unmeasured pre-existing characteristics that differentiate some parts of the group being studied from others. For example, parents who complete parenting programs may also be parents who inherently have lower odds of subsequent child abuse even without the intervention. While I could attempt to gather other information about the parents as control variables, it may be impossible to completely address selection effect. The other hitch in this plan is that substantiated child abuse is, happily, not common. I would need quite a large sample to work with such an outcome. 
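The selection-effect trap described above can be illustrated with a small simulation. This is a hypothetical sketch, not data from any real program: it assumes an unobserved “baseline risk” that influences both program completion and later abuse, and it gives the program itself no effect at all. Even so, a naive comparison makes completers look safer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large sample, because substantiated abuse is a rare outcome

# Hypothetical latent "baseline risk" that the researcher never observes
baseline_risk = rng.uniform(0.0, 0.04, size=n)

# Selection effect: lower-risk parents are more likely to complete the program
p_complete = np.clip(0.7 - 10 * baseline_risk, 0.05, 0.95)
completed = rng.random(n) < p_complete

# By construction, the program has NO effect: outcomes depend only on baseline risk
abuse = rng.random(n) < baseline_risk

rate_completers = abuse[completed].mean()
rate_dropouts = abuse[~completed].mean()
print(f"completers: {rate_completers:.4f}, non-completers: {rate_dropouts:.4f}")
```

Completers show a lower abuse rate than non-completers even though the “program” does nothing; an analysis without data on the hidden characteristic cannot distinguish this from a true program effect.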

Let’s imagine that I conducted this study, and the results showed a statistically significant reduction in child abuse for those participating in the parenting program as compared to the control group. Under the terms defined by HHS, this program would meet the definition of “evidence based.” As a researcher, however, I would not be satisfied with this result alone because of the selection-effect problem described above.

This problem is why NCCD so often uses a mixed-methods approach to research. If I had the findings described above, I would immediately turn to qualitative interviews, observations, and focus groups to create a descriptive portrait of the characteristics of people who complete the parenting education program. Qualitative data collection might even lead me to identify new quantitative variables that I could measure and control for in my quantitative analysis. This would help me interpret how much of the change associated with the program is likely to be related to selection effect and how much is due to the program itself.

In addition to trying to describe selection effect, I would want to know how and why a specific intervention works. While quantitative research is a rigorous way to measure the relationship between variables, as a sociologist, I require a theory before making any claim of cause and effect. Qualitative data are necessary to support the theoretical underpinnings of an evidence-based practice.

To conclude that a parenting program positively affects child abuse, we must collect data about how and why the program changes parenting practices. In introductory sociology and statistics classes, students are famously taught that at times when people eat more ice cream, there is more crime. Of course, there is also more crime when it is warm outside, and eating ice cream is more common in hot weather. Ice cream and crime may appear to have a relationship, but common sense tells us that one does not cause the other. Using qualitative methods to investigate relationships between variables helps us cross the threshold of knowing how and why the variables are related, which in turn can support the development of better and more effective programs.
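The ice cream example can be made concrete with a quick simulation (all numbers here are invented for illustration). Temperature drives both variables independently; the raw correlation between ice cream consumption and crime looks strong, but it largely vanishes once temperature is accounted for:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Temperature is the common cause of both variables
temp = rng.normal(20, 8, size=n)

# Ice cream sales and crime each rise with temperature, independently of each other
ice_cream = 2.0 * temp + rng.normal(0, 5, size=n)
crime = 1.5 * temp + rng.normal(0, 5, size=n)

raw_corr = np.corrcoef(ice_cream, crime)[0, 1]

# "Control" for temperature by correlating the residuals after removing its effect
ice_resid = ice_cream - np.polyval(np.polyfit(temp, ice_cream, 1), temp)
crime_resid = crime - np.polyval(np.polyfit(temp, crime, 1), temp)
partial_corr = np.corrcoef(ice_resid, crime_resid)[0, 1]

print(f"raw correlation: {raw_corr:.2f}")
print(f"after removing temperature: {partial_corr:.2f}")
```

A purely quantitative analysis would report the strong raw correlation; it takes knowledge of the mechanism (here, the weather) to know what to control for, which is exactly the kind of knowledge qualitative work supplies.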

Moreover, many qualitative outcomes simply do not lend themselves easily or well to quantitative exploration. The fact that something is not easy to measure should not mean that it is not a laudable goal. Indeed, many of HHS’s goals do not lend themselves easily to a non-subjective measure, such as whether families feel supported through their trauma. HHS should not set up a definition of study eligibility and quality that privileges only those outcomes that are easy to measure, even if other outcomes are equally important and worthy of funding.

To address my final concern with HHS’s “evidence-based” standards, imagine for a moment that a researcher conducted an enormous study of the parenting program on a sample of 100,000 parent/child pairs. A study of this size would have very strong statistical power, and we would likely detect significant relationships between variables even if the program produced only incremental change. Under the HHS criteria, this would constitute “meaningful evidence,” but while statistical significance is important, I would be more interested in the parenting program’s effect size and the type of effect it has. If 10 fewer children per 1,000 in the sample experienced child abuse, I would see that as a worthwhile effect and this as a worthwhile program. But if the program’s effect was to cause parents to say they went from being “slightly familiar” with using timeouts as a form of discipline to “moderately familiar,” I would be less enthusiastic. Both effect size and the outcome itself matter. Statistical significance is not, as the HHS notice implies, a synonym for meaningful effect.
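A quick numeric sketch of this point, using invented rates of 20 versus 18 substantiated cases per 1,000 families (a small absolute effect): the identical difference is “not significant” with 10,000 families per group but highly significant with 100,000, so the p-value reflects sample size as much as it reflects the size of the effect.

```python
import math
from statistics import NormalDist

def two_prop_z(p1, p2, n1, n2):
    """Pooled z-test for a difference between two proportions."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical rates: 20 vs. 18 substantiated cases per 1,000 families
results = {}
for n_per_group in (10_000, 100_000):
    z, p_value = two_prop_z(0.020, 0.018, n_per_group, n_per_group)
    results[n_per_group] = p_value
    print(f"n = {n_per_group:>7} per group: z = {z:.2f}, p = {p_value:.4f}")
```

Nothing about the program changed between the two rows; only the sample size did. That is why effect size must be reported and judged alongside significance.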

Using statistical significance alone as a measure also biases us toward overweighting studies that are well funded and able to collect very large bodies of data. Some widely adopted programs that target specific populations may not be able to tap into the kinds of money and resources they need to achieve statistical power. If we want to hold research to a measure of statistical significance, we need to think seriously about how the government and foundations can do more to support studies in child welfare by entities, jurisdictions, and populations that have not historically been the recipients of big research grants.

Questioning what “evidence based” really means is not an academic argument about research methods. It is about ensuring the ability of the child welfare field to use a wide array of effective, meaningful prevention services so that children and families remain safe and intact. We owe it to our children, our families, and our communities to get this prevention opportunity right.