The Promises and Limitations of Predictive Methodologies in the Public Sector

By: Shannon Mickelson

Real-Time Stats for Real-Time Problems: The Promises and Limitations of Predictive Methodologies in the Public Sector.

Predictive analytics are a hot topic in many fields right now. Predictive analysis, or risk modeling, asks how we can use data not only to understand and evaluate what has gone on before, but also to predict what may happen, and intervene before it ever does. While it has long been used in fields like insurance (actuarial work, for example), it’s also increasingly being used in criminal justice, child welfare, policing, homeland security, and other areas in the public sector. It essentially means reverse-engineering a predictive formula from a standard statistical analysis: taking the sort of regression that would be used as a past-tense project in a static research paper and using it to dynamically forecast future events.

There is much about this approach that is promising. Harnessing the power of data to improve our understanding of the world is very “in” right now. And as a data geek who works in public health, my professional life is focused on finding ways to apply the numbers in order to improve services for our population, with limited resources. Any way we can use data alongside all our other resources to better ourselves is fantastic, right?

Mmm, maybe. But is there a dark side to all of this?

Perhaps Tom Cruise can help me explain. (Stay with me here.)

[Photograph: actor Tom Cruise waving, with a crowd of people taking photographs behind him.]

“Minority Report” was a summer blockbuster in the United States when I was a teenager. The 30-second version: Tom Cruise plays a cop in a not-too-distant future, where the “Pre-Crime” division of the local police receives predictions of murders that will occur, thanks to humans with precognitive abilities. This leads to the preemptive arrest and incarceration of all theoretical perpetrators. Of course, the conflict arises when the protagonist himself is accused and sets out to prove that the course of human choice is not constrained by such predictions. Unsurprisingly, it turns out not only that future events are not always perfectly predicted, but also that interventions of any kind may alter what has been predicted.

We may not have a team of preternatural beings churning out knowledge of the future. But we do have “big data” analytics and forms of machine learning that sometimes outstrip our capacity to reason through all potential consequences and ethical considerations. And the intersections of data and social justice — including where data usage fails the causes of equity and justice — are being continually documented and explored in multiple ways.

In my capacity as senior research analyst for the behavioral health division at the Multnomah County Health Department in Oregon, I’m invested in trying to ensure that our services are high-quality, inclusive, equitable, and doing their part to address mental illness and substance abuse in our community. And at the same time, as anyone who works in social services knows, resources are finite — both time and money. And with that resource allocation comes the moral imperative to do the best we can by every person we serve.

With these aims in mind, I have worked on multiple projects over the last several years focusing on acute care. Acute care, in our context, means psychiatric hospitalizations, emergency department visits driven by behavioral health concerns, or use of the psychiatric emergency room. This utilization is absolutely necessary in many cases. Ideally, however, we hope to connect individuals to community-based treatment that helps address their needs in less restrictive environments, so that they never reach the level of crisis that precipitates acute care utilization. This is both better client care and better resource utilization.

We have long recognized the need to engage with people who heavily utilize these services. We have teams of coordinators that meet people as they are discharged from the hospital or while they are still in the emergency room, to help in assessing need and making connections to care. But what if we could reach a client before they reached this level of crisis?

Using the predictive modeling approach described above, we developed a risk prediction tool to help us identify clients at risk or in current crisis. Ultimately, eight unique variables were included that collectively were highly predictive of impending acute care events. We used the weights of those variables to craft a formula that could be applied to every single adult in our system daily, to assess risk. Both the original model and a model using the single score approach were tested multiple times for reliability — using different populations to see how consistent the results were. We followed fairly standard steps in trying to create a generalizable analysis.
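To make the idea of a weighted formula concrete, here is a minimal sketch of how regression weights could be turned into a single daily score. It assumes a logistic-style model, which this post does not confirm. The variable names, weights, and intercept are hypothetical placeholders rather than the eight variables in our tool, and only three are shown for brevity.

```python
import math

# Hypothetical weights: placeholders only, not the tool's actual variables
# or coefficients.
WEIGHTS = {
    "prior_hospitalizations_90d": 0.9,
    "ed_visits_90d": 0.6,
    "missed_appointments_90d": 0.3,
}
INTERCEPT = -3.0  # hypothetical baseline (log-odds when every variable is 0)


def risk_score(client: dict) -> float:
    """Return a score between 0 and 1 from a logistic-style weighted sum."""
    linear = INTERCEPT + sum(
        weight * client.get(name, 0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def score_all(clients: dict) -> dict:
    """Score every client in one pass, as a daily batch job might."""
    return {client_id: risk_score(c) for client_id, c in clients.items()}


# Made-up clients for illustration only
clients = {
    "A": {
        "prior_hospitalizations_90d": 2,
        "ed_visits_90d": 3,
        "missed_appointments_90d": 1,
    },
    "B": {},  # no recorded events, so the score reflects the intercept alone
}
scores = score_all(clients)
```

In this sketch, client A scores higher than client B because each recorded event adds its weight to the sum. The score is a flag for a closer look, not a verdict.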

But this brings us back, once again, not only to broad ethics, but specific concerns of equity.

We know there are health inequities in our system. What does it mean if we overpredict or underpredict risk for a specific demographic group? How can we avert that? How can we help ensure that those who need services, especially where that need is compounded by systemic inequities, are highlighted, but in a way that does not systematically either target or ignore demographic groups in inequitable ways?

We used a two-step process to examine this possibility and decide upon how to deal with it.

First, we included all available demographic information in our original statistical analysis as controls: race/ethnicity, age range, sex, and primary language (dichotomized as English versus non-English, due to low counts). We then built our original scoring model from that analysis, using every significant variable except the demographic ones.

We then ran subsequent models on segments of the population while omitting these variables. We ran the eight-variable model on each race, each age range, each sex, and each language category, and then on randomized combinations of the above. In other words, we ran this model on white clients, black clients, clients in their 30s, clients in their 40s, males, females, English speakers, non-English speakers…Native American English-speaking women in their 50s, Asian non-English speaking men in their 20s, and so forth. We compared the results and the predictive power of each model, and determined that while there was some variation, the range was small, and every tested group remained in a solidly good predictive range. We felt confident in these results and decided to proceed.
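One way to picture this kind of subgroup comparison is sketched below. It assumes that "predictive power" is measured with AUC (the chance that a randomly chosen client who had an acute care event scores higher than one who did not), which is an assumption made for illustration. It also scores each group with one fixed formula rather than re-running a model per group. The field names and records are made up.

```python
from collections import defaultdict


def auc(scores, outcomes):
    """Probability that a random positive case outscores a random negative one."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both outcomes are present
    # Count a win as 1, a tie as 0.5, a loss as 0 over all positive/negative pairs
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(records, group_key):
    """Compute AUC separately within each value of one demographic field."""
    grouped = defaultdict(lambda: ([], []))
    for r in records:
        group_scores, group_outcomes = grouped[r[group_key]]
        group_scores.append(r["score"])
        group_outcomes.append(r["event"])
    return {g: auc(s, y) for g, (s, y) in grouped.items()}


# Made-up records for illustration only
records = [
    {"language": "English", "score": 0.8, "event": 1},
    {"language": "English", "score": 0.3, "event": 0},
    {"language": "English", "score": 0.6, "event": 0},
    {"language": "non-English", "score": 0.7, "event": 1},
    {"language": "non-English", "score": 0.7, "event": 0},
    {"language": "non-English", "score": 0.2, "event": 0},
]
result = auc_by_group(records, "language")
valid = [v for v in result.values() if v is not None]
spread = max(valid) - min(valid)  # gap between best- and worst-served groups
```

A small spread across groups is reassuring, while a large one would suggest the score ranks risk less reliably for some clients than for others. Very small groups can produce unstable values, so a check like this works best alongside group sizes.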

But the conversation on ethics wasn’t (and isn’t!) done yet. We built the tool, pushed it out to an interactive dashboard, and conducted further revalidations, with success. It was time to see how it might work in daily practice, which prompted us to revisit the following questions that had arisen along the way:

1. How will this information be used? How could it be used, regardless of intent? What are potential unexpected uses and impacts?

2. Does it perpetuate existing disparities or create new ones?

3. Do we have a full understanding of what it can and cannot do? Are we prepared and equipped to educate users of this information?

4. Can we open source the information behind this analysis and open ourselves to critiques?

Part of answering these questions is ensuring that our enthusiasm doesn’t outstrip our education and learning phase; that we do not roll out a project until we are confident that its use will not violate any of the intended principles, and that we open ourselves to ongoing monitoring and criticism to ensure that remains true.

When it came to the first question of how this information will be used, we established the following principles for all staff to know and understand:

Just because someone has a high score doesn’t mean they will go to the hospital; just because someone has a low score doesn’t mean they won’t. Statistics are not perfect, and human beings are far too “messy” to encapsulate in a single number.

We value client autonomy and personal agency in making decisions. We will not force services because this tool says they may have higher need of said services.

We value clinical judgment. This tool should not be used to override a staff member’s determination that a client is high risk. It may, however, be a helpful flag to staff that a client they thought was low risk deserves a second look.

We do not use this to deny services. We will not take away any services or benefits because our tool says they are low risk.

We preliminarily addressed whether we are furthering disparities with our demographic analyses, but are committed to continuing to monitor and revisit this topic in the future.

With the third question in mind (do we really understand our tool and can we educate others?), we have decided to roll this project out to only one team at a time. That rollout includes both training and written documentation (including the aforementioned core principles) clearly stating what this tool can and cannot do, and what it should and should not be used for. As evaluators informing actual daily practice, we have an ethical responsibility that I believe goes far beyond saying “this is how A likely affects B” and walking away. We have a responsibility to the work we put out into the world and to the people it may impact.

Finally, can we open up the “black box” to everyone and accept critiques? As a public agency, we consider what we do as belonging to the public, both legally and ethically. I have shared the inner workings of this algorithm publicly, in multiple contexts, and while detailed information is available upon request, we also hope to more formally open-source the entire algorithm code in the future.

As it has been stated many times on this site: data, algorithms, analyses…none of them are purely objective things. We need both responsible development and responsible use of the tools we forge. And as much as we’ve tried to incorporate these principles, there’s always more we can do. Have an idea on how we can improve further, or want to know more about the acute care risk tool and how it works? Leave a comment or shoot me an email!

Shannon Mickelson Campbell, MPP, is the senior research and evaluation analyst for the Mental Health and Addiction Services Division at the Multnomah County Health Department in Portland, Oregon. Multnomah County has 750,000 residents; MHASD provides direct clinical services, operates a 24/7 crisis line, engages in case management and care coordination for clients in our systems, provides community outreach and education, and manages the behavioral health benefit for our regional Medicaid coordinated care organization, which covers 170,000 people. She can be contacted at shannon.campbell@multco.us. When not nerding out with data, she can be found spending time with her partner, 2-year-old daughter, and 14-year-old pug; dressing up at comic cons; traveling internationally; chilling at the Oregon coast; or drinking tea while reading Scandinavian crime fiction.
