Opinion
Data availability is social justice
The lack of equity in public health outcomes is the subject of much focus. Health equity is built on data and driven by data science. But if data science is to serve as a true boon for justice and equity in public health, there are basic failures in access that must be addressed.
Despite years of lively debate about uses of big data, the way researchers collect data and design projects does not take into account justice or equity. Algorithms have become ubiquitous in daily life, yet analyses routinely show that they are engineered with bias and with private interests in mind. These problems are the focus of a growing number of eminent scholars, among them Ruha Benjamin, Keolu Fox, Timnit Gebru, and Abigail Jacobs.
I saw how deeply flawed our data pipelines are when I became involved in a recent study on how COVID-19 changed the dynamics of U.S. prisons. Our research team had experience with studying health inequalities in various settings, and we expected there would be plenty of data to interrogate. There was, but much of it was not easy to get. It is often challenging to find clean, available data that provide a clear window into the status of disenfranchised communities, like those who are incarcerated.
The study brought together epidemiologists, computer scientists, legal historians, physicians, and others and was led by Brennan Klein, a network scientist at Northeastern University. To do the work, our team constructed what is one of the most detailed data sets of the American criminal legal system, with 7,000 records going back 20 years across all 50 states and the District of Columbia.
The findings were striking. We learned that the COVID-19 pandemic fostered the largest decarceration event in American history, with the overall prison population declining by 16 percent. But this drop included a peculiar pattern: The pandemic saw an increase in the proportion of Black and other non-white populations in prison. This represented a reversal of a decade-long process prior to the pandemic, where the White prison population was increasing relative to non-white populations.
What could have caused this pattern? Our data exploration revealed that court closures during the early months of the pandemic decreased both admissions and releases. The people being released were primarily those who had completed their sentences. And so what we saw in the data was the byproduct of large-scale differences in average sentence length by race, a topic that has long been the focus of study by legal scholars. The related public health inequity should not be surprising: Groups that remained in prison faced an increased risk of COVID-19 infection, along with other health risks.
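The mechanism can be made concrete with a toy calculation. The sketch below is not the study's model and uses none of its data: the group labels, starting populations, sentence lengths, and the 12-month halt in admissions are all hypothetical, and the monthly release rate is held fixed for simplicity. It only illustrates that when admissions fall and releases come mainly from completed sentences, the total population shrinks while the share held by the group with longer average sentences grows.

```python
# Toy illustration only: every number here is hypothetical, not from the study.

def simulate(start, mean_sentence_months, admission_scale, months):
    """Return each group's population after `months`.

    Each month a group releases 1/mean_sentence of its current population
    (people completing sentences) and admits `admission_scale` times the
    number that would have kept its starting population steady.
    """
    current = dict(start)
    for _ in range(months):
        for group, mean_len in mean_sentence_months.items():
            released = current[group] / mean_len
            admitted = admission_scale * start[group] / mean_len
            current[group] += admitted - released
    return current


def share(populations, group):
    """Fraction of the total population that belongs to `group`."""
    return populations[group] / sum(populations.values())


# Hypothetical starting populations and average sentence lengths.
start = {"shorter_sentences": 600.0, "longer_sentences": 400.0}
mean_sentence_months = {"shorter_sentences": 24.0, "longer_sentences": 48.0}

# Hypothetical shock: admissions stop entirely for 12 months.
after = simulate(start, mean_sentence_months, admission_scale=0.0, months=12)

print(f"total before: {sum(start.values()):.0f}, after: {sum(after.values()):.0f}")
print(f"longer-sentence share before: {share(start, 'longer_sentences'):.3f}, "
      f"after: {share(after, 'longer_sentences'):.3f}")
```

Setting `admission_scale=1.0` leaves both groups unchanged, which is a quick check that the shift in shares comes from the drop in admissions and not from the model itself.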
One thing we showed is that social upheaval of the sort caused by epidemics and pandemics influences institutions like the prison system in ways that aren’t usually visible. Conducting rigorous data science can reveal underappreciated health equity phenomena. And our research is consistent with the work of scholars and activists who have long held that health data invisibility is a barrier to understanding and addressing long-standing health inequities.
Researchers who want to expose health inequities can draw three lessons from our approach.
- Look for where the data isn’t. To engage in health-justice-oriented data science, seek out research topics where data are sparse. These are likely areas where underserved populations can be found. In our case, we observed gross inconsistencies in how prison data were collected and reported across different states. These inconsistencies made it challenging to examine long-term, large-scale dynamics within the prison systems. Cynics might suggest that the paucity and inconsistency of the data intentionally muddy the signal. Whatever the reason that the data were hard to come by, the lack of data is an obstacle to shedding light on what is happening in the U.S. criminal legal system.
- Build unusually diverse research teams. The breadth of expertise on our team led to serendipity that allowed us to solve a complex health problem. We could apply data science, statistics, legal analysis, and epidemiology. It was also critical that our team included several multidisciplinary-minded scholars who had a specific skill but challenged themselves to converse with experts in other domains. We checked our egos at the door, and we learned from each other as we worked to solve our problem. And we credited everyone, independent of their affiliation or career stage.
- Share all of your data. Certain corners of the scientific and medical worlds have a culture of transparency for data, computer code, and metadata. This is in large part to help others reproduce results. In social justice research, making data sets open can help others find phenomena and dynamics that a team may have missed or not had the resources (including time and space) to pursue. We also chose to make our article fully open-access and made the exchange between us and the reviewers accessible to the public as well, so that those interested could see the reviewers’ many valid criticisms and our responses. This last aspect is key: All health-related research should be a conversation in which everyone is privy to the processes, dialogues, and details.
Each of these three steps is difficult, but following them can be fruitful for answering questions at the interface between social science and public health. And we encourage research teams to employ methods like hackathons, flash mobs, and other creative approaches to help answer public health questions.
Many researchers in quantitative fields hold that a social justice lens compromises objectivity or technical acuity in health research. Our team found that being inclusive, transparent, and equity-driven can lead to higher-quality data and provide a way to cut through the tangles of society and see into health inequalities.
Data availability is always a key component of social justice, and new perspectives and methods are needed on the data curation side to open paths to new questions, directions, and results. Otherwise, it can be difficult or even impossible to see the signature of health inequalities in a manner that allows us to address them.