I’ve been working with data for decades. And over the years, I’ve found myself increasingly uncomfortable with the way it’s used, taught, collected, and communicated. Inequity and bias in data science are everywhere.
Data science gives us the ability to achieve change at an unprecedented scale and pace. But it’s not without problems. The capital-S part of data ‘Science’ gives the impression that the numbers we work with are objective and free of bias. And data experts’ use of confusing jargon in place of regular, easy-to-understand phrasing doesn’t help.
But the problem is fixable.
Demystify: we’re providing the information to get the alternative data science education we all need. Plainspoken explanations addressing all the concepts from the most simple to the most complex. We’re going to become effective critical thinkers around every step in the Data Life Cycle.
Democratize: we’re offering tools, training, and resources to get your project/group/company up to speed now so they can swim in the deep waters of data science confidently.
Demonstrate: we’re getting experimental with the best minds across the industry to show the world how to get a handle on algorithms, machine learning, new data visualization methods, and more, preparing us all for a more equitable future.
Just like in The Wizard of Oz, we must pull back the curtain so the world can see that data (and data scientists) are not infallible. We need to start a conversation about bias in data science that everyone can participate in.
All my years of dissatisfaction with the bias and the pretension of objectivity have culminated in a decisive step to address the problem. I’m proud to introduce We All Count, a project for equity in data science.
Join us as we share examples, build tools, and provide training and education aimed at helping everyone better understand data, so we can make it more transparent and fairer for all. Because when you do the math, we all count.
Who can be a part of We All Count?
Everyone. Citizens, thought leaders, computer scientists, students, evaluation professionals, journalists, artists, decision makers — everyone.
A Matter of Life and Death
More and more, algorithms make decisions on our behalf. We rely on them to determine what gets purchased, what gets funded. Algorithms that many of us don’t even understand tell us whose project is working and which policies are effective.
Algorithms decide who lives and who dies.
Data is changing the world — literally. But thinking of it as a hard science gives us the very mistaken impression that statistics aren’t subjective. That data equals fact. That relying on an algorithm to make our decisions makes life objectively fairer.
But bias in data science permeates our systems. It’s hidden in every step of the process: data collection, analysis, communication, and visualization are rife with inequality-causing assumptions, misunderstandings, blind spots, shortcuts, and outright errors.
How did this happen?
Data science is rooted in a Western-oriented, academic tradition that made huge improvements to everyone’s lives and changed the world as we know it. However, the limitations and biases of the classical school of data analysis, visualization, and data literacy have long overstayed their welcome. The majority of today’s data scientists have some old-school and ineffective paradigms and methodologies embedded in the very foundation of their work, while students and interested citizens around the world are being indoctrinated into some very problematic data science concepts.
Stay calm. Mostly, we’re talking about problems that can disappear once we become aware of them (like how survey selection can become racially or economically biased in sneaky ways). Some need better systems or problem solving (like how to construct less biased research questions or more accessible graphics). And some (the stuff that gets us really excited) are totally unexplored areas of concern in data science that need complex collaborative answers.
Data scientists, journalists, policy makers, visualizers, governments, NGOs, and citizens everywhere need to get a handle on this. We need to acquire the tools and understanding to catch and fix these issues and ditch the old-world mentalities of defunct data assumptions.