Every time I see a stat like “25% of respondents are Black”, I see only one piece of a four-piece puzzle filled in. With only this piece, I don’t know how to use this information. I don’t know if it matches the category from my dataset, I don’t know if it reflects how I want to engage with the idea of “Black”, and most importantly I don’t know who is drawing these identity boxes and why.
When it comes to social identity data, these are the blanks that always need to be filled in:
Who defines the categories?
What categories are there?
Who makes the selection?
What was selected?
Let’s look at an example.
25% of students are Black.
Seems simple on the surface, but it’s got three major blanks to fill in. We know that 25% of students were identified as Black. Who made this selection?
25% of students self-identified as Black.
Ah, ok cool. So the students chose themselves. That seems like it’s putting the power into their hands, but what did they get to choose from? The options provided are just as important as the choice, if not much more so. They might have represented the chooser’s preferred choice well, or they might not. Every time someone writes a list of social identities, they draw a set of simple boxes around a very complicated world. What categories were offered this time?
25% of students self-identified as Black from a list of “Black, Hispanic, White, Mixed Race, Asian, Indigenous, Other, Prefer Not To Say”.
Great. Now that I can see the list I can begin to understand the worldviews embedded in the context of the identity question they are asking. I can compare it to my equity definitions and priorities and see its compatibility, its strengths and weaknesses. But there’s one more important power dynamic to be unveiled here: where did this list of identities come from? Who chose them?
25% of students self-identified as Black from a list of “Black, Hispanic, White, Mixed Race, Asian, Indigenous, Other, Prefer Not To Say”, created by the University admissions department in consultation with a student advisory panel.
Now, and only now, can I start to make meaning out of this data. I have all the key blanks filled in and I can really understand what was meant when “Black” was selected and what kind of choices the students were offered. Think about how differently I might use (or not use) this stat if it were like this:
25% of students were counted by their teachers as Black from a provided choice of “White, Black, or Other”, suggested in a staff meeting.
These are two real-world examples that I’ve dealt with and “25% Black” is not the same number across both cases. I can’t equitably use either of them unless I know the rest of that information.
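One way to keep the four blanks attached to the number is to store them in the same record as the number itself. The following is only a minimal sketch in Python, not a prescribed format: the `IdentityStat` class, its field names, and the `comparable` helper are all hypothetical names invented for illustration, and the two records simply restate the two examples above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IdentityStat:
    """Hypothetical record that keeps all four facets with the number."""
    share: float                  # proportion who fall in the selected category
    population: str               # who the stat is about
    selected: str                 # what was selected
    selector: str                 # who made the selection
    categories: Tuple[str, ...]   # what categories were offered
    category_author: str          # who defined the categories

    def describe(self) -> str:
        """Spell the stat out with every blank filled in."""
        options = ", ".join(self.categories)
        return (
            f"{self.share:.0%} of {self.population} were identified as "
            f"{self.selected} by {self.selector}, from the options "
            f"\"{options}\", defined by {self.category_author}."
        )


def comparable(a: IdentityStat, b: IdentityStat) -> bool:
    """Illustrative rule of thumb (an assumption, not a standard):
    treat two stats as directly comparable only when the selector,
    the offered categories, and the category author all match."""
    return (a.selector, a.categories, a.category_author) == (
        b.selector, b.categories, b.category_author
    )


university = IdentityStat(
    share=0.25,
    population="students",
    selected="Black",
    selector="the students themselves",
    categories=("Black", "Hispanic", "White", "Mixed Race", "Asian",
                "Indigenous", "Other", "Prefer Not To Say"),
    category_author="the University admissions department in consultation "
                    "with a student advisory panel",
)

school = IdentityStat(
    share=0.25,
    population="students",
    selected="Black",
    selector="their teachers",
    categories=("White", "Black", "Other"),
    category_author="a staff meeting",
)

print(university.describe())
print(school.describe())
print(comparable(university, school))
```

Both records carry the same 25%, but the last line reports that they are not comparable under this sketch's rule, which is the point: the number alone can't tell you that.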
Let’s look at another example:
Most gay residents oppose the new law.
After our first example, this one seems almost empty of key information. We need to understand what perspectives and definitions are embedded here before we can approach using this data.
Let’s look at two possible filled-in cases:
1. Most gay residents (who checked a box labeled “do you identify as LGBTQ+?” on their standard state medical forms) oppose the new law.
VS.
2. Most residents who wrote in “Gay”, “Homosexual”, or “Lesbian” in a blank, non-required “What is your current sexual orientation?” question (adopted by the research team at our advocacy organization to match previous years’ data) oppose the new law.
By getting just a little more detailed information about who is defining these categories, how they are being created, and who is choosing between them, I can identify incompatible worldviews and address equity issues. For example, if I don’t equate the term “gay” with “anyone who checked an LGBTQ+ box” (which I don’t), then to reflect my equity priorities I can’t engage with this data the way it is (at least not wholesale and without complication).
In the second case we can see that even what was actually selected is more nuanced than in the incomplete version. That happens all the time. I can think about things like the fact that by aggregating only certain written-in responses, they are leaving out lots of folks who might have been included in the first case, like those who wrote in orientations like bisexual or identities like transgender, not to mention the typically higher number of illegible, corrupted, or discounted responses that comes with that format. These aren’t fatal flaws for either example; we just need to know. I’m not suggesting that we ignore or discard all information that doesn’t neatly fit into our definitions. I’m just saying we need to see this information in other people’s data and we need to include it with ours.
When we have “social construct” categories we are drawing very simple boxes around an infinitely complex world. Sometimes there are good equity reasons to do this, but if we assume that the who, how, and why of how we drew them are shared by everyone, we move past data collecting into entrenching or even enforcing our own definitions and perspectives. Identity components like this are complex, sometimes fluid, and highly personal. We need to make sure everyone can see the options, the choices, and the choosers when it comes to something this sensitive and diverse.
At We All Count we’re pushing to stop ever reporting social identity data without at least these crucial facets of that information. (I say “at least” because there are more things that can be important to know, but these are the most essential.) Again, they aren’t “additional” information; they are facets of the information. We like to say that it’s not metadata, it is the data. Without it you just have a fragment that at best gets misinterpreted and at worst allows prejudiced, oppressive, and harmful definitions, systems, and assumptions to go unchallenged.