Too often in data science, we use identity categories.


We were once hired by a client working on youth mental health. They needed to target scarce resources (why the resources were scarce is a conversation for another time...) at supporting young people in our community who were at risk for mental health issues. The client organization had research showing that one of the primary drivers of mental health issues among youth in the community was bullying, so they wanted to make resources available to those most likely to be experiencing it.

How were we going to know who was getting bullied?

The client organization was pretty sure that young people who were not “white and straight” would be the most likely to experience bullying. So they were considering making the resources available to young people who completed an intake form and identified as either a “person of color” or “LGBTQ+”.

However, we strongly discouraged them from doing this. The unintended consequence of these categories is that we actually double down on marginalization and stereotypes. Instead of deciding in advance who is likely to be having which lived experience, we encouraged them to ask directly. They did, using this scale:

Gatehouse Bullying Scale

We monitored the support system for 3.5 years, and it turned out that poverty was a much more accurate predictor of bullying and of the need for support than sexual identity or ethnic background. There was an active community of LGBTQ+ kids who had money, supportive parents, and friends, and who were not experiencing much bullying.

In another project, we partnered with a disaster relief organization that wanted to funnel resources quickly to people as disasters happened. The organization’s mandate was to prioritize marginalized and oppressed people. In a situation very similar to the one with youth bullying, they were tempted to create an intake form that asked people their race and to give resources to people who said they weren’t white. It is an understandable and even admirable impulse to want to help someone of a specific identity who you think might need it, but assuming that a type of person will have a high level of need based solely on who they are can have many bad, unintended consequences.

We strongly recommended that the organization allow its clients to self-identify their experiences of oppression. The organization was worried. Wouldn’t this open up the opportunity for wealthy, privileged people to scam them out of resources by self-identifying as marginalized? Well, yes, kind of. But any data-oriented system is going to need to choose which side to err on: either building a system that might accidentally allow a few privileged people to access resources, or building a system that might accidentally perpetuate a lot of racism by formalizing and codifying a belief that skin color = limited access to resources.

In the end, we used this amazing resource from David R. Williams to craft a collection tool where people could self-identify their experiences of marginalization and oppression and the severity of their need.
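As a rough illustration of what prioritizing on self-reported experience rather than identity can look like in an intake pipeline, here is a minimal Python sketch. The field names (`experience_score`, `need_score`), the 0 to 10 ranges, and the equal weighting are all hypothetical and are not taken from the tools mentioned above. The point is only that no identity field appears anywhere in the ranking.

```python
from dataclasses import dataclass


@dataclass
class IntakeResponse:
    respondent_id: str
    # Hypothetical self-reported scores, each on an assumed 0-10 scale.
    experience_score: int  # how much marginalization the person reports
    need_score: int        # how severe the person says their need is


def priority(response: IntakeResponse) -> int:
    """Combine the self-reported scores; equal weighting is an assumption."""
    return response.experience_score + response.need_score


def rank_for_support(responses: list[IntakeResponse], slots: int) -> list[str]:
    """Return the IDs of the `slots` respondents reporting the highest need."""
    ranked = sorted(responses, key=priority, reverse=True)
    return [r.respondent_id for r in ranked[:slots]]


# Made-up example responses, for illustration only.
responses = [
    IntakeResponse("a", experience_score=2, need_score=3),
    IntakeResponse("b", experience_score=8, need_score=9),
    IntakeResponse("c", experience_score=6, need_score=4),
]
selected = rank_for_support(responses, slots=2)
```

In a real project the scoring would come from a validated instrument and from conversations with the people being served, not from an ad hoc sum like this one.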

The Many Benefits of Looking Somewhere Other than Identity


Going directly to self-identification of oppression, need, experiences or preferences also creates a bigger tent. Unless you have a mandate specifically to support a certain identity (which is totally fine as long as you are upfront about it!), measuring what you actually care about will allow anyone who fits your criteria to be included, regardless of who they are or what labels have been applied to them in the past.

One of the reasons that something like race gets used as a convenient proxy for something like poverty or oppression is that the inequity people of that identity experience is so bad, so pervasive and so structurally ubiquitous that the proxy is correct more often than not. If the likelihood seems so high that a certain kind of person has a certain kind of problem, it can seem good and efficient to use that identity to target them for help. But it’s very problematic even if you’re right.

Think of how it feels to be handed a survey that expects you to be poor because of who you are, and it’s right. It can actually feel worse than if it were wrong. It shows that the people collecting your data have decided in advance what your life is like without your input. “Oh, you’re gay, you must be being bullied.” “Oh, you’re a woman, you must be getting paid less.” “Oh, you’re single, you must need help raising your kids.” It entrenches stereotypes.

Let’s say you use self-identification of oppression, income data, and zip code information, and you find that people living in neighbourhoods with a large majority of Black residents (so the respondents are likely Black) are experiencing the problems you’re interested in. Then it’s much more acceptable to incorporate identity as part of your plan.

Arriving at identities relevant to your question through the participation of your stakeholders is totally different from starting with assumptions about the identities that your participants are expected to conform to.
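The "measure the experience first, then see who is affected" order described above can be sketched in a few lines of Python. Everything here is made up for illustration: the zip codes, the `reports_problem` field, and the records themselves. The sketch only computes, for each zip code, the share of respondents who say they are experiencing the problem.

```python
from collections import defaultdict

# Hypothetical intake records: a zip code plus a self-reported yes/no answer
# about whether the person is experiencing the problem being studied.
records = [
    {"zip_code": "00001", "reports_problem": True},
    {"zip_code": "00001", "reports_problem": True},
    {"zip_code": "00001", "reports_problem": True},
    {"zip_code": "00001", "reports_problem": False},
    {"zip_code": "00002", "reports_problem": False},
    {"zip_code": "00002", "reports_problem": True},
    {"zip_code": "00002", "reports_problem": False},
    {"zip_code": "00002", "reports_problem": False},
]


def problem_rate_by_zip(rows):
    """Share of respondents in each zip code who report the problem."""
    tallies = defaultdict(lambda: [0, 0])  # zip code -> [reporting, total]
    for row in rows:
        tally = tallies[row["zip_code"]]
        tally[0] += int(row["reports_problem"])
        tally[1] += 1
    return {zip_code: reporting / total
            for zip_code, (reporting, total) in tallies.items()}


rates = problem_rate_by_zip(records)
# Zip codes ordered from highest to lowest share reporting the problem.
hardest_hit = sorted(rates, key=rates.get, reverse=True)
```

Only after a step like this would you look at who lives in the hardest-hit areas, so that identity enters the plan as a finding and not as a starting assumption.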



  • Sometimes it’s totally valid to ask about identity when it’s what you really care about.
  • But often, we’re using identity as a proxy to represent the issues we’re working on.
  • Making assumptions about people’s lived experience based on their identity is often inaccurate and frequently oppressive.
  • Letting people identify their own preferences, needs, and experiences requires giving up some control – but thinking you know better than someone about their own likely experience is definitely a power trip and can even be a form of active oppression.
  • Arriving at relevant identities through data, in an inclusive and useful way, is a lot better than starting there and using your participants to test your (often prejudiced) hypothesis.
  • There are great, useful, easy-to-use alternatives to identity categories and questions (like this and this). Also, it’s not that hard to get creative and find your own ways to get past identity to the heart of what you really want to know in your project design, your data collection, your analysis and your reporting.