The Identity Sorting Dials are a useful tool to start (again, I’ll say start) thinking about the interplay between two sets of priorities. On one side are human considerations like fit (“can I see myself in this category?”) and ease (“how difficult is it for me to interact with this data process?”). On the other are technical priorities like certainty (what kind of confidence interval do our categories lead to?) and specificness (what resolution of data do we need?).

The certainty dial refers to the statistical certainty offered by the sample size of each of our groupings. Sample size isn’t the only thing that affects our point estimates and confidence intervals (important measures of how certain we are that a statistical estimate reflects reality), but it is a major factor, and it is largely determined by the groups we choose to put people’s data into. Very small sub-groups tend to lead to very low levels of certainty.
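To make the link between group size and certainty concrete, here is a minimal sketch. It assumes the simplest textbook case: estimating a proportion (say, the share of a group who are satisfied with a service) with a normal-approximation 95% confidence interval. The function name `margin_of_error` and the group sizes are my own illustrative choices, not part of the Dials themselves.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p in a group of size n.

    Assumes a simple random sample and the normal approximation;
    p=0.5 is the worst case (widest interval).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative group sizes only
for n in (1000, 100, 25):
    print(f"group of {n:>4}: ±{margin_of_error(n):.1%}")
```

Under these assumptions the margin goes from roughly ±3 percentage points for a group of 1,000 to almost ±20 for a group of 25. Real analyses will use other estimators and other interval methods, but the direction of the relationship holds.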

Let’s say we want to collect data about a person’s disability status because we want to examine gaps in how our services work for people across an ability spectrum. There are many different recommendations from excellent organizations about how to do this and what categories to use. The choice of categories here is going to have a strong impact on how much certainty we are able to have in our results.

The setting on our certainty dial reflects the certainty we can produce from the sample size of our smallest groups.

Which of these best describes your disability status?

  1. I am a person with a disability
  2. I am not a person with a disability

With only two categories we might have a good chance at highly reliable and certain results for both groups. If there are more respondents without a disability than with, their category may get a smaller confidence interval, but ideally both categories could produce a robust set of results in relation to our question.

Of course, the level of fit and the level of specificness are very low with only two categories. From the human side, we may want to be more sensitive to the many different experiences of disability and show respect for those experiences by offering categories that more closely describe and relate to our respondents’ lives. On the science side, two categories may simply not offer enough information and resolution to be useful in answering our questions.

We could use much more detailed and representative options:

Which of these best describes your disability status?

  1. I am a person with a physical disability
  2. I am a person with an intellectual disability
  3. I am a person with a mental disability
  4. I am a person with a sensory disability
  5. I am not a person with a disability

Which of these best describes your disability status?

  1. I have blindness or low vision (does not include vision correctable by glasses or contact lenses)
  2. I have a hearing disability (examples: deaf, deafened or hard of hearing)
  3. I have a developmental or cognitive disability (example: Down syndrome)
  4. I have a learning disability (example: dyslexia)
  5. I have a mental health disability (examples: addictions, bipolar disorder, depression)
  6. I have a mobility disability (examples: cane, wheelchair)
  7. I have a physical, coordination, manual dexterity, or strength disability (example: handling objects)
  8. I have a physical illness and/or pain (examples: diabetes, epilepsy, heart condition, kidney disease, lung disease, rheumatoid arthritis)
  9. I have a speech and language disability (not caused by hearing loss)
  10. I have a disability not listed, please describe
  11. I am not a person with a disability

Which of these best describes your disability status?

(This list represents the 21 nationally recognized disabilities in the country of India as of the 2016 Rights of Persons With Disabilities Act)


  1. Blindness
  2. Low-vision
  3. Leprosy Cured persons
  4. Hearing Impairment (deaf and hard of hearing)
  5. Locomotor Disability
  6. Dwarfism
  7. Intellectual Disability
  8. Mental Illness
  9. Autism Spectrum Disorder
  10. Cerebral Palsy
  11. Muscular Dystrophy
  12. Chronic Neurological conditions
  13. Specific Learning Disabilities
  14. Multiple Sclerosis
  15. Speech and Language disability
  16. Thalassemia
  17. Hemophilia
  18. Sickle Cell disease
  19. Multiple Disabilities including deaf-blindness
  20. Acid Attack victims
  21. Parkinson’s disease
  22. I am not a person with a disability

These options split our population into more and more groups, resulting in smaller and smaller groups. Due to the nature of the basic tools of quantitative data science, smaller groups lead to wider and wider confidence intervals (i.e., less certainty on our certainty dial).

I want to be clear, I’m not advocating for one of these examples over another. What categories you create depends on what you want to understand and how you want to go about doing so.

What I am saying is that there is a trade-off in creating more and more categories. “List everyone!” is a natural inclination if you care about representation and inclusivity, and/or detail and nuance. Yet as your sub-groups represent smaller and smaller segments of your population, the difference in certainty between your largest group (depending on your population, perhaps “I am not a person with a disability”) and your smallest group is going to get bigger; the gap grows wider.
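Here is a rough sketch of that widening gap. Everything in it is hypothetical: a survey of 2,000 respondents, 300 of whom report a disability, with those 300 split evenly across 1, 4, 10, or 21 disability categories (matching the number of options in the four example questions). Real responses will never split evenly, and the normal approximation used here is itself shaky for very small groups, so treat the numbers as illustration only.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical survey: 2,000 respondents, 300 of whom report a disability.
NO_DISABILITY = 1700
WITH_DISABILITY = 300

def certainty_gap(num_disability_categories):
    """Difference in margin of error between the smallest and largest group,
    assuming (unrealistically) an even split across disability categories."""
    smallest = WITH_DISABILITY // num_disability_categories
    return margin_of_error(smallest) - margin_of_error(NO_DISABILITY)

for k in (1, 4, 10, 21):
    smallest = WITH_DISABILITY // k
    print(f"{k:>2} categories: smallest group n={smallest:>3}, "
          f"margin ±{margin_of_error(smallest):.1%}, "
          f"gap vs. largest group {certainty_gap(k):.1%}")
```

In this made-up scenario the largest group keeps a margin of a little over ±2 percentage points no matter how we draw the other boxes, while the gap between it and the smallest group grows from about 3 points with one disability category to about 24 points with 21.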

If the most powerful or privileged category in your project is also the largest, you’ve inadvertently just privileged them further by creating the most certainty for their results and the lowest certainty for those in categories with the fewest people. That “certainty privilege” means we will frankly know more about the larger groups. This is a crucial equity issue when we create our public policies, design our products, draft our laws, or evaluate our services using data.

Do not despair! The point of the Identity Sorting Dials is to help identify the sweet spot amidst the ineluctable compromise between certainty, fit, specificness, and ease (not just in data collection, like we’re talking about today, but also in analysis and reporting!).

There are all kinds of cool ways to flip problems like this, especially when we approach our data categories from the ground up, exploring novel and more effective ways to get the detailed and humanizing data we want without crafting a set of boxes that screw over the very people we’re trying to understand or support.

When it comes to identity, we draw the boxes and we are responsible for the varying levels of certainty that they create.