The following is an excerpt from Heather Krause’s 2022 Keynote: “Who is the Data Privileged Person?”. So, a statistical model in a data project is simply all of the relationships between variables set up in the way that is the modeller’s best possible...
How much water per bouquet? If we watered them all using the average required per bouquet, we’d overwater one and underwater another. What’s the problem? We’re using the denominator of bouquets instead of flowers. Defining your denominator is as important as defining...
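To make the denominator problem concrete, here is a small sketch with made-up numbers: two hypothetical bouquets (4 flowers and 12 flowers) and an assumed need of 50 ml per flower. None of these figures come from the post; they only show how the choice of denominator changes the answer.

```python
# Hypothetical numbers for illustration only (not from the post):
# two bouquets of different sizes and an assumed need of 50 ml per flower.
ML_PER_FLOWER = 50
flowers_per_bouquet = {"A": 4, "B": 12}

# Water each bouquet actually needs.
needs = {name: n * ML_PER_FLOWER for name, n in flowers_per_bouquet.items()}

# Denominator = bouquets: average water per bouquet.
per_bouquet_avg = sum(needs.values()) / len(needs)

# Denominator = flowers: average water per flower, then scaled by bouquet size.
per_flower_avg = sum(needs.values()) / sum(flowers_per_bouquet.values())

for name, n in flowers_per_bouquet.items():
    by_bouquet = per_bouquet_avg
    by_flower = per_flower_avg * n
    print(f"{name}: needs {needs[name]} ml | per-bouquet rule: {by_bouquet:.0f} ml "
          f"| per-flower rule: {by_flower:.0f} ml")
```

With these numbers the per-bouquet average is 400 ml, so the small bouquet gets twice what it needs and the large one gets a third less; the per-flower average (50 ml) scaled by bouquet size gives each one exactly what it needs.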
One of the best ways to talk about some of the equity challenges posed by the data science process is what we like to call the “bowtie”. The ends of the bowtie are almost always broader than the knot at the center, and it’s how you tie the knot that keeps the bowtie...
When we try to measure gaps in outcomes between groups, we often turn to an approach called a Blinder-Oaxaca Decomposition. I’m all for identifying discriminatory gaps, but we need to be careful that we don’t discount certain kinds of discrimination from our data...
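For readers who haven't met it, a common "two-fold" form of the Blinder-Oaxaca decomposition splits the gap in mean outcomes between groups A and B into a part attributed to differences in characteristics and a part attributed to differences in coefficients, using group B's coefficients as the reference: $\bar{Y}_A - \bar{Y}_B = (\bar{X}_A - \bar{X}_B)^\top \hat{\beta}_B + \bar{X}_A^\top (\hat{\beta}_A - \hat{\beta}_B)$. Below is a minimal sketch with NumPy on synthetic data; the reference-group choice, the single covariate, and all numbers are assumptions for illustration, not the post's own method.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients; X must already include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def oaxaca_twofold(X_a, y_a, X_b, y_b):
    """Two-fold Blinder-Oaxaca decomposition with group B's coefficients as reference.

    Returns (gap, explained, unexplained), where gap == explained + unexplained.
    """
    beta_a, beta_b = ols(X_a, y_a), ols(X_b, y_b)
    xbar_a, xbar_b = X_a.mean(axis=0), X_b.mean(axis=0)
    gap = y_a.mean() - y_b.mean()
    explained = (xbar_a - xbar_b) @ beta_b    # differences in characteristics
    unexplained = xbar_a @ (beta_a - beta_b)  # differences in coefficients
    return gap, explained, unexplained

# Synthetic data for illustration only (not from the post).
rng = np.random.default_rng(0)
n = 500
x_a = rng.normal(10, 2, n)  # hypothetical covariate, e.g. years of experience
x_b = rng.normal(8, 2, n)
y_a = 20 + 1.5 * x_a + rng.normal(0, 1, n)
y_b = 18 + 1.2 * x_b + rng.normal(0, 1, n)
X_a = np.column_stack([np.ones(n), x_a])
X_b = np.column_stack([np.ones(n), x_b])

gap, explained, unexplained = oaxaca_twofold(X_a, y_a, X_b, y_b)
print(f"gap={gap:.2f} explained={explained:.2f} unexplained={unexplained:.2f}")
```

The "explained" share is only explained by whatever covariates went into the model. If a covariate like experience is itself shaped by discrimination, treating that share as non-discriminatory would understate the discrimination in the data.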
Author’s Note: This is going to be a long piece, but if we can get this concept down, we’ll learn a way to embed our equity priorities deep, deep into the mathematical heart of our data work. Let’s go. The Model: A reflection of the world as the modeller understands...
When we use data to predict something, there’s more than one way to improve the equity of that process. The one we usually start with is setting a tolerance level for the gap between the group our predictive model works best for and the group it works worst for....
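One way to picture that tolerance check is the sketch below. It assumes accuracy as the performance metric and a tolerance of 0.05; both are hypothetical choices for illustration, since the right metric and threshold depend on the project. The function names are also made up for this example.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Share of correct predictions within each group (hypothetical helper)."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

def within_tolerance(y_true, y_pred, groups, tolerance):
    """Compare the best-vs-worst group gap against a chosen tolerance."""
    acc = accuracy_by_group(y_true, y_pred, groups)
    gap = max(acc.values()) - min(acc.values())
    return gap <= tolerance, gap, acc

# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

ok, gap, acc = within_tolerance(y_true, y_pred, groups, tolerance=0.05)
print(ok, gap, acc)
```

Here the model is right every time for group "a" but only half the time for group "b", so the 0.5 gap blows well past the assumed 0.05 tolerance and the check fails.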