Summary: | Critical decisions like loan approvals, foster care placements, and medical interventions are increasingly determined by data-driven prediction algorithms. These algorithms have the potential to greatly aid decision-makers, but in practice, many can be redesigned to achieve outcomes that are fundamentally fairer and more accurate. This thesis consists of three chapters that develop methods toward that aim.
The first chapter, co-authored with Suhas Vijaykumar, demonstrates that it is possible to reconcile two influential criteria for algorithmic fairness that were previously thought to be in conflict: calibration and equal error rates. We present an algorithm that identifies the most accurate set of predictions satisfying both conditions. In a credit-lending application, we compare our procedure to the common practice of omitting sensitive data and show that it raises both profit and the probability that creditworthy individuals receive loans.
The second chapter extends the canonical economic concept of statistical discrimination to algorithmic decision-making. I show that predictive uncertainty often leads algorithms to systematically disadvantage groups with lower-mean outcomes, assigning them lower true positive and false positive rates than their higher-mean counterparts. I prove that this disparate impact can occur even when sensitive data and group identifiers are omitted from training, but that it can be resolved if the data are instead enriched. In particular, I demonstrate that data acquisition for lower-mean groups can increase access to opportunity. I call this strategy “affirmative information” and compare it to traditional affirmative action in the classification task of identifying creditworthy borrowers.
The third chapter, co-authored with Suhas Vijaykumar, establishes a geometric distinction between classification and regression that allows risk in these two settings to be more precisely related. In particular, we note that classification risk depends only on the direction of a regressor, and we take advantage of this scale invariance to improve existing guarantees for how classification risk is bounded by the risk in the associated regression problem. Building on these guarantees, our analysis makes it possible to compare classification algorithms more accurately. Furthermore, it establishes a notion of the “direction” of a conditional expectation function that motivates the design of accurate new classifiers.
|