Paper Session 25
AbstractAsk a Question?
The Use and Misuse of Counterfactuals in Ethical Machine Learning
The use of counterfactuals for considerations of algorithmic fairness and explainability is gaining prominence within the machine learning community and industry. This paper argues for more caution with the use of counterfactuals when the facts to be considered are social categories such as race or gender. We review a broad body of papers from philosophy and social sciences on social ontology and the semantics of counterfactuals, and we conclude that the counterfactual approach in machine learning fairness and social explainability can require an incoherent theory of what social categories are. Our findings suggest that most often the social categories may not admit counterfactual manipulation, and hence may not appropriately satisfy the demands for evaluating the truth or falsity of ounterfactuals. This is important because the widespread use of counterfactuals in machine learning can lead to misleading results when applied in high-stakes domains. Accordingly, we argue that even though counterfactuals play an essential part in some causal inferences, their use for questions of algorithmic fairness and social explanations can create more problems than they resolve. Our positive result is a set of tenets about using counterfactuals for fairness and explanations in machine learning.
Fairness in Risk Assessment Instruments: Post-Processing to Achieve Counterfactual Equalized Odds
In domains such as criminal justice, medicine, and social welfare, decision makers increasingly have access to algorithmic Risk Assessment Instruments (RAIs). RAIs estimate the risk of an adverse oUCTome such as recidivism or child neglect, potentially informing high-stakes decisions such as whether to release a defendant on bail or initiate a child welfare investigation. It is important to ensure that RAIs are fair, so that the benefits and harms of such decisions are equitably distributed. The most widely used algorithmic fairness criteria are formulated with respect to observable oUCTomes, such as whether a person actually recidivates, but these criteria are misleading when applied to RAIs. Since RAIs are intended to inform interventions that can reduce risk, the prediction itself affects the downstream oUCTome. Recent work has argued that fairness criteria for RAIs should instead utilize potential oUCTomes, i.e. the oUCTomes that would occur in the absence of an appropriate intervention. However, no methods currently exist to satisfy such fairness criteria. In this paper, we target one such criterion, counterfactual equalized odds. We develop a post-processed predictor that is estimated via doubly robust estimators, extending and adapting previous post-processing approaches to the counterfactual setting. We also provide doubly robust estimators of the risk and fairness properties of arbitrary fixed post-processed predictors. Our predictor converges to an optimal fair predictor at fast rates. We illustrate properties of our method and show that it performs well on both simulated and real data.