Paper Session 18
AbstractAsk a Question?
Can You Fake It Until You Make It?: Impacts of Differentially Private Synthetic Data on Downstream Classification Fairness
The recent adoption of machine learning models in high-risk settings such as medicine has increased demand for developments in privacy and fairness. Rebalancing skewed datasets using synthetic data created by generative adversarial networks (GANs) has shown potential to mitigate disparate impact on minoritized subgroups. However, such generative models are subject to privacy attacks that can expose sensitive data from the training dataset. Differential privacy (DP) is the current leading solution for privacy-preserving machine learning. Differentially private GANs (DP GANs) are often considered a potential solution for improving model fairness while maintaining privacy of sensitive training data. We investigate the impact of using synthetic images from DP GANs on downstream classification model utility and fairness. We demonstrate that existing DP GANs cannot simultaneously maintain model utility, privacy, and fairness. The images generated from GAN models trained with DP exhibit extreme decreases in image quality and utility which leads to poor downstream classification model performance. Our evaluation highlights the friction between privacy, fairness, and utility and how this directly translates into real loss of performance and representation in common machine learning settings. Our results show that additional work improving the utility and fairness of DP generative models is required before they can be utilized as a potential solution to privacy and fairness issues stemming from lack of diversity in the training dataset.
We introduce leave-one-out unfairness, which characterizes how likely a model’s prediction for an individual will change due to the inclusion or removal of a single other person in the model’s training data. Leave-one-out unfairness appeals to the idea that fair decisions are not arbitrary: they should not be based on the chance event of any one person’s inclusion in the training data. Leave-one-out unfairness is closely related to algorithmic stability, but it focuses on the consistency of an individual point’s prediction outcome over unit changes to the training data, rather than the error of the model in aggregate. Beyond formalizing leave-one-out unfairness, we characterize the extent to which deep models behave leave-one-out unfairly on real data, including in cases where the generalization error is small. Further, we demonstrate that adversarial training and randomized smoothing techniques have differing effects on leave-one-out fairness, which sheds light on the relationships between robustness, memorization, individual fairness, and leave-one-out fairness in deep models. Finally, we discuss salient practical applications that may be negatively affected by leave-one-out unfairness.
Chasing Your Long Tails: Differentially Private Prediction in Health Care Settings
Machine learning models in health care are often deployed in settings where it is important to protect patient privacy. In such settings, methods for differentially private (DP) learning provide a general-purpose approach to learn models with privacy guarantees. Modern methods for DP learning ensure privacy through the addition of calibrated noise. The resulting privacy-preserving models are unable to learn too much information about the tails of a data distribution, resulting in a loss of accuracy that can disproportionately affect small groups. In this paper, we study the effects of DP learning in health care. We use state-of-the-art methods for DP learning to train privacy-preserving models in clinical prediction tasks, including x-ray classification of images and mortality prediction in time series data. We use these models to perform a comprehensive empirical investigation of the tradeoffs between privacy, utility, robustness to dataset shift and fairness. Our results highlight lesser-known limitations of methods for DP learning in health care, models that exhibit steep tradeoffs between privacy and utility, and models whose predictions are disproportionately influenced by large demographic groups in the training data. We discuss the costs and benefits of differentially private learning in health care with open directions for differential privacy and machine learning for health care.