Paper Session 22
AbstractAsk a Question?
Biases in Generative Art---A Causal Look from the Lens of Art History
With rapid progress in artificial intelligence (AI), popularity of generative art has grown substantially. From creating paintings to generating novel art styles, AI based generative art has showcased a variety of applications. However, there has been little focus concerning the ethical impacts of AI based generative art. In this work, we investigate biases in the generative art AI pipeline right from those that can originate due to improper problem formulation to those related to algorithm design. Viewing from the lens of art history, we discuss the socio-cultural impacts of these biases. Leveraging causal models, we highlight how current methods fall short in modeling the process of art creation and thus contribute to various types of biases. We illustrate the same through case studies, in particular those related to style transfer. To the best of our knowledge, this is the first extensive analysis that investigates biases in the generative art AI pipeline from the perspective of art history. We hope our work sparks interdisciplinary discussions related to accountability of generative art.
Measurement and Fairness
We propose measurement modeling from the quantitative social sciences as a framework for understanding fairness in computational systems. Computational systems often involve unobservable theoretical constructs, such as socioeconomic status, teacher effectiveness, and risk of recidivism. Such constructs cannot be measured directly and must instead be inferred from measurements of observable properties (and other unobservable theoretical constructs) thought to be related to them -- i.e., operationalized via a measurement model. This process, which necessarily involves making assumptions, introduces the potential for mismatches between the theoretical understanding of the construct purported to be measured and its operationalization. We argue that many of the harms discussed in the literature on fairness in computational systems are direct results of such mismatches. We show how some of these harms could have been anticipated and, in some cases, mitigated if viewed through the lens of measurement modeling. To do this, we contribute fairness-oriented conceptualizations of construct reliability and construct validity that unite traditions from political science, education, and psychology and provide a set of tools for making explicit and testing assumptions about constructs and their operationalizations. We then turn to fairness itself, an essentially contested construct that has different theoretical understandings in different contexts. We argue that this contestedness underlies recent debates about fairness definitions: although these debates appear to be about different operationalizations, they are, in fact, debates about different theoretical understandings of fairness. We show how measurement modeling can provide a framework for getting to the core of these debates.
One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision
Computer vision is widely deployed, has highly visible, society altering applications, and documented problems with bias and representation. Datasets are critical for benchmarking progress in fair computer vision, and often employ broad racial categories as population groups for measuring group fairness. Similarly, diversity is often measured in computer vision datasets by ascribing and counting categorical race labels. However, racial categories are ill-defined, unstable temporally and geographically, and have a problematic history of scientific use. Although the racial categories used across datasets are superficially similar, the complexity of human race perception suggests the racial system encoded by one dataset may be substantially inconsistent with another. Using the insight that a classifier can learn the racial system encoded by a dataset, we conduct an empirical study of computer vision datasets supplying categorical race labels for face images to determine the cross-dataset consistency and generalization of racial categories. We find that each dataset encodes a substantially unique racial system, despite nominally equivalent racial categories, and some racial categories are systemically less consistent than others across datasets. We find evidence that racial categories encode stereotypes, and exclude ethnic groups from categories on the basis of nonconformity to stereotypes. Representing a billion humans under one racial category may obscure disparities and create new ones by encoding stereotypes of racial systems. The difficulty of adequately converting the abstract concept of race into a tool for measuring fairness underscores the need for a method more flexible and culturally aware than racial categories.