Data Cards Playbook: Participatory Activities for Dataset Documentation
In the context of developing, deploying, and using datasets, a critical aspect of maintaining a good relationship between stakeholders (from data practitioners to the public) is that they understand the importance of transparency to that relationship. Without that understanding, it is impossible to secure an ethically appropriate measure of social transparency in dataset practices, or good-faith efforts by stakeholders to embrace all aspects of responsible transparency. A lack of consensus on, and evolution of, transparency efforts could pose major risks to society, because these dataset documentation practices serve as the foundation for data technologies that affect the quality of people's lives. Transparency, which we define as sharing information about system behavior (causal or observed) and organizational process in a way that is concise, understandable, and articulated with plain-language explanations, is captured in boundary objects such as documentation and summarized reports. These objects typically provide descriptive, structural, and sometimes statistical information about a dataset. However, details that go beyond metadata, including provenance, representation, usage, and fairness-informed evaluations (the context stakeholders need to make responsible decisions about dataset use), are often omitted or unavailable. As a consequence, stakeholders are ill-equipped to make informed decisions about dataset release and adoption. In this session, to address the current limitations of this practice and achieve better documentation outcomes, participants will learn about and use the Data Cards Playbook ("Playbook") to establish a foundation for producing, publishing, and measuring transparent documentation of datasets. The Playbook, which we created, is a framework-agnostic, human-centered, participatory approach to dataset documentation that offers a consistent way for stakeholders to extract knowledge distributed across the many individuals involved in creating datasets.
In turn, this knowledge acquisition process captures the unique information needs of a dataset's consumers and reviewers, beyond metadata. Altogether, the Playbook provides guidance on gathering this information with usefulness, thoughtfulness, and measurability in mind. Over the course of the workshop, participants will take part in a set of group activities and discussions that will help them create, complete, and customize templates for different types of dataset documentation. Participants will work in small groups, each with an appointed facilitator, using a carefully crafted vignette that supports in-depth, multifaceted exploration of how complex dataset documentation is created. First, groups will identify stakeholders and their information needs in relation to the vignette. In this activity, participants will develop an understanding of our stakeholder taxonomy in order to align on agents: stakeholders who use, evaluate, or determine how the dataset is used. Next, participants will draft what their dataset documentation needs to capture and how it might be organized. Then, groups will assemble a straw-man proposal that lets them critically examine the questions generated in the previous activities and identify gaps and opportunities for proper documentation. After that, participants will formulate a plan for producing answers to their agreed-upon questions, followed by a recap of the design principles introduced throughout the session. By the end of the session, participants will have a clear understanding of the actions and commitments required to create dataset transparency reports that are accessible to diverse stakeholders, along with the foundational knowledge, design principles, and artifacts from the session that they need to apply this approach in their own domains.