2023 ADM+S Hackathon

Hack Pack

THE CHALLENGE

The most powerful generative language models today are concentrated in the hands of a few large firms with access to the necessary data and processing power. From internet search to virtual companionship, these companies have the potential to shape our information environment, which means that the values embedded in their models are of urgent social and political consequence.

The goal of this hackathon is to explore and map the values baked into some of the most popular generative language models, including OpenAI’s ChatGPT, Google’s Bard, and Microsoft’s Bing Chat.

SOME DEFINITIONS

Some definitions might be helpful to get started:*

  • A bias is a tendency or natural inclination toward or against a particular thing, person, or group. Some biases can be positive and helpful, like choosing to eat only foods that are considered healthy, or staying away from someone who has knowingly caused harm. However, biases can also be harmful, for instance when they are based on stereotypes rather than actual knowledge of an individual or circumstance.
  • A stereotype is a generalised belief about a particular category of people. Stereotypes assume that the people belonging to a group all tend to share the same attributes, which can lead to harmful real-world outcomes through over-generalisation, even when a stereotype is accurate.

It’s also worth remembering that biases and stereotypes do not only affect individuals; they can have far-reaching implications for societies as a whole.

Biases can contribute to two types of harm: allocative harms (when a societal or economic system unfairly allocates resources to one group over another) and representational harms (when systems detract from or distort the social identity and representation of certain groups).

Generative language models like ChatGPT do not have minds or cognitive models, and therefore do not possess mental biases or stereotypes in the same way a human might. However, in learning from human data and generating human-like data, language models can come to possess functional biases and stereotypes. That is, the behaviour of the AI system expresses biases and stereotypes that reflect the humans who created the training data and processes, and thus the societies in which these are embedded. Given the role that generative language models are likely to play in shaping our mediated information environment, the biases they incorporate are an important research topic. In some cases these values are deliberately incorporated into automated systems to reinforce social norms (as in the case of ChatGPT’s refusal to generate hate speech). In other cases, they may be the result of the training data and the construction of the resulting algorithms.

The processes that encode these biases and stereotypes in language models are opaque. For instance, OpenAI released almost no information about how the latest version of ChatGPT was trained or what kinds of value-laden data were used to shape its responses to users. This means that the values ‘encoded’ in this highly influential AI system are effectively controlled by a small group of developers with little or no transparency or oversight.

This is where you come in: over the course of this hackathon, our hope is that teams will be able to work together to begin to map some of the bias and stereotype contours present in some of the recent generative language models.

*Adapted from https://www.psychologytoday.com/au/basics/bias.

METHODOLOGIES

What will teams actually be doing during the hackathon? It’s up to you to decide! We don’t want to limit the types of investigation teams undertake. Anything that considers the issue of bias in generative language models would be useful and in-scope. Some ideas include…

  • Conduct a detailed literature review to see what emerging evidence already exists mapping a particular type of bias in generative language models.
  • Use an existing benchmark, survey, or questionnaire to evaluate and compare several generative language models for a particular type of bias. The tool you use for evaluation could come from within the machine learning literature or from outside this domain (e.g. psychology or the social sciences).
  • Apply a critical lens to an existing evaluation benchmark or survey – where might this tool overlook important nuances or details around a particular bias? How was the tool developed? Were key stakeholders involved in the process (the “Nothing about us without us” principle)?
  • Develop an experiment to test the bias of a particular model on a specific topic or issue. Even quite simple experiments can be revealing (a minimal sketch of one appears after this list). For example, the scholar Safiya Noble embarked on her exploration of algorithmic discrimination by conducting a Google search for the term ‘Black girls’. Her book also describes a UN campaign that unearthed sexist and racist biases reflected in the auto-complete function in Google searches (for more information, see the introduction to her book, Algorithms of Oppression). A similar investigation was recently undertaken by ADM+S researcher Thao Phan, uncovering racial bias in the image-generation AI DALL-E 2. As large language models have attracted widespread media attention, discussions have begun about the forms of political, racial, and gendered bias they incorporate, and how best to address these.
  • Insert your idea here!
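
As a concrete illustration of the experimental approach above, the sketch below swaps a demographic term inside a fixed prompt template and compares the tone of the resulting completions. It uses small open models (GPT-2 and a default sentiment classifier) as stand-ins for the larger proprietary systems discussed later in this pack, and the prompt template and group terms are illustrative placeholders only.

```python
# Minimal counterfactual prompt experiment using small open models as stand-ins.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")

# Hypothetical prompt template and demographic terms, chosen for illustration.
template = "The {group} engineer walked into the interview and"
groups = ["male", "female", "non-binary"]

for group in groups:
    prompt = template.format(group=group)
    completions = generator(prompt, max_new_tokens=30, num_return_sequences=5,
                            do_sample=True, pad_token_id=50256)
    # Keep only the generated continuation, then score its sentiment.
    texts = [c["generated_text"][len(prompt):] for c in completions]
    scores = sentiment(texts)
    positive = sum(1 for s in scores if s["label"] == "POSITIVE")
    print(f"{group}: {positive}/5 completions scored positive by the classifier")
```

A real study would use many templates, many samples per template, and a more careful outcome measure, but even this toy setup makes differences between groups visible.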

THE TECH

The generative language model landscape is developing rapidly, with new models, platforms, and apps being released on a weekly basis. Given this fact, the results of this hackathon will be most useful if our efforts are focussed on a few key platforms or models. Below, we’ve suggested some good options, and contrasted some of their key differences.

ChatGPT

chat.openai.com, or on the iOS or Android app stores: A proprietary dialog agent that is amusing and sometimes helpful. The language models underlying ChatGPT are heavily entwined with multiple layers of proprietary fine-tuning, guard-rails and content filters.

  • Tech details: Unknown model size and architecture, with proprietary training data and processes.
  • Access: Requires creating a free OpenAI account to access the GPT-3.5 model. Paid accounts can also access the more advanced GPT-4 model.
  • Data Privacy: By default, all manual user conversations are saved for use by OpenAI. This can be disabled in settings. API interactions are not saved. See this FAQ page for more information.
  • Affordances: Paid accounts can enable internet access and integration with third party services via plugins.
  • User controls: Can bookmark, share, and delete prior conversations. Bulk data export and deletion controls are also available in the settings.
  • Computational access: Via the OpenAI Chat Completions API (requires $), or via many other third-party interfaces; a minimal usage sketch appears below.
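
The sketch below shows one way to query ChatGPT programmatically through the OpenAI Chat Completions API. It is written against the openai Python package as it existed at the time of writing (the interface may change), assumes a paid API key is available in the OPENAI_API_KEY environment variable, and uses a placeholder probe question.

```python
# Minimal sketch: one probe question sent to gpt-3.5-turbo via the Chat Completions API.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Describe a typical nurse in one sentence."}],
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])
```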

Microsoft’s Bing Chat

bing.com (click on the ‘Chat’ tab): A proprietary natural language chat interface for the Bing search engine. Bing Chat is generally able to provide accurate citations for the information it provides.

  • Tech details: Based on a customised version of one of OpenAI’s GPT models.
  • Access: Requires using Microsoft’s Edge browser and creating a free Microsoft account. Will not work in an Edge private browsing session, or if Bing’s ‘Safe Search’ setting is set to ‘strict’.
  • Data Privacy: Microsoft saves all conversations for later use as per the terms of use.
  • Affordances: The dialog agent can browse the Internet.
  • User controls: Bookmarking or sharing individual conversations is not possible. Can choose from ‘creative’, ‘balanced’ or ‘precise’ conversation styles.
  • Computational access: None.

Llama 2

ai.meta.com/llama: The second generation of the Llama language model from Meta. Llama and Llama 2 are semi-open-source, but many details of the models’ construction and training are proprietary. There are both base LLM and fine-tuned dialog agent versions of Llama and Llama 2 available. The fine-tuned versions have undergone additional training to discourage toxic outputs and to limit the topics they will discuss.

  • Tech details: Transformer models with 7, 13, or 70 billion parameters. Training data and processes are largely opaque, but some details are provided in the associated report.
  • Access: The 13-billion-parameter Llama 2 dialog agent (along with a range of other models) can be accessed for free at the LMSYS ChatArena – chat.lmsys.org – without a user account.
  • Data privacy: LMSYS ChatArena saves all conversations for research purposes.
  • Affordances: Text input and output only.
  • User controls: LMSYS ChatArena allows users to control generation hyper-parameters, and also lets users compare different models side-by-side.
  • Computational access: The Llama and Llama 2 models can be used programmatically via the HuggingFace libraries, but this requires pre-approval from Meta (a minimal sketch appears below). See this page for more information.
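
The sketch below shows one way to query a Llama 2 chat model through the HuggingFace transformers library, assuming Meta has approved access to the gated weights and a HuggingFace access token has been configured locally. The 7-billion-parameter chat variant is used only because it is the smallest, and the prompt is a placeholder.

```python
# Minimal sketch: generate a single response from a Llama 2 chat model.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated: requires prior approval from Meta
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs the accelerate package

# Llama 2 chat models expect prompts wrapped in [INST] ... [/INST] tags.
prompt = "[INST] Describe a typical nurse in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```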

Other helpful resources

  • Hugging Face has an online database of user-created ‘spaces’ where you can interact with many different models. E.g. the Llama 2 70b chat model can be found here. Many spaces offer free compute (your prompts go into a queue and you have to wait for HuggingFace’s servers to process them and generate the response), while some spaces offer faster interaction because the creator has paid to rent a GPU server from HuggingFace.
  • GPT4All is a desktop app for Windows, Mac and Linux that lets users interact with a range of dialog agents on their own computer. This works even without a GPU, but can be slow. This also requires downloading the model you want to use, which can take a while.

MARKING CRITERIA

Teams will be judged according to four criteria: Presentation; Interestingness; Rigour; and Scope and potential for follow-up. Each criterion will be weighted equally.

Presentation: This criterion addresses how the participants’ presentation provides an informative and interactive story that exposes contemporary issues with generative AI. The participants’ ability to effectively communicate the complexities and implications of their projects is crucial in raising awareness about the challenges and concerns associated with generative AI. A strong presentation should captivate the audience, engaging them with a narrative that highlights the ethical, social, and technical dimensions of the technology. For example, by incorporating real-world examples and scenarios, participants can demonstrate the potential risks and benefits of generative AI in a compelling manner. Ultimately, the question is to what extent participants can provoke deep reflection and consideration in the audience regarding their research.

Interestingness: Projects will be assessed on how well they align with, and extend upon, ongoing developments in the field; for example, how successfully participants have incorporated cutting-edge techniques, methodologies, and algorithms that reflect the current state of the industry. Moreover, it is essential to consider whether the research findings have practical implications and can be translated into real-world applications. The winning submission should showcase a deep understanding of current trends, demonstrate the ability to bridge the gap between academia and industry, and propose innovative and interesting solutions that push the boundaries of generative AI research and creatively challenge both experts and lay audiences.

Rigour: This criterion addresses how well the research presented aligns with academic and industry best practices. Examining how participants have leveraged existing academic literature, industry standards, and established best practices will help gauge the extent to which their projects contribute to the advancement of the field. Additionally, how well processes are documented and evidenced is important for ensuring the explainability and replicability of results.

Scope and potential for follow-up: This criterion addresses the overall feasibility of the project, both in terms of what you’ve done during the short time of the hackathon and, importantly, how this work can be scaled and continued into the future. In addressing this criterion, participants are encouraged to comment on the resources and infrastructures used to complete their work during the hackathon, how their work can be extended and continued moving forward (including what additional or continuing resources would be required to do so), and what barriers may present themselves.

EXAMPLE TOPICS

As a team you are free to choose your own area of bias to investigate throughout the hackathon; however, we have curated some suggested topics below, along with some helpful resources to get you started. Please don’t feel limited to this list though!

Gender bias

Gender bias is a preference or prejudice toward one gender over another. This type of bias is prevalent in modern society (for instance, recall the #GamerGate harassment campaign in 2014, or the more recent Depp v. Heard trial) and has deep-seated historical roots, but it is particularly relevant when dealing with text-based generative language models due to the interconnectedness of gender and language (even more so in languages that are more strongly gendered than English, e.g. German).

There has been a lot of work on evaluating, critiquing, and removing gender bias in the area of machine learning for Natural Language Processing (for example, see the 2019 review by Sun et al., or the 2020 paper by Leavy et al.), and even meta-critiques of gender bias in the AI literature. The relevance of these earlier investigations to the rapidly emerging domain of generative language models, and Generative AI more generally, is not yet clear, though. Some of the ADM+S projects in this space include the Toxicity Scalpel project, which aims to measure and mitigate sexist tendencies in language models; Lucinda Nelson’s PhD project, which is investigating ‘everyday’ misogyny on social media platforms; and Dr Ariadna Matamoros-Fernández’s DECRA project, which investigates racist and sexist humour in online spaces.

Resources

  • StereoSet dataset and benchmark: used to measure stereotype bias in language models across gender, race, religion, and profession. It consists of around 17,000 sentences and can be used to measure a model’s preference for stereotypical over anti-stereotypical completions (a minimal sketch of this kind of comparison appears after this list).
  • The Sentence Encoder Association Test (SEAT) dataset and benchmark: A collection of gender (and other) stereotyped text samples for testing for gender association biases in language models.
  • The Ambivalent Sexism Inventory: A 22-item survey developed by psychologists Glick and Fiske to map sexist tendencies. See also more recent media coverage of this theory and survey by PBS.
  • Reports from 2015, 2017, and 2018 mapping sexism and workplace gender inclusivity, both internationally and in the Australian context.
  • Guidance document from Western Sydney University for gender inclusive language in surveys and questionnaires.
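
As a small illustration of how benchmarks like StereoSet work under the hood, the sketch below scores a stereotypical and an anti-stereotypical sentence under GPT-2 (a small open stand-in) and reports which one the model finds more likely. The sentence pair is made up for illustration and is not an item from the actual dataset.

```python
# Minimal StereoSet-style comparison: which of two sentences does the model prefer?
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence: str) -> float:
    """Average per-token log-likelihood of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item()

# Illustrative placeholder pair, not an item from the StereoSet dataset.
pair = {
    "stereotype": "The nurse fetched her stethoscope.",
    "anti-stereotype": "The nurse fetched his stethoscope.",
}
scores = {label: avg_log_likelihood(text) for label, text in pair.items()}
preferred = max(scores, key=scores.get)
print(scores, "-> model prefers the", preferred, "sentence")
```

Aggregating this preference over many such pairs gives a rough measure of how strongly a model leans towards stereotypical associations.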

Political bias

Political bias is a well-observed phenomenon in offline and online social interactions. For instance, our political leaning can impact how we perceive the trustworthiness of online news and how our social media news feeds are curated. Large language models can also exhibit political bias in a range of text-generation tasks. Some approaches for testing political bias include prompting language models with existing political statements or using validated questionnaires such as the political compass test.

Some recent investigations have revealed political biases in popular large language models. A pre-print article (i.e. not peer-reviewed) by Hartmann et al. (2023) reports an experiment in which ChatGPT was prompted with 630 political statements from two leading voting advice applications and the nation-agnostic political compass test. The results show that ChatGPT is ‘pro-environmental’ and follows what the authors describe as ‘a left-libertarian ideology’. Similar political biases towards progressive and libertarian views were also observed in ChatGPT by Rutinowski et al. (2023) (pre-print) when it was prompted with the political compass test.
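
A minimal sketch of this questionnaire approach is shown below: each statement is sent to a dialog agent through the OpenAI API and the one-word answer is recorded. The two statements are illustrative placeholders; a real study would substitute the items from an instrument such as the Political Compass or Vote Compass, and would repeat each item several times to account for sampling variation.

```python
# Minimal sketch: administer agree/disagree statements to a dialog agent and tally answers.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Placeholder statements; substitute items from a validated questionnaire.
statements = [
    "The government should play a larger role in regulating the economy.",
    "Environmental protection should take priority over economic growth.",
]

instruction = ("Respond to the following statement with exactly one word: "
               "Agree or Disagree.\n\nStatement: {statement}")

answers = {}
for statement in statements:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction.format(statement=statement)}],
        temperature=0,
    )
    answers[statement] = response["choices"][0]["message"]["content"].strip()

for statement, answer in answers.items():
    print(f"{answer:>10}  {statement}")
```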

Resources

  • The Political Compass – A test that profiles political personalities applicable to all democracies.
  • 2023 Political Quiz – A political test that includes country-specific versions presented in different languages.
  • The BOLD dataset and benchmark is a recently published dataset of text prompts for testing for five types of biases in language models, including political bias.
  • Vote Compass – a survey updated and re-released by the ABC at each federal election. This questionnaire maps political bias across two primary dimensions: Social (from progressive to conservative) and Economic (from left to right leaning).

Indigenous, colonial, & racial biases

Indigenous, racial, and colonial biases refer to distinct yet connected types of prejudice that affect the way individuals are treated or perceived. Racial bias involves discrimination based on an individual’s race or ethnicity. Indigenous bias is directed specifically at Indigenous peoples and stems from negative attitudes towards Indigenous cultures, languages, and ways of life. Colonial bias, meanwhile, can manifest as the overrepresentation of the perspectives, values, and languages of colonial powers, often at the expense of those from colonised or formerly colonised societies, further suppressing already marginalised Indigenous perspectives.

Australia’s history in particular is marked by a complex interplay of Indigenous cultures and the impacts of colonialism. Unfortunately, this history has also given rise to deep-seated biases and prejudices that continue to affect the country today. At present these issues are being examined in the public square once again due to the upcoming referendum on the Indigenous Voice to Parliament. Other international examples involving these biases include machine learning annotators being more likely to incorrectly classify African American English text as hateful or toxic, as well as discriminatory facial recognition and healthcare algorithms.

The potential for LLMs to generate outputs that reflect racial, Indigenous or colonial biases is a pressing concern. For instance, we know that language models are primarily trained on text from Western sources, which means they will inevitably overrepresent Western values and languages while marginalising Indigenous or non-Western ones.

Resources

  • Adult Income Dataset: Also known as the “Adult” dataset, it is extracted from the 1994 US Census database. It includes features such as age, workclass, education, and race, and supports the task of predicting whether an individual’s income exceeds $50K/yr from these demographics (a short sketch of a group-level check on this dataset appears after this list).
  • StereoSet dataset and benchmark: used to measure stereotype bias in language models across gender, race, religion, and profession. It consists of around 17,000 sentences and can be used to measure a model’s preference for stereotypical over anti-stereotypical completions.
  • UN Declaration on the Rights of Indigenous Peoples: This document might serve as a starting point for creating questions that probe how well an LLM understands or respects its articles.
  • The Racial Bias In Data Assessment Tool developed by Chapin Hall at The University of Chicago helps assess the likelihood of racial bias in a dataset.
  • Harvard University has developed a range of Implicit Association Tests, including several that test for racial biases.
  • A recent survey paper on bias and fairness in machine learning gives several examples of racial bias in real-world scenarios, as well as efforts to map and counteract these issues.
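
For the Adult Income Dataset mentioned above, the following sketch computes the proportion of high-income records within each racial group using pandas. It assumes the UCI “adult.data” CSV has been downloaded into the working directory; the column names follow the dataset’s documentation.

```python
# Minimal group-level check on the Adult income dataset.
import pandas as pd

columns = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]
# adult.data has no header row and a space after each comma.
df = pd.read_csv("adult.data", names=columns, skipinitialspace=True)

# Proportion of each racial group recorded as earning more than $50K/yr.
df["high_income"] = df["income"] == ">50K"
rates = df.groupby("race")["high_income"].mean().sort_values(ascending=False)
print(rates.round(3))
```

Base-rate differences like these are one reason models trained naively on such data can produce allocative harms.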

Disability discrimination

Disability discrimination refers to situations where a person is treated “less favourably, or not given the same opportunities as others in a similar situation because of their disability” (Australian Human Rights Commission 2023). Disability itself has a broad and often contested definition, encompassing varied sensory, physical, neurodiverse, cognitive and psychosocial disabilities. Discrimination against persons with disability can happen through individual prejudice, but also can be written into legislation, hiring practices, medical diagnosis and pathologisation, and normative judgments about capacity and performance. Advocacy groups such as the Disability Rights Movement have sought to combat such discrimination through many channels over the past century. 

Artificial intelligence has the potential to revolutionise accessibility for individuals with disabilities. These tools can assist in various ways, such as providing real-time captioning for those with hearing impairments or offering voice recognition software for individuals with mobility limitations. However, they can also perpetuate biases and discrimination if not properly regulated. It is essential for policymakers, organisations, and developers to proactively address this issue by implementing robust regulations and standards, such as the Web Content Accessibility Guidelines (WCAG), that improve access to information and decision-making. As much as automation can lead to improvements in vision, hearing, mobility and interpretability, it can also embed discriminatory decision-making behind opaque algorithms and datasets. By developing AI tools with inclusivity in mind, and by incorporating diverse datasets that accurately represent individuals with disabilities, organisations can reduce bias and create more equitable outcomes. The arrival of AI and large language models presents another frontier for the identification and mitigation of disability bias. How do LLMs complete sentences, for example, that contain markers of disability? Do they reproduce known societal biases, or has reinforcement learning taught models to be less discriminatory? Research on earlier models (see the resources below) offers a starting point for these questions.
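
One simple way to start probing the sentence-completion question above is with a masked language model: compare the words a small open model predicts for otherwise identical sentences with and without a disability marker. The sketch below uses BERT as a stand-in, and the sentence templates are illustrative placeholders rather than items from a published benchmark.

```python
# Minimal masked-completion probe for disability markers.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical sentence templates; [MASK] is BERT's mask token.
prompts = [
    "A person who is deaf is [MASK].",
    "A person is [MASK].",
]

for prompt in prompts:
    predictions = fill(prompt, top_k=5)
    words = [p["token_str"] for p in predictions]
    print(f"{prompt}  ->  {', '.join(words)}")
```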

Resources

  • Australian Human Rights Commission (2023). Disability discrimination.  
  • Gadiraju, V., Kane, S., Dev, S., Taylor, A., Wang, D., Denton, E., & Brewer, R. (2023, June). “I wouldn’t say offensive but…”: Disability-Centered Perspectives on Large Language Models. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 205-216).
  • Magee, L., Ghahremanlou, L., Soldatic, K., & Robertson, S. (2021). Intersectional bias in causal language models. arXiv preprint arXiv:2107.07691
  • Whittaker, M., Alper, M., Bennett, C. L., Hendren, S., Kaziunas, L., Mills, M., … & West, S. M. (2019). Disability, bias, and AI. AI Now Institute, 8.
  • The Ableist Bias Dataset is a recently published collection of prompt templates designed to elicit intersectional biases in generative language models, including ableist biases.
  • Students with a disability by age and gender: a dataset from the South Australian Government Data Directory containing ten years of data on school students with a disability who have been verified by a Department for Education psychologist or speech pathologist as eligible for the Department for Education Disability Support Program.
  • BITS is a recently published dataset of prompts designed to test for intersectional biases against people with disabilities in toxicity classifier models, but may also be useful for testing generative language models.
