PROJECT SUMMARY


The Toxicity Scalpel: Prototyping and evaluating methods to remove harmful generative capability from foundation models

Focus Areas: News and Media
Research Programs: Machines
Status: Active

AI language models have made significant strides over the past few years. Computers are now capable of writing poetry and computer code, producing human-like text, summarising documents, engaging in natural conversation about a variety of topics, solving maths problems, and translating between languages.

This rapid progress has been made possible by a trend in AI development where one general ‘foundation’ model is developed (usually using a large dataset drawn from the internet) and then adapted many times to suit diverse applications, rather than starting from scratch each time.

This method of automated decision-making (ADM) development can appear time- and cost-effective, but it ‘bakes in’ negative tendencies, such as the generation of toxic content, misogyny, and hate speech, at the foundational layer, from which they spread to every downstream application.

The goal of this project is to examine how language models used in ADM systems might be improved by making modifications at the foundation model stage rather than at the application level, where computational interventions, social responsibility, and legal liability have historically focussed.
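The distinction between intervening at the application level and at the foundation model level can be sketched in a few lines of hypothetical Python. Nothing below reflects the project's actual techniques or any real model API; it only illustrates why a foundation-level fix propagates to every downstream application, while an application-level filter must be rebuilt for each one.

```python
# A purely illustrative sketch of the two intervention points.
# Every function here is hypothetical, not a real model or method.

def foundation_model(prompt):
    """Stand-in for a pretrained language model with a harmful capability."""
    return "toxic output" if "provoke" in prompt else "benign output"

def application_with_filter(prompt):
    # Application-level intervention (the historical focus): each downstream
    # app wraps the same unmodified base model with its own output filter.
    text = foundation_model(prompt)
    return "[blocked]" if "toxic" in text else text

def edited_foundation_model(prompt):
    # Foundation-level intervention (this project's focus): remove the
    # harmful capability once, so every downstream application inherits
    # the fix without needing its own filter.
    return "benign output"

print(application_with_filter("provoke"))   # the app catches it downstream
print(edited_foundation_model("provoke"))   # the base model no longer produces it
```

In the first case, every new application built on the base model must repeat the filtering work; in the second, the change is made once, at the source.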

[This project description was generated by summarising parts of the project proposal document using an AI language model].

RESEARCHERS

Prof Flora Salim

Chief Investigator, UNSW

Prof Nic Suzor

Chief Investigator, QUT

Dr Hao Xue

Associate Investigator, UNSW

Dr Aaron Snoswell

Research Fellow, QUT

Lucinda Nelson

PhD Student, QUT
