The Toxicity Scalpel: Prototyping and evaluating methods to remove harmful generative capability from foundation models
Focus Areas: News and Media
Research Programs: Machines
AI language models have made significant strides over the past few years. Computers are now capable of writing poetry and computer code, producing human-like text, summarising documents, engaging in natural conversation about a variety of topics, solving math problems, and translating between languages.
This rapid progress has been made possible by a trend in AI development in which one general ‘foundation’ model is trained (usually on a large dataset scraped from the internet) and then adapted many times to fit diverse applications, rather than building a new model from scratch each time.
This method of developing automated decision-making (ADM) systems can appear time- and cost-effective, but it ‘bakes in’ negative tendencies, such as the generation of toxic content, misogyny, or hate speech, at the foundation layer, from which they subsequently spread to each downstream application.
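The propagation described above can be sketched with a deliberately simplified toy model. Everything here is illustrative and not from the project: the ‘model’ is just a dictionary of learned associations, and ‘fine-tuning’ only overwrites a few task-specific entries, so anything baked in at the base is inherited unchanged by every adaptation.

```python
# Toy sketch (illustrative only): a foundation model's learned associations
# are inherited by every application adapted from it.

# Hypothetical associations learned from web-scale data, including a
# harmful one acquired at the foundation layer.
FOUNDATION_ASSOCIATIONS = {
    "sunset": "positive",
    "invoice": "neutral",
    "slur_example": "hostile",  # harmful association baked in during pretraining
}

def adapt(base, task_overrides):
    """Sketch of fine-tuning: copy the base model, then overwrite a few
    task-specific associations; everything else is inherited unchanged."""
    adapted = dict(base)
    adapted.update(task_overrides)
    return adapted

# Two downstream applications adapted from the same foundation model.
chatbot = adapt(FOUNDATION_ASSOCIATIONS, {"invoice": "billing-topic"})
moderator = adapt(FOUNDATION_ASSOCIATIONS, {"sunset": "irrelevant"})

# Neither adaptation touched the harmful association, so both inherit it,
# which is why an intervention at the foundation layer reaches every
# downstream application at once.
assert chatbot["slur_example"] == "hostile"
assert moderator["slur_example"] == "hostile"
```

Real fine-tuning updates millions of parameters rather than dictionary entries, but the structural point is the same: adaptations modify only what their task data touches, leaving foundation-level tendencies intact.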
The goal of this project is to examine how language models used in ADM systems might be improved by making modifications at the foundation model stage, rather than at the application level, where computational interventions, social responsibility, and legal liability have historically focussed.
[This project description was generated by summarising parts of the project proposal document using an AI language model.]