wikihistories 2024: Wikipedia and/as Data
June 19 @ 9:00 am - 5:00 pm AEST | Free
What is Wikipedia’s relationship to data? What should Wikipedia’s relationship to data be?
The 2024 wikihistories symposium is co-located with ICA Gold Coast and brought to you by the wikihistories project at the University of Technology Sydney in partnership with the Centre for Media Transition, the ARC Centre of Excellence in Automated Decision-Making and Society (ADM+S), and Wikimedia Australia.
Call for Papers due 15 February
Wikipedia has always been a critical source of data for computer science projects, offering data scientists a massive store of open data. Researchers and developers use Wikipedia to work on natural language processing (NLP) tasks and applications, to model how users interact with content and with other users, to deliver factual statements to users in automated question-answering tasks, and to locate nearby geographic features represented by Wikipedia articles (Iliadis, 2022; Iliadis & Ford, 2023).
These practitioners use Wikipedia as a store of facts, assuming that it expresses an established consensus as a result of its policies and processes. Yet Wikipedia's natural language may contain meanings that resist translation into data, and its classifications may be open to interpretation and critique (Ford & Iliadis, 2023). For example, articles about complex topics such as Jerusalem do not easily align with standard ways of representing entities like cities. Jerusalem's infobox reflects Wikipedia's power to make important decisions about how we understand facts and the meanings associated with them (Ford & Graham, 2016). This power is intensified when entire Wikipedia articles are translated into structured, datafied knowledge bases of machine-readable statements, for example by the Wikidata project, which the Wikimedia Foundation started in 2012 (Ford, 2020).
How researchers measure Wikipedia's sociocultural biases also depends on the datafication of Wikipedia's content, and on whether such processes are questioned rather than taken for granted. Measuring the extent to which Wikipedia represents Australians, for example, could simply be achieved by counting the articles in the "Australians" category, yet this category is not an objective representation of Australianness; it is the result of particular practices that resist stable referents (Falk et al., 2023). As Wikipedia's content is increasingly used to power virtual assistants such as Amazon Alexa and, more recently, large language model applications such as ChatGPT and Google's Bard, Wikipedia participates in the global information ecosystem in ways that go well beyond its role as a web-based encyclopaedia (McDowell & Vetter, 2023). It is therefore important to treat Wikipedia's relationship to data not as a given but as something to be critically investigated.
This symposium will bring together social scientists, humanists, critical technologists, and others to investigate Wikipedia's connection to data and the importance of this relationship for the global information ecosystem and the production of knowledge. The workshop will be organised as a day-long, face-to-face event held before the annual International Communication Association conference on the Gold Coast, Australia.
Participants will be invited to give short presentations and to take part in discussions focused on the questions "What is Wikipedia's relationship to data?" and/or "What should Wikipedia's relationship to data be?" Participants will also agree to read a few background papers before the gathering. The workshop will produce a collaborative document mapping out possible areas for researching these questions through a sociotechnical lens, and participants will have the option to continue the collaboration after the symposium.
To participate, please complete the following web form, including a 250-300 word abstract outlining your contribution to the symposium themes.
Lead curator and contact: Heather Ford