Searching Large Collections of Paper – Research Seminar by Doug Oard
January 23 @ 12:00 pm AEDT
This event was postponed due to illness. The rescheduled talk took place on 31 January at RMIT University, and online.
Information retrieval has for decades focused on finding digital documents, including documents that were born digital and those that have been digitised. But there are also enormous collections of physical documents, on paper or microfilm for example, that are not likely to be fully digitised in our lifetimes.
The U.S. National Archives and Records Administration (NARA) presently holds 11.7 billion pages, only about 2% of which are in digital or digitised form. This is just one among many thousands of archival repositories, with more than 25,000 such repositories in the United States alone.
Access to the culturally important materials that these repositories curate is presently mediated largely through high-level descriptions of entire collections that have been written by archivists, along with detailed descriptions of how some of those collections are organised.
In this talk, Professor Doug Oard describes a project in which he seeks to build on that descriptive work, both by leveraging the limited amount of digitisation that has been performed and by assembling descriptions of archival content from published materials such as journal articles or books.
This is joint work with David Doermann, Emi Ishita, Katrina Fenlon, Diana Marsh, Tokinori Suzuki and Yoichi Tomiura.
Doug Oard is a Professor at the University of Maryland, with joint appointments in the College of Information Studies (the iSchool) and the University of Maryland Institute for Advanced Computer Studies (UMIACS). He is perhaps best known for his research on Cross-Language Information Retrieval (CLIR), but more generally one thread of his research has addressed the use of technologies such as machine translation, speech recognition, document image analysis, knowledge representation, mathematical notation processing, and social network analysis to support information access. He also has interests in applications of information retrieval in specific settings, including archival access and the “discovery” process for exchanging evidence among parties to civil litigation.
Among his current projects are leveraging multiple sources of evidence to help people find content in archives that has not yet been described at item level or digitised, and detecting inference risks when reviewing previously restricted materials for declassification.