[Image: illustration of blue smart glasses with a text prompt inside. Logos: KDD 2025 and Meta CRAG-MM Challenge.]

Award-winning tech enhances information retrieval in wearable AI devices

Author: ADM+S Centre
Date: 6 August 2025

A team of researchers from the ARC Centre of Excellence for Automated Decision-Making and Society (ADM+S) at UNSW has been awarded third place in the single-source augmentation task of the highly competitive KDD Cup 2025 Meta CRAG-MM Challenge, ranking alongside teams from top institutions such as Peking University, Meituan and NVIDIA.

CRAG-MM (Comprehensive RAG Benchmark for Multi-modal, Multi-turn) is a pioneering benchmark designed to evaluate next-generation visual assistants powered by Vision Large Language Models (VLLMs), combining image understanding, information retrieval, and dialogue generation.

The Challenge sought to improve how users interact with AI assistants through wearable devices like smart glasses, which capture first-person (egocentric) images to support more intuitive, visual-based search and communication.

Participating teams tackled three tasks focused on advancing these systems: answering user questions, integrating information from multiple sources, and generating seamless multi-turn conversations.

Led by ADM+S Chief Investigator Prof Flora Salim, the team included Breeze Chen and Wilson Wongso (ADM+S students at UNSW), along with other members of Prof Salim’s team at UNSW, Xiaoqian Hu and Yue Tan.

The challenge focused on building agents that are factually accurate and robust, as current VLLMs tend to hallucinate and provide unreliable answers. The team’s solution was designed specifically to address this weakness by balancing accuracy and truthfulness, both of which were key to the final rankings.

“This challenge really pushes the boundaries of next-generation AI assistants. It highlights the importance of building agents that are not just capable, but critically reliable and accurate,” said Wilson Wongso.

The team’s technical approach combined document retrieval, dual-path answer generation, and multi-stage verification. For each question, the system searched a database for relevant information, guided the language model to ground its responses in that evidence, and then verified the output for factual accuracy. This significantly reduced hallucinated content while preserving answer quality, addressing an issue that continues to challenge current VLLMs.
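To illustrate the general idea (not the team’s actual code), here is a minimal Python sketch of a verification-centric RAG loop: retrieve evidence, generate an answer grounded in it, then verify the draft before answering, abstaining when the evidence cannot support it. All function names and bodies are hypothetical stand-ins.

```python
# Minimal sketch of a verification-centric RAG loop, loosely following the
# pipeline described above. Every function here is an illustrative stand-in,
# not the team's implementation.

def retrieve(question: str, top_k: int = 3) -> list[str]:
    """Stand-in retriever: return up to top_k passages relevant to the question."""
    corpus = {
        "Who designed the Sydney Opera House?":
            ["The Sydney Opera House was designed by Jørn Utzon."],
    }
    return corpus.get(question, [])[:top_k]

def generate_grounded(question: str, passages: list[str]) -> str:
    """Stand-in for a VLLM call prompted to answer only from the passages."""
    return passages[0] if passages else "I don't know."

def verify(answer: str, passages: list[str]) -> bool:
    """Stand-in verifier: accept the draft only if the retrieved evidence
    supports it (here, a crude substring check)."""
    return any(answer in p or p in answer for p in passages)

def answer(question: str) -> str:
    passages = retrieve(question)
    draft = generate_grounded(question, passages)
    # Abstaining ("I don't know") is typically penalised less than a confident
    # hallucination, so reject any draft the verifier cannot support.
    return draft if verify(draft, passages) else "I don't know."

print(answer("Who designed the Sydney Opera House?"))   # grounded answer
print(answer("What colour are the glasses in the photo?"))  # no evidence -> abstain
```

The final abstention step reflects the balance the article describes: in benchmarks of this kind, an honest “I don’t know” generally scores better than a confidently wrong answer, so verifying before answering improves truthfulness without sacrificing much accuracy.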

“We weren’t expecting to place third, as our solution was only ranked within the top 10 on the preliminary leaderboard. But it proved to be quite robust during the manual evaluations by the judges, ultimately securing us 3rd place,” said Wongso.

Over the past few months, the challenge brought together more than 900 participants in over 250 teams from around the world, who submitted more than 5,000 entries across the three tasks.

The Meta CRAG-MM Challenge is part of KDD 2025 (Knowledge Discovery and Data Mining), a premier international conference that brings together researchers and practitioners in data mining, data science, artificial intelligence, machine learning, and large-scale analytics.

Read more about the team’s technical solution in the pre-print publication “Multi-Stage Verification-Centric Framework for Mitigating Hallucination in Multi-Modal RAG”.

Prof Flora Salim and Wilson Wongso are also contributing to GenAISim: Simulation in the Loop for Multi-Stakeholder Interactions with Generative Agents, an ADM+S signature project developing and testing a novel suite of generative and data-driven simulations to support complex decision-making across sectors.
