Hallucinations in Multimodal Large Language Models: Evaluation, Mitigation, and Future Directions

Advisor

Shrivastava, Abhinav
Yacoob, Yaser

Abstract

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a wide array of tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a failure mode known as hallucination. Hallucinations pose substantial obstacles to practical deployment and raise concerns about the reliability of MLLMs in real-world applications, and the problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies.

This thesis makes four key contributions to the study of hallucinations in MLLMs. First, we provide a clear definition and taxonomy of hallucinations. Second, we propose a systematic evaluation framework that quantifies hallucinations across different modalities and task settings, employing a suite of metrics specifically designed to capture real-world failure modes. Third, we introduce novel mitigation strategies that integrate architectural enhancements, fine-tuning with targeted objectives, and data augmentation; these approaches collectively reduce hallucination rates while preserving the model's generalization ability. Finally, we conduct an in-depth analysis to uncover the underlying causes of hallucination.
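
To make the notion of quantifying hallucinations concrete, the sketch below computes an object-hallucination rate in the spirit of CHAIR-style metrics from the literature: the fraction of objects mentioned in a generated caption that are absent from the image's ground-truth annotations. The function name, object vocabulary, and simple word matching are illustrative assumptions, not the metric suite used in this thesis.

    # Illustrative object-hallucination rate, in the spirit of CHAIR-style
    # metrics; the vocabulary and tokenization here are simplifying assumptions.
    def hallucination_rate(caption: str, ground_truth: set, vocabulary: set) -> float:
        """Fraction of vocabulary objects mentioned in the caption that are
        absent from the image's ground-truth object annotations."""
        words = set(caption.lower().replace(",", " ").replace(".", " ").split())
        mentioned = {obj for obj in vocabulary if obj in words}
        if not mentioned:
            return 0.0  # no detectable object mentions, nothing to score
        return len(mentioned - ground_truth) / len(mentioned)

    # Example: "dog" and "frisbee" are mentioned, but only "dog" is annotated,
    # so one of the two mentions counts as hallucinated (rate = 0.5).
    print(hallucination_rate("A dog catches a frisbee.", {"dog", "person"},
                             {"dog", "person", "frisbee", "car"}))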

By consolidating evaluation, diagnosis, and mitigation into a unified investigation, this thesis advances the understanding of hallucinations in MLLMs and offers actionable guidance, from both architectural and data perspectives, for building more reliable and trustworthy multimodal AI systems. Our findings provide a foundation for future research and practical deployment in the multimodal learning domain.
