Contextual object detection with multimodal large language models

Recent Multimodal Large Language Models (MLLMs) are remarkable in vision-language tasks, such as image captioning and question answering, but lack the essential perception ability, i.e., object detection. In this work, we address this limitation by introducing a novel research problem of contextual...

Bibliographic Details
Main Authors:	Zang, Yuhang; Li, Wei; Han, Jun; Zhou, Kaiyang; Loy, Chen Change
Other Authors:	College of Computing and Data Science
Format:	Journal Article
Language:	English
Published:	2024
Subjects:	Computer and Information Science; Image segmentation; Object detection
Online Access:	https://hdl.handle.net/10356/181063

Similar Items