Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
The goal of multimodal named entity recognition (MNER) is to detect entity spans in given image–text pairs and classify them into corresponding entity types. Despite the success of existing works that leverage cross-modal attention mechanisms to integrate textual and visual representations, we obser...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2024-03-01
|
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/14/6/2333 |