ACRF: Aggregated Conditional Random Field for Out of Vocab (OOV) Token Representation for Hindi NER

Named entities are open-ended and unpredictable, for example emerging entities and complex entities. The tokenizers of most large language models have a fixed vocabulary; hence, they split out-of-vocabulary (OOV) words into multiple sub-words during tokenization. During fine-tuning for any downstream task, these sub-words (t...
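To illustrate how a fixed-vocabulary tokenizer breaks an OOV word into several sub-words, here is a minimal sketch of greedy longest-match-first (WordPiece-style) tokenization over a toy vocabulary. The function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` token, and the vocabulary are illustrative assumptions. This is not the tokenizer used in the paper, nor the ACRF method itself.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into sub-words.

    Hypothetical helper for illustration only; `vocab` is a toy set of pieces.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched at this position: the whole word maps to [UNK].
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


toy_vocab = {"new", "del", "##del", "##hi"}
print(wordpiece_tokenize("newdelhi", toy_vocab))  # ['new', '##del', '##hi']
print(wordpiece_tokenize("new", toy_vocab))       # ['new']
```

The single word "newdelhi" is absent from the toy vocabulary, so it becomes three sub-words, each of which receives its own representation during fine-tuning; an in-vocabulary word stays a single token.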


Bibliographic Details
Main Authors: Sumit Singh, Uma Shanker Tiwary
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10422739/