ACRF: Aggregated Conditional Random Field for Out of Vocab (OOV) Token Representation for Hindi NER
Named entities are random, like emerging entities and complex entities. Most of the large language model’s tokenizers have fixed vocab; hence, they tokenize out-of-vocab (OOV) words into multiple sub-words during tokenization. During fine-tuning for any downstream task, these sub-words (t...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
IEEE
2024-01-01
|
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10422739/ |