A Levenshtein distance-based method for word segmentation in corpus augmentation of geoscience texts

ABSTRACT: For geoscience texts, rich domain corpora have become the basis for improving model performance in word segmentation. However, the lack of annotated domain-specific corpora has become a major obstacle to professional information mining in the geosciences. In this paper, we...


Bibliographic Details
Main Authors: Jinqu Zhang, Lang Qian, Shu Wang, Yunqiang Zhu, Zhenji Gao, Hailong Yu, Weirong Li
Format: Article
Language: English
Published: Taylor & Francis Group, 2023-04-01
Series: Annals of GIS
Online Access: https://www.tandfonline.com/doi/10.1080/19475683.2023.2165543