A Levenshtein distance-based method for word segmentation in corpus augmentation of geoscience texts
ABSTRACT: For geoscience text, rich domain corpora have become the basis of improving the model performance in word segmentation. However, the lack of domain-specific corpus with annotation labelled has become a major obstacle to professional information mining in geoscience fields. In this paper, we...
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2023-04-01 |
Series: | Annals of GIS |
Subjects: | |
Online Access: | https://www.tandfonline.com/doi/10.1080/19475683.2023.2165543 |