Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin
Tokenization of modern and old Western European languages seems fairly simple, as it relies mostly on markers such as spaces and punctuation. However, when dealing with old sources like manuscripts written in scripta continua, ancient epigraphy or medieval manuscripts, (1)...
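To make the contrast in the abstract concrete, here is a minimal sketch, assuming plain Python and a toy Latin vocabulary. `naive_tokenize` applies the "spaces and punctuation" heuristic, which returns an unsegmented scripta continua string as a single token. `greedy_segment` is a hypothetical dictionary baseline (greedy longest match) that recovers word boundaries only when every word is in its vocabulary. Neither function is one of the deep learning methods evaluated in the article; both names and the vocabulary are illustrative assumptions.

```python
import re

# Heuristic tokenizer: runs of word characters, or single punctuation marks.
# Whitespace is never part of a match, so spaces act as boundaries.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def naive_tokenize(text: str) -> list[str]:
    """Split text on spaces and punctuation (fails on scripta continua)."""
    return TOKEN_RE.findall(text)


def greedy_segment(text: str, vocab: set[str], max_len: int = 12) -> list[str]:
    """Hypothetical baseline: greedy longest-match segmentation.

    At each position, take the longest substring (up to max_len characters)
    found in vocab; if none matches, fall back to a single character so the
    loop always advances.
    """
    words = []
    i = 0
    while i < len(text):
        # Try candidate end positions from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


if __name__ == "__main__":
    print(naive_tokenize("Senatus populusque Romanus."))
    print(naive_tokenize("senatuspopulusqueromanus"))
    toy_vocab = {"senatus", "populus", "que", "romanus"}  # assumed toy lexicon
    print(greedy_segment("senatuspopulusqueromanus", toy_vocab))
```

A greedy dictionary approach breaks down on out-of-vocabulary forms, abbreviations and spelling variation, which are common in manuscripts. This limitation is one motivation for learned segmentation models of the kind the article compares.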
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Nicolas Turenne, 2020-04-01 |
Series: | Journal of Data Mining and Digital Humanities |
Subjects: | |
Online Access: | https://jdmdh.episciences.org/5581/pdf |