Comparative Analysis of Automatic POS Taggers Applied to German Learner Texts

Bibliographic Details
Main Authors: Irina Kotiurova, Polina Trenina
Format: Article
Language: English
Published: FRUCT 2022-04-01
Series: Proceedings of the XXth Conference of Open Innovations Association FRUCT
Online Access: https://www.fruct.org/publications/fruct31/files/Kot.pdf
Collection: DOAJ
Description: Part-of-speech (POS) tagging is the process of assigning a morpho-syntactic category to each element of a sentence in a text, including punctuation marks, according to its context. The article presents the results of testing and comparing five part-of-speech taggers (CoreNLP, spaCy, TextBlob, RFTagger and TreeTagger) on texts from the annotated learner corpus of Petrozavodsk State University (PACT, Petrozavodsk Annotated Corpus of Texts). All these tools support German. However, learner texts have their own characteristics, chiefly a large number of errors. The research question is how well conventional tools handle automatic annotation when they encounter many grammatical, lexical and spelling mistakes. Conclusions are drawn about the frequency of part-of-speech identification errors, the tagging quality, and the strengths and weaknesses of each tagger.
Institution: Directory of Open Access Journals
ISSN: 2305-7254, 2343-0737
DOI: 10.23919/FRUCT54823.2022.9770886
Volume: 31, Issue: 1, Pages: 115-124
Affiliation: Petrozavodsk State University, Russia (both authors)
Keywords: learner corpus; part-of-speech tagger; German; POS tagging