Comparative Analysis of Automatic POS Taggers Applied to German Learner Texts
Part-of-speech (POS) tagging is the process of assigning a morphosyntactic category to each element of a sentence in a text, including punctuation marks, according to its context. The article tests and compares five part-of-speech taggers (CoreNLP, spaCy, TextBlob, RFTagger and TreeTagger) on texts from the annotated learner corpus of Petrozavodsk State University (PACT, Petrozavodsk Annotated Corpus of Texts). All of these tools support German; however, learner texts have their own characteristics, chiefly a large number of errors. The research question is how well these conventional tools handle automatic annotation when they encounter many grammatical, lexical and spelling mistakes. The authors draw conclusions about the frequency of part-of-speech identification errors, the tagging quality, and the strengths and weaknesses of each tagger.

Main Authors: | Irina Kotiurova, Polina Trenina |
---|---|
Affiliation: | Petrozavodsk State University, Russia |
Format: | Article |
Language: | English |
Published: | FRUCT, 2022-04-01 |
Series: | Proceedings of the XXth Conference of Open Innovations Association FRUCT |
Volume / Pages: | Vol. 31, Issue 1, pp. 115–124 |
ISSN: | 2305-7254, 2343-0737 |
DOI: | 10.23919/FRUCT54823.2022.9770886 |
Subjects: | learner corpus; part-of-speech tagger; German; POS tagging |
Indexed in: | Directory of Open Access Journals (DOAJ) |
Online Access: | https://www.fruct.org/publications/fruct31/files/Kot.pdf |
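Comparing taggers on a gold-annotated corpus ultimately comes down to aligning each tagger's output with the reference tags token by token and counting disagreements. The sketch below illustrates that idea only; it is not the authors' evaluation procedure. It assumes tokenisation is already aligned and uses STTS tags, and the learner sentence, the tagger names (`tagger_a`, `tagger_b`) and all of their tags are hypothetical values invented for illustration, not results from the study.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must be aligned token by token")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusions(gold, predicted):
    """Count (gold_tag, predicted_tag) pairs for every mistagged token."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Hypothetical learner sentence with a case error ("ein großes" for "einem großen")
# and a spelling error ("Hauß" for "Haus"); all tags are invented for illustration.
tokens = ["Ich", "wohne", "in", "ein", "großes", "Hauß", "."]
gold = ["PPER", "VVFIN", "APPR", "ART", "ADJA", "NN", "$."]

# Output of two hypothetical taggers on the same tokens.
outputs = {
    "tagger_a": ["PPER", "VVFIN", "APPR", "ART", "ADJA", "NN", "$."],
    "tagger_b": ["PPER", "VVFIN", "APPR", "ART", "ADJA", "NE", "$."],
}

if __name__ == "__main__":
    for name, predicted in outputs.items():
        acc = accuracy(gold, predicted)
        errors = dict(confusions(gold, predicted))
        print(f"{name}: accuracy={acc:.3f}, errors={errors}")
```

Running it prints an accuracy of 1.000 for `tagger_a` and 0.857 for `tagger_b`, whose single error is tagging the misspelled noun as a proper noun (`NN` tagged as `NE`). Keeping the confusion pairs alongside the accuracy score is what makes it possible to see which categories break down on learner errors, rather than only how often a tagger fails.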