Parallel Bidirectionally Pretrained Taggers as Feature Generators

In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Speech tagging using multiple standalone annotation...


Bibliographic Details
Main Authors: Ranka Stanković, Mihailo Škorić, Branislava Šandrih Todorović
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/12/10/5028
collection DOAJ
description In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Speech tagging using multiple standalone annotation systems as feature generators for a stacked classifier. It also explores automatic resource expansion via dataset augmentation and bidirectional training in order to increase the number of taggers and to maximize the impact of the composite system, which is especially viable for low-resource languages. We demonstrate the approach on a preannotated dataset for Serbian using nested cross-validation to test and compare standalone and composite taggers. Based on the results, we conclude that given a limited training dataset, there is a payoff from cutting a percentage of the initial training set and using it to fine-tune a machine-learning-based stacked classifier, especially if it is trained bidirectionally. Moreover, we found a measurable impact from using multiple tagsets to scale up the architecture further through transfer learning methods.
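The stacking idea in the description (standalone taggers acting as feature generators, with a held-out slice of the training data used to fit the combining classifier) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the three toy taggers, the tag labels, and the lookup-table combiner are hypothetical stand-ins and are not the taggers or the machine-learning classifier used in the article.

```python
from collections import Counter, defaultdict

# Hypothetical toy taggers standing in for real standalone POS taggers.
def suffix_tagger(token):
    if token.endswith("ly"):
        return "ADV"
    if token.endswith("s"):
        return "VERB"
    return "NOUN"

def lexicon_tagger(token):
    lexicon = {"the": "DET", "dogs": "NOUN", "runs": "VERB", "quickly": "ADV"}
    return lexicon.get(token.lower(), "NOUN")

def shape_tagger(token):
    return "PROPN" if token[0].isupper() else "NOUN"

BASE_TAGGERS = [suffix_tagger, lexicon_tagger, shape_tagger]

def features(token):
    """Each base tagger's prediction becomes one feature of the token."""
    return tuple(tagger(token) for tagger in BASE_TAGGERS)

def train_stacked(held_out):
    """For every combination of base predictions, keep the most frequent gold tag.

    A lookup table is the simplest possible combiner; it is an assumption
    made for this sketch, not the classifier described in the article.
    """
    counts = defaultdict(Counter)
    for token, gold in held_out:
        counts[features(token)][gold] += 1
    return {feats: c.most_common(1)[0][0] for feats, c in counts.items()}

def predict(model, token):
    feats = features(token)
    if feats in model:
        return model[feats]
    # Unseen combination of base predictions: fall back to a majority vote.
    return Counter(feats).most_common(1)[0][0]

# Held-out slice cut from the training set and reserved for the combiner.
held_out = [
    ("The", "DET"),
    ("dogs", "NOUN"),
    ("runs", "VERB"),
    ("quickly", "ADV"),
    ("Belgrade", "PROPN"),
]
model = train_stacked(held_out)
```

In this toy setup a plain majority vote would label a capitalized unknown word as a noun, whereas the combiner fitted on the held-out slice learns to trust the shape-based tagger for that pattern of base predictions.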
id doaj.art-cf29c6059342414abd8d8300c3a634cc
institution Directory Open Access Journal
issn 2076-3417
doi 10.3390/app12105028
volume 12
issue 10
article_number 5028
affiliation Ranka Stanković: Faculty of Mining and Geology, University of Belgrade, Djusina 7, 11120 Belgrade, Serbia
Mihailo Škorić: Faculty of Mining and Geology, University of Belgrade, Djusina 7, 11120 Belgrade, Serbia
Branislava Šandrih Todorović: Faculty of Philology, University of Belgrade, Studentski Trg 3, 11000 Belgrade, Serbia
topic annotation
natural language processing
feature extraction
composite structures
part of speech