Parallel Bidirectionally Pretrained Taggers as Feature Generators
In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Speech tagging using multiple standalone annotation...
Main Authors: | Ranka Stanković, Mihailo Škorić, Branislava Šandrih Todorović |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Applied Sciences |
Subjects: | annotation; natural language processing; feature extraction; composite structures; part of speech |
Online Access: | https://www.mdpi.com/2076-3417/12/10/5028 |
_version_ | 1797501801559228416 |
---|---|
author | Ranka Stanković; Mihailo Škorić; Branislava Šandrih Todorović |
author_sort | Ranka Stanković |
collection | DOAJ |
description | In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Speech tagging using multiple standalone annotation systems as feature generators for a stacked classifier. It also explores automatic resource expansion via dataset augmentation and bidirectional training in order to increase the number of taggers and to maximize the impact of the composite system, which is especially viable for low-resource languages. We demonstrate the approach on a preannotated dataset for Serbian using nested cross-validation to test and compare standalone and composite taggers. Based on the results, we conclude that given a limited training dataset, there is a payoff from cutting a percentage of the initial training set and using it to fine-tune a machine-learning-based stacked classifier, especially if it is trained bidirectionally. Moreover, we found a measurable impact on the usage of multiple tagsets to scale-up the architecture further through transfer learning methods. |
first_indexed | 2024-03-10T03:23:47Z |
format | Article |
id | doaj.art-cf29c6059342414abd8d8300c3a634cc |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T03:23:47Z |
publishDate | 2022-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-cf29c6059342414abd8d8300c3a634cc; 2023-11-23T09:56:40Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-05-01; 12; 10; 5028; 10.3390/app12105028; Parallel Bidirectionally Pretrained Taggers as Feature Generators; Ranka Stanković (Faculty of Mining and Geology, University of Belgrade, Djusina 7, 11120 Belgrade, Serbia); Mihailo Škorić (Faculty of Mining and Geology, University of Belgrade, Djusina 7, 11120 Belgrade, Serbia); Branislava Šandrih Todorović (Faculty of Philology, University of Belgrade, Studentski Trg 3, 11000 Belgrade, Serbia); In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Speech tagging using multiple standalone annotation systems as feature generators for a stacked classifier. It also explores automatic resource expansion via dataset augmentation and bidirectional training in order to increase the number of taggers and to maximize the impact of the composite system, which is especially viable for low-resource languages. We demonstrate the approach on a preannotated dataset for Serbian using nested cross-validation to test and compare standalone and composite taggers. Based on the results, we conclude that given a limited training dataset, there is a payoff from cutting a percentage of the initial training set and using it to fine-tune a machine-learning-based stacked classifier, especially if it is trained bidirectionally. Moreover, we found a measurable impact on the usage of multiple tagsets to scale-up the architecture further through transfer learning methods.; https://www.mdpi.com/2076-3417/12/10/5028; annotation; natural language processing; feature extraction; composite structures; part of speech |
title | Parallel Bidirectionally Pretrained Taggers as Feature Generators |
topic | annotation; natural language processing; feature extraction; composite structures; part of speech |
url | https://www.mdpi.com/2076-3417/12/10/5028 |
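
The description above mentions using standalone taggers as feature generators for a stacked classifier that is fitted on a portion cut from the training set. The following is a minimal Python sketch of that general idea only, not the authors' implementation. The three rule-based taggers, the tag names, the `HELD_OUT` sample, and the lookup-table stacker with a majority-vote fallback are all hypothetical and chosen for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical toy base taggers (assumptions, not the systems from the paper).
# Each one maps a single token to a tag.
def suffix_tagger(token):
    return "VERB" if token.endswith("s") else "NOUN"

def capital_tagger(token):
    return "PROPN" if token[:1].isupper() else "NOUN"

def length_tagger(token):
    return "DET" if len(token) <= 3 else "NOUN"

BASE_TAGGERS = [suffix_tagger, capital_tagger, length_tagger]

def features(token):
    """Return the tuple of base-tagger outputs used as stacker input."""
    return tuple(tagger(token) for tagger in BASE_TAGGERS)

def train_stacker(held_out):
    """Learn the most frequent gold tag for each combination of base outputs."""
    counts = defaultdict(Counter)
    for token, gold in held_out:
        counts[features(token)][gold] += 1
    return {feats: c.most_common(1)[0][0] for feats, c in counts.items()}

def predict(stacker, token):
    """Use the learned mapping, or fall back to a majority vote if unseen."""
    feats = features(token)
    if feats in stacker:
        return stacker[feats]
    return Counter(feats).most_common(1)[0][0]

# Hypothetical held-out sample: (token, gold tag) pairs set aside from training.
HELD_OUT = [
    ("the", "DET"), ("a", "DET"),
    ("barks", "VERB"), ("runs", "VERB"),
    ("Ana", "PROPN"), ("house", "NOUN"),
]

stacker = train_stacker(HELD_OUT)

if __name__ == "__main__":
    for word in ["sings", "an", "Iva", "Marko"]:
        print(word, features(word), predict(stacker, word))
```

In this toy setup the stacker can overrule a plain majority vote: for a token where only one base tagger proposes `VERB`, the learned mapping still selects `VERB` because that combination of outputs was consistently labelled so in the held-out sample. A real system would replace the lookup table with a trained classifier and the rules with full taggers.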