Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
Since Turkish is an agglutinative language and contains reduplication, idiom, and metaphor words, Turkish texts are sources of information with extremely rich meanings. For this reason, the processing and classification of Turkish texts according to their characteristics is both time-consuming and difficult.
Main Authors: | Pinar Savci, Bihter Das |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-05-01 |
Series: | Heliyon |
Subjects: | Natural language processing; Autotrain; Multi-text classification; Pre-trained language models; Artificial intelligence; AutoNLP |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405844023028773 |
_version_ | 1797815672754929664 |
---|---|
author | Pinar Savci Bihter Das |
author_facet | Pinar Savci Bihter Das |
author_sort | Pinar Savci |
collection | DOAJ |
description | Since Turkish is an agglutinative language and contains reduplication, idiom, and metaphor words, Turkish texts are sources of information with extremely rich meanings. For this reason, the processing and classification of Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-text classification using Autotrain were compared in a 250 K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128 k) language model on the dataset showed higher accuracy performance with a training time of 66 min compared to the other models and the CO2 emission was quite low. The ConvBERTurk mC4 (uncased) model is also the best-performing second language model. As a result of this study, we have provided a deeper understanding of the capabilities of pre-trained language models for Turkish on machine learning. |
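The description covers multi-label classification with pre-trained models via AutoTrain. As a minimal sketch of the multi-label decision step such models use (an independent sigmoid score per label plus a threshold, rather than a single softmax winner), with a purely hypothetical label set:

```python
import math

# Hypothetical label set for illustration; the paper's 250 K Turkish
# dataset and its actual categories are not reproduced here.
LABELS = ["economy", "sport", "politics"]

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Multi-label decision: assign every label whose sigmoid score
    clears the threshold (zero, one, or several labels may fire,
    unlike multi-class softmax where exactly one label wins)."""
    return [lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# sigmoid(2.0) ~= 0.88 and sigmoid(0.3) ~= 0.57 clear the 0.5 threshold;
# sigmoid(-1.5) ~= 0.18 does not, so two labels are assigned at once.
print(predict_labels([2.0, -1.5, 0.3]))  # ['economy', 'politics']
```

In practice the logits would come from a fine-tuned model such as BERTurk configured with a multi-label classification head; the threshold is a tunable hyperparameter, not a fixed value from the paper.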
first_indexed | 2024-03-13T08:26:16Z |
format | Article |
id | doaj.art-da2d411722014d2398dd190138a5bc20 |
institution | Directory Open Access Journal |
issn | 2405-8440 |
language | English |
last_indexed | 2024-03-13T08:26:16Z |
publishDate | 2023-05-01 |
publisher | Elsevier |
record_format | Article |
series | Heliyon |
spelling | doaj.art-da2d411722014d2398dd190138a5bc20 2023-05-31T04:45:06Z eng Elsevier Heliyon 2405-8440 2023-05-01 9 5 e15670 Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML Pinar Savci 0 Bihter Das 1 Arçelik A.Ş. Karaağaç Caddesi 2-6, Sütlüce Beyoğlu 34445 Istanbul, Turkey Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey; Corresponding author. Since Turkish is an agglutinative language and contains reduplication, idiom, and metaphor words, Turkish texts are sources of information with extremely rich meanings. For this reason, the processing and classification of Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-text classification using Autotrain were compared in a 250 K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128 k) language model on the dataset showed higher accuracy performance with a training time of 66 min compared to the other models and the CO2 emission was quite low. The ConvBERTurk mC4 (uncased) model is also the best-performing second language model. As a result of this study, we have provided a deeper understanding of the capabilities of pre-trained language models for Turkish on machine learning. http://www.sciencedirect.com/science/article/pii/S2405844023028773 Natural language processing Autotrain Multi-text classification Pre-trained language models Artificial intelligence AutoNLP |
spellingShingle | Pinar Savci Bihter Das Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML Heliyon Natural language processing Autotrain Multi-text classification Pre-trained language models Artificial intelligence AutoNLP |
title | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_full | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_fullStr | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_full_unstemmed | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_short | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_sort | comparison of pre trained language models in terms of carbon emissions time and accuracy in multi label text classification using automl |
topic | Natural language processing Autotrain Multi-text classification Pre-trained language models Artificial intelligence AutoNLP |
url | http://www.sciencedirect.com/science/article/pii/S2405844023028773 |
work_keys_str_mv | AT pinarsavci comparisonofpretrainedlanguagemodelsintermsofcarbonemissionstimeandaccuracyinmultilabeltextclassificationusingautoml AT bihterdas comparisonofpretrainedlanguagemodelsintermsofcarbonemissionstimeandaccuracyinmultilabeltextclassificationusingautoml |