Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML

Since Turkish is an agglutinative language rich in reduplication, idioms, and metaphor, Turkish texts are sources of information with extremely dense meaning, which makes processing and classifying them both time-consuming and difficult. In this study, the performance of pre-trained language models for multi-label text classification using AutoTrain was compared on a 250 K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128 k) language model achieved the highest accuracy on the dataset, with a training time of 66 min and quite low CO2 emissions, while ConvBERTurk mC4 (uncased) was the second-best-performing model. This study provides a deeper understanding of the capabilities of pre-trained language models for Turkish in machine learning.
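The abstract compares models by accuracy on a multi-label task. As an illustrative aside (not taken from the paper; the metric choice and the toy topic labels below are assumptions for demonstration), multi-label predictions are typically scored over individual label assignments rather than whole examples, for instance with a micro-averaged F1:

```python
# Micro-averaged F1 for multi-label classification: pool true positives,
# false positives, and false negatives across all labels, then compute
# precision/recall once over the pooled counts.

def micro_f1(y_true, y_pred):
    """y_true, y_pred: lists of sets of labels, one set per example."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not present
        fn += len(truth - pred)   # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: two documents, each carrying one or two hypothetical topics.
truth = [{"economy", "politics"}, {"sports"}]
pred = [{"economy"}, {"sports", "politics"}]
print(round(micro_f1(truth, pred), 3))  # prints 0.667
```

Micro-averaging weights every label assignment equally, so frequent labels dominate the score; a macro average over per-label F1 would instead weight each label class equally.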

Bibliographic Details
Main Authors: Pinar Savci, Bihter Das
Format: Article
Language: English
Published: Elsevier, 2023-05-01
Series: Heliyon
Subjects: Natural language processing; Autotrain; Multi-text classification; Pre-trained language models; Artificial intelligence; AutoNLP
Online Access:http://www.sciencedirect.com/science/article/pii/S2405844023028773
ISSN: 2405-8440
Collection: DOAJ (Directory of Open Access Journals)
Author Affiliations: Pinar Savci (Arçelik A.Ş., Karaağaç Caddesi 2-6, Sütlüce Beyoğlu, 34445 Istanbul, Turkey); Bihter Das (Department of Software Engineering, Technology Faculty, Firat University, 23119 Elazig, Turkey; corresponding author)
Article Details: Heliyon, vol. 9, no. 5 (May 2023), article e15670