Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
Since Turkish is an agglutinative language and contains reduplication, idiom, and metaphor words, Turkish texts are sources of information with extremely rich meanings. For this reason, the processing and classification of Turkish texts according to their characteristics is both time-consuming and difficult.
Main Authors: | Pinar Savci, Bihter Das |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-05-01 |
Series: | Heliyon |
Subjects: | Natural language processing; Autotrain; Multi-text classification; Pre-trained language models; Artificial intelligence; AutoNLP |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405844023028773 |
_version_ | 1797815672754929664 |
---|---|
author | Pinar Savci Bihter Das |
author_facet | Pinar Savci Bihter Das |
author_sort | Pinar Savci |
collection | DOAJ |
description | Since Turkish is an agglutinative language and contains reduplication, idiom, and metaphor words, Turkish texts are sources of information with extremely rich meanings. For this reason, the processing and classification of Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-text classification using Autotrain were compared in a 250 K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128 k) language model on the dataset showed higher accuracy performance with a training time of 66 min compared to the other models and the CO2 emission was quite low. The ConvBERTurk mC4 (uncased) model is also the best-performing second language model. As a result of this study, we have provided a deeper understanding of the capabilities of pre-trained language models for Turkish on machine learning. |
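The description covers multi-label classification with pre-trained models via AutoTrain. As a minimal sketch of the multi-label decision step such models use (an independent sigmoid score per label plus a threshold, rather than a single softmax winner), with a purely hypothetical label set:

```python
import math

# Hypothetical label set for illustration; the paper's 250 K Turkish
# dataset and its actual categories are not reproduced here.
LABELS = ["economy", "sport", "politics"]

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Multi-label decision: assign every label whose sigmoid score
    clears the threshold (zero, one, or several labels may fire,
    unlike multi-class softmax where exactly one label wins)."""
    return [lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# sigmoid(2.0) ~= 0.88 and sigmoid(0.3) ~= 0.57 clear the 0.5 threshold;
# sigmoid(-1.5) ~= 0.18 does not, so two labels are assigned at once.
print(predict_labels([2.0, -1.5, 0.3]))  # ['economy', 'politics']
```

In practice the logits would come from a fine-tuned model such as BERTurk configured with a multi-label classification head; the threshold is a tunable hyperparameter, not a fixed value from the paper.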
first_indexed | 2024-03-13T08:26:16Z |
format | Article |
id | doaj.art-da2d411722014d2398dd190138a5bc20 |
institution | Directory Open Access Journal |
issn | 2405-8440 |
language | English |
last_indexed | 2024-03-13T08:26:16Z |
publishDate | 2023-05-01 |
publisher | Elsevier |
record_format | Article |
series | Heliyon |
spelling | doaj.art-da2d411722014d2398dd190138a5bc20 2023-05-31T04:45:06Z eng Elsevier Heliyon 2405-8440 2023-05-01 9 5 e15670 Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML Pinar Savci 0 Bihter Das 1 Arçelik A.Ş. Karaağaç Caddesi 2-6, Sütlüce Beyoğlu 34445 Istanbul, Turkey Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey; Corresponding author. Since Turkish is an agglutinative language and contains reduplication, idiom, and metaphor words, Turkish texts are sources of information with extremely rich meanings. For this reason, the processing and classification of Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-text classification using Autotrain were compared in a 250 K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128 k) language model on the dataset showed higher accuracy performance with a training time of 66 min compared to the other models and the CO2 emission was quite low. The ConvBERTurk mC4 (uncased) model is also the best-performing second language model. As a result of this study, we have provided a deeper understanding of the capabilities of pre-trained language models for Turkish on machine learning. http://www.sciencedirect.com/science/article/pii/S2405844023028773 Natural language processing Autotrain Multi-text classification Pre-trained language models Artificial intelligence AutoNLP |
spellingShingle | Pinar Savci Bihter Das Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML Heliyon Natural language processing Autotrain Multi-text classification Pre-trained language models Artificial intelligence AutoNLP |
title | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_full | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_fullStr | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_full_unstemmed | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_short | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
title_sort | comparison of pre trained language models in terms of carbon emissions time and accuracy in multi label text classification using automl |
topic | Natural language processing Autotrain Multi-text classification Pre-trained language models Artificial intelligence AutoNLP |
url | http://www.sciencedirect.com/science/article/pii/S2405844023028773 |
work_keys_str_mv | AT pinarsavci comparisonofpretrainedlanguagemodelsintermsofcarbonemissionstimeandaccuracyinmultilabeltextclassificationusingautoml AT bihterdas comparisonofpretrainedlanguagemodelsintermsofcarbonemissionstimeandaccuracyinmultilabeltextclassificationusingautoml |