BTSD: A curated transformation of sentence dataset for text classification in Bangla language

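The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentences into Simple, Complex, and Compound classes. It serves as a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models. Collected from publicly accessible Facebook pages, the dataset ensures balanced representation across the categories. Preprocessing steps, including anonymization and duplicate removal, were applied. Three native Bangla speakers independently assessed the Transformation of Sentence labels, enhancing the dataset's reliability. The dataset empowers researchers, practitioners, and developers to build accurate and robust NLP models tailored to the Bangla language. It offers insights into Bangla syntax and structure, benefiting linguistic research. The dataset can be used to train models, uncover patterns in Bangla language usage, and develop effective NLP applications across domains.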

Bibliographic Details
Main Authors: Rajesh Kumar Das, Mirajul Islam, Sharun Akter Khushbu
Format: Article
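Language: English
Published: Elsevier 2023-10-01
Series: Data in Brief
Subjects: Natural language processing; Machine learning; Text classification; Transformation of sentence; Bangla language
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340923005450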
author Rajesh Kumar Das
Mirajul Islam
Sharun Akter Khushbu
collection DOAJ
description The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentences into Simple, Complex, and Compound classes. It serves as a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models. Collected from publicly accessible Facebook pages, the dataset ensures balanced representation across the categories. Preprocessing steps, including anonymization and duplicate removal, were applied. Three native Bangla speakers independently assessed the Transformation of Sentence labels, enhancing the dataset's reliability. The dataset empowers researchers, practitioners, and developers to build accurate and robust NLP models tailored to the Bangla language. It offers insights into Bangla syntax and structure, benefiting linguistic research. The dataset can be used to train models, uncover patterns in Bangla language usage, and develop effective NLP applications across domains.
first_indexed 2024-03-11T18:30:46Z
format Article
id doaj.art-3d68f063c5674e3290dc73d72d14d6df
institution Directory of Open Access Journals
issn 2352-3409
language English
last_indexed 2024-03-11T18:30:46Z
publishDate 2023-10-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj.art-3d68f063c5674e3290dc73d72d14d6df; 2023-10-13T11:04:41Z; eng; Elsevier; Data in Brief; 2352-3409; 2023-10-01; vol. 50, art. 109445; BTSD: A curated transformation of sentence dataset for text classification in Bangla language; Rajesh Kumar Das (Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh); Mirajul Islam (corresponding author; Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh); Sharun Akter Khushbu (Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh); http://www.sciencedirect.com/science/article/pii/S2352340923005450
title BTSD: A curated transformation of sentence dataset for text classification in Bangla language
topic Natural language processing
Machine learning
Text classification
Transformation of sentence
Bangla language
url http://www.sciencedirect.com/science/article/pii/S2352340923005450