BTSD: A curated transformation of sentence dataset for text classification in Bangla language


Bibliographic Details
Main Authors:	Rajesh Kumar Das, Mirajul Islam, Sharun Akter Khushbu
Format:	Article
Language:	English
Published:	Elsevier 2023-10-01
Series:	Data in Brief
Subjects:	Natural language processing; Machine learning; Text classification; Transformation of sentence; Bangla language
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340923005450

author	Rajesh Kumar Das, Mirajul Islam, Sharun Akter Khushbu
collection	DOAJ
description	The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentences into Simple, Complex, and Compound classes. It serves as a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models. Collected from publicly accessible Facebook pages, the dataset ensures balanced representation across the categories. Preprocessing steps, including anonymization and duplicate removal, were applied. Three native Bangla speakers independently assessed the Transformation of Sentence labels, enhancing the dataset's reliability. The dataset empowers researchers, practitioners, and developers to build accurate and robust NLP models tailored to the Bangla language. It offers insights into Bangla syntax and structure, benefiting linguistic research. The dataset can be used to train models, uncover patterns in Bangla language usage, and develop effective NLP applications across domains.
first_indexed	2024-03-11T18:30:46Z
format	Article
id	doaj.art-3d68f063c5674e3290dc73d72d14d6df
institution	Directory of Open Access Journals
issn	2352-3409
language	English
last_indexed	2024-03-11T18:30:46Z
publishDate	2023-10-01
publisher	Elsevier
series	Data in Brief
spelling	Data in Brief (ISSN 2352-3409), Elsevier, 2023-10-01, vol. 50, article 109445. Authors: Rajesh Kumar Das, Mirajul Islam (corresponding author), Sharun Akter Khushbu; Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh. DOAJ record timestamp: 2023-10-13T11:04:41Z.
title	BTSD: A curated transformation of sentence dataset for text classification in Bangla language
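
The description names two processing steps, duplicate removal and independent labelling by three native Bangla speakers, but does not say how either was carried out. The Python sketch below shows one plausible way to do both under stated assumptions; it is not the authors' pipeline. The whitespace-only duplicate criterion, the two-of-three majority vote, and the function names (`remove_duplicates`, `majority_label`, `agreement_rate`) are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import Counter

# The three sentence classes named in the BTSD description.
LABELS = ("Simple", "Complex", "Compound")


def normalize(sentence: str) -> str:
    # Assumption: sentences that differ only in whitespace count as duplicates.
    return " ".join(sentence.split())


def remove_duplicates(sentences: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized sentence, in input order."""
    seen: set[str] = set()
    unique: list[str] = []
    for sentence in sentences:
        key = normalize(sentence)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def majority_label(annotations: list[str]) -> str | None:
    """Return the label chosen by at least two of three annotators, else None.

    Hypothetical rule: the record only says three native speakers assessed
    the labels independently, not how disagreements were resolved.
    """
    if len(annotations) != 3 or any(a not in LABELS for a in annotations):
        raise ValueError("expected exactly three labels from LABELS")
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= 2 else None


def agreement_rate(all_annotations: list[list[str]]) -> float:
    """Fraction of sentences on which all three annotators gave the same label."""
    if not all_annotations:
        raise ValueError("no annotations given")
    unanimous = sum(1 for labels in all_annotations if len(set(labels)) == 1)
    return unanimous / len(all_annotations)


if __name__ == "__main__":
    # The second sentence differs from the first only by a doubled space.
    print(remove_duplicates(["আমি ভাত খাই।", "আমি  ভাত খাই।", "সে বাড়ি যায়।"]))
    print(majority_label(["Simple", "Simple", "Complex"]))
```

A `None` from `majority_label` marks a sentence on which no two annotators agree; such items would need adjudication or exclusion, a decision this record does not document.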

