BTSD: A curated transformation of sentence dataset for text classification in Bangla language
Main Authors: | Rajesh Kumar Das, Mirajul Islam, Sharun Akter Khushbu |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-10-01 |
Series: | Data in Brief |
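Subjects: | Natural language processing; Machine learning; Text classification; Transformation of sentence; Bangla language |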
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340923005450 |
_version_ | 1797660434228051968 |
---|---|
author | Rajesh Kumar Das, Mirajul Islam, Sharun Akter Khushbu |
collection | DOAJ |
description | The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentences into Simple, Complex, and Compound classes. It serves as a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models. Collected from publicly accessible Facebook pages, the dataset ensures balanced representation across the categories. Preprocessing steps, including anonymization and duplicate removal, were applied. Three native Bangla speakers independently assessed the Transformation of Sentence labels, enhancing the dataset's reliability. The dataset empowers researchers, practitioners, and developers to build accurate and robust NLP models tailored to the Bangla language. It offers insights into Bangla syntax and structure, benefiting linguistic research. The dataset can be used to train models, uncover patterns in Bangla language usage, and develop effective NLP applications across domains. |
first_indexed | 2024-03-11T18:30:46Z |
format | Article |
id | doaj.art-3d68f063c5674e3290dc73d72d14d6df |
institution | Directory of Open Access Journals |
issn | 2352-3409 |
language | English |
last_indexed | 2024-03-11T18:30:46Z |
publishDate | 2023-10-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj.art-3d68f063c5674e3290dc73d72d14d6df; 2023-10-13T11:04:41Z; eng; Elsevier; Data in Brief; 2352-3409; 2023-10-01; 50; 109445; BTSD: A curated transformation of sentence dataset for text classification in Bangla language; Rajesh Kumar Das (0), Mirajul Islam (1), Sharun Akter Khushbu (2); (0) Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh; (1) Corresponding author. Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh; (2) Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh; The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentences into Simple, Complex, and Compound classes. It serves as a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models. Collected from publicly accessible Facebook pages, the dataset ensures balanced representation across the categories. Preprocessing steps, including anonymization and duplicate removal, were applied. Three native Bangla speakers independently assessed the Transformation of Sentence labels, enhancing the dataset's reliability. The dataset empowers researchers, practitioners, and developers to build accurate and robust NLP models tailored to the Bangla language. It offers insights into Bangla syntax and structure, benefiting linguistic research. The dataset can be used to train models, uncover patterns in Bangla language usage, and develop effective NLP applications across domains.; http://www.sciencedirect.com/science/article/pii/S2352340923005450; Natural language processing; Machine learning; Text classification; Transformation of sentence; Bangla language |
title | BTSD: A curated transformation of sentence dataset for text classification in Bangla language |
topic | Natural language processing; Machine learning; Text classification; Transformation of sentence; Bangla language |
url | http://www.sciencedirect.com/science/article/pii/S2352340923005450 |