Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis
The preservation and digitization of historical data are crucial for ensuring the continuity and accessibility of information over successive generations. The present study investigates the utilization of machine learning methodologies in the identification of Assamese words, focusing specifically o...
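Main Authors: | Naiwrita Borah, Udayan Baruah, Mahesh Thylore Ramakrishna, V. Vinoth Kumar, D. Ramya Dorai, Jonnakuti Rajkumar Annad |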
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
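Subjects: | Feature-based approaches, comparative analysis, societal empowerment, Assamese literary works, automatic word recognition, machine learning |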
Online Access: | https://ieeexplore.ieee.org/document/10206436/ |
_version_ | 1797742954866016256 |
---|---|
author | Naiwrita Borah; Udayan Baruah; Mahesh Thylore Ramakrishna; V. Vinoth Kumar; D. Ramya Dorai; Jonnakuti Rajkumar Annad |
author_facet | Naiwrita Borah; Udayan Baruah; Mahesh Thylore Ramakrishna; V. Vinoth Kumar; D. Ramya Dorai; Jonnakuti Rajkumar Annad |
author_sort | Naiwrita Borah |
collection | DOAJ |
description | The preservation and digitization of historical data are crucial for ensuring the continuity and accessibility of information over successive generations. The present study investigates the use of machine learning methods to identify Assamese words from their distinctive visual characteristics. The main aim of this project is to improve word recognition technologies in Indic languages, specifically Assamese, in order to preserve and provide access to Assamese literature for future generations. The classification procedure examines 19 shape-related attributes using a range of machine learning algorithms, including Logistic Regression, Decision Trees, Random Forest, Support Vector Machine (SVM) with various kernels, K-Nearest Neighbors, and Gradient Boosting. The models are assessed with metrics such as Accuracy, Precision, Kappa, and F1-score, with Model Build Time and Model Run Time used to evaluate computational efficiency. The Receiver Operating Characteristic (ROC) curve and the Area under the Curve (AUC) are also considered. Of the four datasets analyzed, Dataset 3 yields the best performance. Among the conventional machine learning approaches, Gradient Boosting achieves the highest accuracy, 96.03%. Logistic Regression and SVM with an RBF kernel follow closely, with accuracies of 95.64% and 95.60% respectively. The study also employs Convolutional Neural Networks (CNN) with multiple layers, reaching a recognition accuracy of 97.3%. This finding indicates that the proposed feature set performs close to the CNN model on the evaluation metrics. |
first_indexed | 2024-03-12T14:47:38Z |
format | Article |
id | doaj.art-35496570b16b449a8bffa7daddd52739 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T14:47:38Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-35496570b16b449a8bffa7daddd52739; 2023-08-15T23:00:45Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11, pp. 82302-82326; 10.1109/ACCESS.2023.3301564; 10206436; Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis; Naiwrita Borah (0, https://orcid.org/0000-0002-5033-4258); Udayan Baruah (1, https://orcid.org/0000-0003-4026-5962); Mahesh Thylore Ramakrishna (2); V. Vinoth Kumar (3, https://orcid.org/0000-0003-1070-3212); D. Ramya Dorai (4); Jonnakuti Rajkumar Annad (5, https://orcid.org/0009-0001-2696-6087); Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India; Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Gangtok, India; Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India; School of Information Technology and Engineering, Vellore Institute of Technology University, Vellore, India; Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India; Department of Electromechanical Engineering, Sawla Campus, Arba Minch University, Arba Minch, Ethiopia; https://ieeexplore.ieee.org/document/10206436/; Feature-based approaches; comparative analysis; societal empowerment; Assamese literary works; automatic word recognition; machine learning |
spellingShingle | Naiwrita Borah; Udayan Baruah; Mahesh Thylore Ramakrishna; V. Vinoth Kumar; D. Ramya Dorai; Jonnakuti Rajkumar Annad; Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis; IEEE Access; Feature-based approaches; comparative analysis; societal empowerment; Assamese literary works; automatic word recognition; machine learning |
title | Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis |
title_full | Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis |
title_fullStr | Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis |
title_full_unstemmed | Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis |
title_short | Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis |
title_sort | efficient assamese word recognition for societal empowerment a comparative feature based analysis |
topic | Feature-based approaches; comparative analysis; societal empowerment; Assamese literary works; automatic word recognition; machine learning |
url | https://ieeexplore.ieee.org/document/10206436/ |
work_keys_str_mv | AT naiwritaborah efficientassamesewordrecognitionforsocietalempowermentacomparativefeaturebasedanalysis AT udayanbaruah efficientassamesewordrecognitionforsocietalempowermentacomparativefeaturebasedanalysis AT maheshthyloreramakrishna efficientassamesewordrecognitionforsocietalempowermentacomparativefeaturebasedanalysis AT vvinothkumar efficientassamesewordrecognitionforsocietalempowermentacomparativefeaturebasedanalysis AT dramyadorai efficientassamesewordrecognitionforsocietalempowermentacomparativefeaturebasedanalysis AT jonnakutirajkumarannad efficientassamesewordrecognitionforsocietalempowermentacomparativefeaturebasedanalysis |
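
The description field above names Accuracy, Precision, Kappa, and F1-score among the evaluation metrics. As a rough illustration of what those numbers measure, the sketch below computes them from a list of true and predicted labels using only the Python standard library. It is not the authors' code: the function name `evaluate_predictions`, the macro averaging of Precision and F1, and the toy word labels are all assumptions made for this example, since the record does not say how the metrics were averaged or implemented.

```python
from collections import Counter


def evaluate_predictions(y_true, y_pred):
    """Return accuracy, macro precision, macro F1 and Cohen's kappa.

    Hypothetical helper for illustration only; macro averaging is an
    assumption, not something stated in the catalog record.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    true_totals = Counter(y_true)          # how often each class occurs
    pred_totals = Counter(y_pred)          # how often each class is predicted

    # Accuracy: share of samples whose predicted label equals the true label.
    accuracy = sum(pairs[(c, c)] for c in labels) / n

    precisions, f1_scores = [], []
    for c in labels:
        tp = pairs[(c, c)]
        precision = tp / pred_totals[c] if pred_totals[c] else 0.0
        recall = tp / true_totals[c] if true_totals[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    expected = sum(true_totals[c] * pred_totals[c] for c in labels) / (n * n)
    # Kappa is undefined when chance agreement is 1; this sketch returns 0.0.
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision_macro": sum(precisions) / len(labels),
        "f1_macro": sum(f1_scores) / len(labels),
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels standing in for recognized word classes (made up for the demo).
    truth = ["word_a", "word_a", "word_a", "word_b", "word_b", "word_c"]
    guess = ["word_a", "word_a", "word_b", "word_b", "word_b", "word_c"]
    for name, value in evaluate_predictions(truth, guess).items():
        print(f"{name}: {value:.4f}")
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when word classes are unevenly represented.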