Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis

Bibliographic Details
Main Authors:	Naiwrita Borah, Udayan Baruah, Mahesh Thylore Ramakrishna, V. Vinoth Kumar, D. Ramya Dorai, Jonnakuti Rajkumar Annad
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Feature-based approaches; comparative analysis; societal empowerment; Assamese literary works; automatic word recognition; machine learning
Online Access:	https://ieeexplore.ieee.org/document/10206436/

collection	DOAJ
description	The preservation and digitization of historical data are crucial for keeping information continuous and accessible across generations. This study applies machine learning to the recognition of Assamese words, focusing on their distinctive visual characteristics. Its main aim is to improve word recognition technology for Indic languages, specifically Assamese, so that Assamese literature can be preserved and made accessible to future generations. Classification is based on 19 shape-related features, evaluated with a range of machine learning algorithms: Logistic Regression, Decision Trees, Random Forest, Support Vector Machine (SVM) with various kernels, K Nearest Neighbors, and Gradient Boosting. The models are assessed using Accuracy, Precision, Kappa, and F1-score, while Model Build Time and Model Run Time measure computational efficiency. Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) are also considered. Of the four datasets analyzed, Dataset 3 yields the best performance. Among the conventional machine learning approaches, Gradient Boosting achieves the highest accuracy, 96.03%, closely followed by Logistic Regression (95.64%) and SVM with an RBF kernel (95.60%). The study also employs a multi-layer Convolutional Neural Network (CNN), which reaches a recognition accuracy of 97.3%. These results indicate that the proposed feature set performs close to the CNN model on the evaluation metrics.
first_indexed	2024-03-12T14:47:38Z
id	doaj.art-35496570b16b449a8bffa7daddd52739
institution	Directory of Open Access Journals
issn	2169-3536
last_indexed	2024-03-12T14:47:38Z
volume	11
pages	82302-82326
doi	10.1109/ACCESS.2023.3301564
affiliation	Naiwrita Borah (https://orcid.org/0000-0002-5033-4258): Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India
affiliation	Udayan Baruah (https://orcid.org/0000-0003-4026-5962): Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Gangtok, India
affiliation	Mahesh Thylore Ramakrishna: Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India
affiliation	V. Vinoth Kumar (https://orcid.org/0000-0003-1070-3212): School of Information Technology and Engineering, Vellore Institute of Technology University, Vellore, India
affiliation	D. Ramya Dorai: Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India
affiliation	Jonnakuti Rajkumar Annad (https://orcid.org/0009-0001-2696-6087): Department of Electromechanical Engineering, Sawla Campus, Arba Minch University, Arba Minch, Ethiopia
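The description field above lists Accuracy, Precision, Kappa, F1-score, and AUC among the evaluation metrics. The record does not say how the authors computed them, so the following is only a minimal standard-library sketch of the usual definitions. It assumes macro averaging over classes for Precision and F1, Cohen's kappa for "Kappa", and a binary (one-vs-rest) AUC. The function names and toy labels are hypothetical and are not taken from the paper.

```python
# Minimal sketch of standard classification metrics (stdlib only).
# Assumptions (not confirmed by the paper): macro averaging for precision/F1,
# Cohen's kappa for "Kappa", binary one-vs-rest AUC. Function names are hypothetical.
from collections import Counter
from typing import Hashable, Sequence, Tuple


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_f1(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, float]:
    """Per-class precision and F1, averaged with equal weight per class."""
    pairs = list(zip(y_true, y_pred))
    labels = set(y_true) | set(y_pred)
    precisions, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    return sum(precisions) / len(labels), sum(f1s) / len(labels)


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both sides use one identical label
    return (p_o - p_e) / (1.0 - p_e)


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive (label 1) scores above a random negative (label 0), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = ["a", "a", "b", "b", "c", "c"]
    y_pred = ["a", "b", "b", "b", "c", "a"]
    print(accuracy(y_true, y_pred))  # 4/6, about 0.667
    print(macro_precision_f1(y_true, y_pred))
    print(cohen_kappa(y_true, y_pred))  # approximately 0.5
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For a multi-class word recognizer, binary_auc can be applied one class at a time (that class as 1, all others as 0) and the per-class values averaged; whether the paper averaged AUC this way is not stated in the record.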


Similar Items