Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis

The preservation and digitization of historical data are crucial for ensuring the continuity and accessibility of information over successive generations. The present study investigates the utilization of machine learning methodologies in the identification of Assamese words, focusing specifically o...

Bibliographic Details
Main Authors: Naiwrita Borah, Udayan Baruah, Mahesh Thylore Ramakrishna, V. Vinoth Kumar, D. Ramya Dorai, Jonnakuti Rajkumar Annad
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
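Subjects: Feature-based approaches; comparative analysis; societal empowerment; Assamese literary works; automatic word recognition; machine learning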
Online Access: https://ieeexplore.ieee.org/document/10206436/
author Naiwrita Borah
Udayan Baruah
Mahesh Thylore Ramakrishna
V. Vinoth Kumar
D. Ramya Dorai
Jonnakuti Rajkumar Annad
collection DOAJ
description The preservation and digitization of historical data are crucial for ensuring the continuity and accessibility of information over successive generations. The present study investigates the utilization of machine learning methodologies in the identification of Assamese words, focusing specifically on their distinctive visual characteristics. The main aim of this project is to improve word recognition technologies in Indic languages, specifically focusing on Assamese, in order to preserve and provide access to Assamese literature for future generations. The classification procedure entails the examination of 19 shape-related attributes through a range of machine learning algorithms, such as Logistic Regression, Decision Trees, Random Forest, Support Vector Machine (SVM) with various kernels, K-Nearest Neighbors, and Gradient Boosting. The assessment of the model involves the utilization of various metrics such as Accuracy, Precision, Kappa, F1-score, Model Build Time, and Model Run Time to evaluate the computational efficiency. Additionally, the metrics of Area under the Curve (AUC) and Receiver Operating Characteristic (ROC) are also considered in the evaluation process. Out of the four datasets analyzed, Dataset 3 exhibits the highest level of performance. It is worth noting that Gradient Boosting demonstrates the highest level of accuracy, reaching 96.03% for conventional machine learning approaches. Logistic Regression and SVM with RBF kernel closely trail behind, achieving accuracies of 95.64% and 95.60%, respectively. Furthermore, the research conducted in this study also employs multiple layers of Convolutional Neural Networks (CNN), resulting in a remarkable recognition accuracy of 97.3%. This finding demonstrates that the CNN model and the proposed feature set are in close proximity to one another in terms of the evaluation metrics.
first_indexed 2024-03-12T14:47:38Z
format Article
id doaj.art-35496570b16b449a8bffa7daddd52739
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-12T14:47:38Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-35496570b16b449a8bffa7daddd52739
2023-08-15T23:00:45Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
11
82302
82326
10.1109/ACCESS.2023.3301564
10206436
Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis
Naiwrita Borah [0] https://orcid.org/0000-0002-5033-4258
Udayan Baruah [1] https://orcid.org/0000-0003-4026-5962
Mahesh Thylore Ramakrishna [2]
V. Vinoth Kumar [3] https://orcid.org/0000-0003-1070-3212
D. Ramya Dorai [4]
Jonnakuti Rajkumar Annad [5] https://orcid.org/0009-0001-2696-6087
Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India
Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Gangtok, India
Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India
School of Information Technology and Engineering, Vellore Institute of Technology University, Vellore, India
Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-be University), Bengaluru, India
Department of Electromechanical Engineering, Sawla Campus, Arba Minch University, Arba Minch, Ethiopia
https://ieeexplore.ieee.org/document/10206436/
Feature-based approaches
comparative analysis
societal empowerment
Assamese literary works
automatic word recognition
machine learning
title Efficient Assamese Word Recognition for Societal Empowerment: A Comparative Feature-Based Analysis
topic Feature-based approaches
comparative analysis
societal empowerment
Assamese literary works
automatic word recognition
machine learning
url https://ieeexplore.ieee.org/document/10206436/