Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences

Antioxidant proteins play an important role in many aspects of cellular life. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals) known as reactive oxygen species (ROS). Free radical generation and antioxidant de...

Bibliographic Details
Main Authors: Luu Ho Thanh Lam, Ngoc Hoang Le, Le Van Tuan, Ho Tran Ban, Truong Nguyen Khanh Hung, Ngan Thi Kim Nguyen, Luong Huu Dang, Nguyen Quoc Khanh Le
Format: Article
Language: English
Published: MDPI AG 2020-10-01
Series: Biology
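Subjects: antioxidant proteins, machine learning, Random Forest, protein sequencing, feature selection, computational modeling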
Online Access: https://www.mdpi.com/2079-7737/9/10/325
author Luu Ho Thanh Lam
Ngoc Hoang Le
Le Van Tuan
Ho Tran Ban
Truong Nguyen Khanh Hung
Ngan Thi Kim Nguyen
Luong Huu Dang
Nguyen Quoc Khanh Le
author_sort Luu Ho Thanh Lam
collection DOAJ
description Antioxidant proteins play an important role in many aspects of cellular life. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals) known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body, and a balance between them is necessary to maintain health. An unhealthy routine or age-related degeneration can upset this balance, leading to more ROS than antioxidants and causing damage to health. In general, the antioxidant mechanism is the combination of antioxidant molecules and ROS in a one-electron reaction. Computational models that quickly identify antioxidant candidates are essential for supporting antioxidant detection experiments in the laboratory. In this study, we propose a machine learning-based model for this prediction task, built from a benchmark set of sequencing data. The experiments used 10-fold cross-validation during training and were validated on three different independent datasets. Different machine learning and deep learning algorithms were evaluated on an optimal set of sequence features. Among them, Random Forest was identified as the best-performing model for identifying antioxidant proteins. Our optimal model achieved an accuracy of 84.6%, with balanced sensitivity (81.5%) and specificity (85.1%), for antioxidant protein identification on the training dataset. The results on the independent datasets also showed the significance of our model compared with previously published works on antioxidant protein identification.
first_indexed 2024-03-10T15:49:42Z
format Article
id doaj.art-57fd0398fa874604a6adb6acc2db2061
institution Directory Open Access Journal
issn 2079-7737
language English
last_indexed 2024-03-10T15:49:42Z
publishDate 2020-10-01
publisher MDPI AG
record_format Article
series Biology
spelling doaj.art-57fd0398fa874604a6adb6acc2db2061
2023-11-20T16:10:06Z
eng
MDPI AG
Biology
2079-7737
2020-10-01
9
10
325
10.3390/biology9100325
Luu Ho Thanh Lam 0
Ngoc Hoang Le 1
Le Van Tuan 2
Ho Tran Ban 3
Truong Nguyen Khanh Hung 4
Ngan Thi Kim Nguyen 5
Luong Huu Dang 6
Nguyen Quoc Khanh Le 7
International Master/PhD Program in Medicine, College of Medicine, Taipei Medical University, Taipei City 110, Taiwan
Graduate Institute of Biomedical Materials and Tissue Engineering, College of Biomedical Engineering, Taipei Medical University, Taipei City 110, Taiwan
Orthopedic and Trauma Department, Cho Ray Hospital, Ho Chi Minh City 70000, Vietnam
Department of Pediatric Surgery, University of Medicine and Pharmacy, Ho Chi Minh City 70000, Vietnam
International Master/PhD Program in Medicine, College of Medicine, Taipei Medical University, Taipei City 110, Taiwan
School of Nutrition and Health Sciences, Taipei Medical University, Taipei City 110, Taiwan
Department of Otolaryngology, University of Medicine and Pharmacy, Ho Chi Minh City 70000, Vietnam
International Master/PhD Program in Medicine, College of Medicine, Taipei Medical University, Taipei City 110, Taiwan
title Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences
topic antioxidant proteins
machine learning
Random Forest
protein sequencing
feature selection
computational modeling
url https://www.mdpi.com/2079-7737/9/10/325