Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences


Bibliographic Details
Main Authors: Monika Khandelwal, Sabha Sheikh, Ranjeet Kumar Rout, Saiyed Umer, Saurav Mallik, Zhongming Zhao
Format: Article
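Language: English
Published: MDPI AG 2022-06-01
Series: Mathematics
Subjects: aldehyde dehydrogenase 2; ethanol metabolism; machine learning; physicochemical properties; secondary structure
Online Access: https://www.mdpi.com/2227-7390/10/13/2228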
author Monika Khandelwal
Sabha Sheikh
Ranjeet Kumar Rout
Saiyed Umer
Saurav Mallik
Zhongming Zhao
collection DOAJ
description The aldehyde dehydrogenase 2 (ALDH2) enzyme is required for alcohol detoxification. ALDH2 belongs to the aldehyde dehydrogenase family, which carries out the most important oxidative pathway of alcohol metabolism. The two main liver isoforms of aldehyde dehydrogenase are cytosolic and mitochondrial. Approximately 50% of East Asians have ALDH2 deficiency (an inactive mitochondrial isozyme) caused by a glutamate (E) to lysine (K) substitution at position 487 (E487K). ALDH2 deficiency is also known as Alcohol Flushing Syndrome or Asian Glow. People with ALDH2 deficiency develop facial flushing after drinking alcohol and are more susceptible to various diseases than people with normal ALDH2. This study performed a machine learning analysis of ALDH2 sequences from thirteen other species by comparing them with the human ALDH2 sequence. Based on several quantitative metrics (physicochemical properties, secondary structure, Hurst exponent, Shannon entropy, and fractal dimension), these fourteen species were grouped into four clusters using an unsupervised machine learning algorithm (K-means clustering). The species were also analyzed using hierarchical (agglomerative) clustering, and phylogenetic trees were drawn. The results show that Homo sapiens is more closely related to Bos taurus and Sus scrofa than to the other species analyzed. These experimental results suggest that candidate medicines for alleviating the impacts of ALDH2 deficiency may be tested on these species before being tested in humans.
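The description names Shannon entropy as one clustering feature and K-means as the clustering algorithm. The following is a minimal sketch of that idea only, not the authors' pipeline: it computes the Shannon entropy of residue frequencies and runs a plain one-dimensional K-means on that single feature. The function names, the toy sequences, and the choice of two clusters are all assumptions made for illustration; the study itself combined several features and reported four clusters.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the residue frequencies in one sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's K-means on scalar features (hypothetical helper)."""
    ordered = sorted(values)
    # Deterministic initialization: k values spread across the sorted list.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy fragments, NOT real ALDH2 sequences.
toy = {
    "species_a": "AAAAAAAA",
    "species_b": "AAAAAAAC",
    "species_c": "ACDEFGHI",
    "species_d": "ACDEFGHK",
}
entropies = {name: shannon_entropy(s) for name, s in toy.items()}
labels, centers = kmeans_1d(list(entropies.values()), k=2)
clusters = dict(zip(entropies, labels))
```

In this toy input, the two low-diversity fragments share one cluster and the two high-diversity fragments share the other. A fuller analysis would cluster multi-dimensional feature vectors rather than a single scalar.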
first_indexed 2024-03-09T12:47:06Z
format Article
id doaj.art-9b136f0294af4527b3b98b6cf88342ff
institution Directory of Open Access Journals
issn 2227-7390
language English
last_indexed 2024-03-09T12:47:06Z
publishDate 2022-06-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-9b136f0294af4527b3b98b6cf88342ff, 2023-11-30T22:11:33Z, eng
MDPI AG, Mathematics, ISSN 2227-7390, 2022-06-01, vol. 10, no. 13, art. 2228
DOI: 10.3390/math10132228
Author affiliations:
Monika Khandelwal, Sabha Sheikh, Ranjeet Kumar Rout: Department of Computer Science & Engineering, National Institute of Technology Srinagar, Hazratbal 190006, India
Saiyed Umer: Department of Computer Science and Engineering, Aliah University, Newtown, Kolkata 700160, India
Saurav Mallik, Zhongming Zhao: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
title Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences
topic aldehyde dehydrogenase 2
ethanol metabolism
machine learning
physicochemical properties
secondary structure
url https://www.mdpi.com/2227-7390/10/13/2228