Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences
Main Authors: | Monika Khandelwal, Sabha Sheikh, Ranjeet Kumar Rout, Saiyed Umer, Saurav Mallik, Zhongming Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-06-01 |
Series: | Mathematics |
Subjects: | |
Online Access: | https://www.mdpi.com/2227-7390/10/13/2228 |
author | Monika Khandelwal, Sabha Sheikh, Ranjeet Kumar Rout, Saiyed Umer, Saurav Mallik, Zhongming Zhao |
---|---|
collection | DOAJ |
description | Aldehyde dehydrogenase 2 (ALDH2) is required for alcohol detoxification. ALDH2 belongs to the aldehyde dehydrogenase family, which carries out the most important oxidative pathway of alcohol metabolism. The two main liver isoforms of aldehyde dehydrogenase are cytosolic and mitochondrial. Approximately 50% of East Asians have ALDH2 deficiency (an inactive mitochondrial isozyme) caused by a glutamate (E) to lysine (K) substitution at position 487 (E487K). ALDH2 deficiency is also known as Alcohol Flushing Syndrome or Asian Glow. People with ALDH2 deficiency experience facial flushing after drinking alcohol and are more susceptible to various diseases than people with normal ALDH2. This study applied machine learning to the ALDH2 sequences of thirteen other species, comparing each with the human ALDH2 sequence. Using several quantitative metrics (physicochemical properties, secondary structure, Hurst exponent, Shannon entropy, and fractal dimension), the fourteen species were grouped into four clusters with the unsupervised K-means clustering algorithm. The species were also analyzed with hierarchical (agglomerative) clustering, and phylogenetic trees were drawn. The results show that *Homo sapiens* is more closely related to *Bos taurus* and *Sus scrofa* than to the other species analyzed. These results suggest that drugs intended to alleviate the effects of ALDH2 deficiency could be tested on these species before being tested in humans. |
first_indexed | 2024-03-09T12:47:06Z |
id | doaj.art-9b136f0294af4527b3b98b6cf88342ff |
institution | Directory of Open Access Journals |
issn | 2227-7390 |
last_indexed | 2024-03-09T12:47:06Z |
doi | 10.3390/math10132228 |
citation | Mathematics, vol. 10, no. 13, article 2228 |
affiliations | Monika Khandelwal, Sabha Sheikh, Ranjeet Kumar Rout: Department of Computer Science & Engineering, National Institute of Technology Srinagar, Hazratbal 190006, India; Saiyed Umer: Department of Computer Science and Engineering, Aliah University, Newtown, Kolkata 700160, India; Saurav Mallik, Zhongming Zhao: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA |
title | Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences |
topic | aldehyde dehydrogenase 2; ethanol metabolism; machine learning; physicochemical properties; secondary structure |
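
The description lists Shannon entropy among the quantitative metrics used to group the fourteen species with K-means. The following is a minimal Python sketch of that kind of pipeline, not the authors' code. It computes the Shannon entropy (in bits) of a sequence's amino-acid composition, pairs it with a hypothetical hydrophobic-fraction feature (the `HYDROPHOBIC` residue set is an illustrative assumption), and runs plain Lloyd's K-means with the first `k` points as initial centroids. The input strings are made-up toy sequences, not ALDH2 data, and the paper's other features (physicochemical properties, secondary structure, Hurst exponent, fractal dimension) are omitted.

```python
import math
from collections import Counter

# Hypothetical hydrophobic residue set, used only as an illustrative second feature.
HYDROPHOBIC = set("AVILMFWC")


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the amino-acid frequency distribution of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def features(seq):
    """Toy two-dimensional feature vector: (entropy, hydrophobic fraction)."""
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC) / len(seq)
    return (shannon_entropy(seq), hydrophobic)


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Usage on made-up sequences: two low-entropy strings and two that use all 20 residues.
toy = ["AAAAAAAA", "AAAAAAAV", "ACDEFGHIKLMNPQRSTVWY", "YWVTSRQPNMLKIHGFEDCA"]
print(kmeans([features(s) for s in toy], k=2))  # → [0, 0, 1, 1]
```

The two features live on different scales (entropy up to log2(20) ≈ 4.32 bits, fraction between 0 and 1), so a fuller pipeline would usually standardize each feature before clustering; the sketch skips that step to stay short.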