Prediction and analysis of antifreeze proteins
Antifreeze proteins (AFPs) are proteins that protect cellular fluids and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and preventing ice recrystallization, thereby contributing to the maintenance of life in living organisms. They exist in fish, insects, microorga...
Main Authors: | Ryosuke Miyata, Yoshitaka Moriwaki, Tohru Terada, Kentaro Shimizu |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-09-01 |
Series: | Heliyon |
Subjects: | Antifreeze proteins; Prediction; Protein sequences; Amino acids; Random forest |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405844021020569 |
_version_ | 1818716287969263616 |
---|---|
author | Ryosuke Miyata Yoshitaka Moriwaki Tohru Terada Kentaro Shimizu |
author_facet | Ryosuke Miyata Yoshitaka Moriwaki Tohru Terada Kentaro Shimizu |
author_sort | Ryosuke Miyata |
collection | DOAJ |
description | Antifreeze proteins (AFPs) protect cellular and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and by preventing ice recrystallization, thereby helping organisms survive. They exist in fish, insects, microorganisms, and fungi. However, the number of known AFPs is currently limited, so it is essential to construct a reliable dataset of AFPs and to develop a bioinformatics tool that predicts them. In this work, we first collected AFP sequences from UniProtKB, taking the reliability of annotations into account, and, based on these datasets, developed a prediction system using random forest. We achieved accuracies of 0.961 and 0.947 for non-redundant sequences with less than 90% and 30% sequence identity, respectively, and an accuracy of 0.953 for representative sequences for each species. Using the random forest's ability to assess feature contributions, we identified the sequence features that contributed to the prediction. Some sequence features were common to AFPs from different species. These features include the Cys content, Ala-Ala content, Trp-Gly content, and the distribution of amino acids related to disorder propensity. The computer program and the dataset developed in this work are available from the GitHub site: https://github.com/ryomiya/Prediction-and-analysis-of-antifreeze-proteins. |
first_indexed | 2024-12-17T19:16:52Z |
format | Article |
id | doaj.art-8cba00534b3c4b49aaa481f440db0f43 |
institution | Directory Open Access Journal |
issn | 2405-8440 |
language | English |
last_indexed | 2024-12-17T19:16:52Z |
publishDate | 2021-09-01 |
publisher | Elsevier |
record_format | Article |
series | Heliyon |
spelling | doaj.art-8cba00534b3c4b49aaa481f440db0f43 2022-12-21T21:35:43Z eng Elsevier Heliyon 2405-8440 2021-09-01 7 9 e07953 Prediction and analysis of antifreeze proteins Ryosuke Miyata 0; Yoshitaka Moriwaki 1; Tohru Terada 2; Kentaro Shimizu 3. 0: Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan; 1: Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan; 2: Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan; 3: Corresponding author.; Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan. Antifreeze proteins (AFPs) protect cellular and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and by preventing ice recrystallization, thereby helping organisms survive. They exist in fish, insects, microorganisms, and fungi. However, the number of known AFPs is currently limited, so it is essential to construct a reliable dataset of AFPs and to develop a bioinformatics tool that predicts them. In this work, we first collected AFP sequences from UniProtKB, taking the reliability of annotations into account, and, based on these datasets, developed a prediction system using random forest. We achieved accuracies of 0.961 and 0.947 for non-redundant sequences with less than 90% and 30% sequence identity, respectively, and an accuracy of 0.953 for representative sequences for each species. Using the random forest's ability to assess feature contributions, we identified the sequence features that contributed to the prediction. Some sequence features were common to AFPs from different species. These features include the Cys content, Ala-Ala content, Trp-Gly content, and the distribution of amino acids related to disorder propensity. The computer program and the dataset developed in this work are available from the GitHub site: https://github.com/ryomiya/Prediction-and-analysis-of-antifreeze-proteins. http://www.sciencedirect.com/science/article/pii/S2405844021020569 Antifreeze proteins; Prediction; Protein sequences; Amino acids; Random forest |
spellingShingle | Ryosuke Miyata Yoshitaka Moriwaki Tohru Terada Kentaro Shimizu Prediction and analysis of antifreeze proteins Heliyon Antifreeze proteins Prediction Protein sequences Amino acids Random forest |
title | Prediction and analysis of antifreeze proteins |
title_full | Prediction and analysis of antifreeze proteins |
title_fullStr | Prediction and analysis of antifreeze proteins |
title_full_unstemmed | Prediction and analysis of antifreeze proteins |
title_short | Prediction and analysis of antifreeze proteins |
title_sort | prediction and analysis of antifreeze proteins |
topic | Antifreeze proteins Prediction Protein sequences Amino acids Random forest |
url | http://www.sciencedirect.com/science/article/pii/S2405844021020569 |
work_keys_str_mv | AT ryosukemiyata predictionandanalysisofantifreezeproteins AT yoshitakamoriwaki predictionandanalysisofantifreezeproteins AT tohruterada predictionandanalysisofantifreezeproteins AT kentaroshimizu predictionandanalysisofantifreezeproteins |
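
The description field names Cys content, Ala-Ala content, and Trp-Gly content among the sequence features that contributed to prediction. The Python sketch below shows one plausible way to compute such features from a protein sequence. It is an illustration only, not the authors' program: the function names, the normalization (count divided by sequence length, or by the number of adjacent pairs), and the toy sequence are assumptions, and the record does not state how the paper defines these quantities.

```python
def residue_fraction(seq: str, residue: str) -> float:
    """Fraction of positions in `seq` occupied by a one-letter residue code."""
    seq = seq.upper()
    return seq.count(residue) / len(seq) if seq else 0.0


def dipeptide_fraction(seq: str, pair: str) -> float:
    """Fraction of overlapping adjacent residue pairs in `seq` equal to `pair`."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        return 0.0
    hits = sum(1 for i in range(n_pairs) if seq[i:i + 2] == pair)
    return hits / n_pairs


def afp_features(seq: str) -> dict:
    """Hypothetical feature vector built from the three contents named in the record.

    Assumption: "content" means a normalized frequency, and Trp-Gly means
    W immediately followed by G.
    """
    return {
        "cys_content": residue_fraction(seq, "C"),
        "ala_ala_content": dipeptide_fraction(seq, "AA"),
        "trp_gly_content": dipeptide_fraction(seq, "WG"),
    }


if __name__ == "__main__":
    toy_sequence = "AACAAWGC"  # made-up sequence, not a real AFP
    print(afp_features(toy_sequence))
```

Feature dictionaries of this kind could be stacked into a matrix and passed to any random forest implementation; the dataset and the program actually used in the work are at the GitHub URL given in the record.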