Prediction and analysis of antifreeze proteins

Antifreeze proteins (AFPs) are proteins that protect cellular fluids and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and preventing ice recrystallization, thereby contributing to the maintenance of life in living organisms. They exist in fish, insects, microorga...

Bibliographic Details
Main Authors: Ryosuke Miyata, Yoshitaka Moriwaki, Tohru Terada, Kentaro Shimizu
Format: Article
Language: English
Published: Elsevier 2021-09-01
Series: Heliyon
Subjects: Antifreeze proteins; Prediction; Protein sequences; Amino acids; Random forest
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844021020569
_version_ 1818716287969263616
author Ryosuke Miyata
Yoshitaka Moriwaki
Tohru Terada
Kentaro Shimizu
author_facet Ryosuke Miyata
Yoshitaka Moriwaki
Tohru Terada
Kentaro Shimizu
author_sort Ryosuke Miyata
collection DOAJ
description Antifreeze proteins (AFPs) protect cellular and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and by preventing ice recrystallization, thereby helping organisms survive. They occur in fish, insects, microorganisms, and fungi. However, the number of known AFPs is currently limited, so it is essential to construct a reliable dataset of AFPs and to develop a bioinformatics tool for predicting them. In this work, we first collected AFP sequences from UniProtKB, taking the reliability of annotations into account, and, based on these datasets, developed a prediction system using random forest. We achieved accuracies of 0.961 and 0.947 for non-redundant sequences with less than 90% and 30% identity, respectively, and an accuracy of 0.953 for representative sequences for each species. Using the random forest model, we identified the sequence features that contributed to the prediction. Some sequence features were common to AFPs from different species. These features include the Cys content, Ala-Ala content, Trp-Gly content, and the distribution of amino acids related to disorder propensity. The computer program and the dataset developed in this work are available from the GitHub site: https://github.com/ryomiya/Prediction-and-analysis-of-antifreeze-proteins.
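The description names three composition features (Cys content, Ala-Ala content, Trp-Gly content) without defining them. The following is a minimal sketch of how such features could be computed from a one-letter protein sequence. The function names and the normalization (fraction of residues, fraction of overlapping adjacent pairs) are assumptions made for illustration; they are not taken from the authors' program.

```python
def residue_content(seq: str, residue: str) -> float:
    """Fraction of positions in seq occupied by one residue (one-letter code)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return seq.count(residue) / len(seq)


def dipeptide_content(seq: str, pair: str) -> float:
    """Fraction of overlapping adjacent residue pairs that equal `pair`."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        return 0.0
    # Slide a window of width 2 so that overlapping hits (e.g. "AAA") are all counted.
    hits = sum(1 for i in range(n_pairs) if seq[i:i + 2] == pair)
    return hits / n_pairs


def afp_features(seq: str) -> dict:
    """Hypothetical feature vector covering the three features named in the description."""
    return {
        "Cys": residue_content(seq, "C"),
        "Ala-Ala": dipeptide_content(seq, "AA"),
        "Trp-Gly": dipeptide_content(seq, "WG"),
    }


if __name__ == "__main__":
    # Toy sequence, not a real antifreeze protein.
    print(afp_features("AACWGA"))
```

A table of such values, one row per sequence, is the kind of input a random forest classifier could be trained on; the actual feature set used in the article is larger and is documented in the GitHub repository cited above.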
first_indexed 2024-12-17T19:16:52Z
format Article
id doaj.art-8cba00534b3c4b49aaa481f440db0f43
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-12-17T19:16:52Z
publishDate 2021-09-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-8cba00534b3c4b49aaa481f440db0f43
2022-12-21T21:35:43Z
eng
Elsevier
Heliyon
2405-8440
2021-09-01
7
9
e07953
Prediction and analysis of antifreeze proteins
Ryosuke Miyata (0)
Yoshitaka Moriwaki (1)
Tohru Terada (2)
Kentaro Shimizu (3)
Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Corresponding author.; Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Antifreeze proteins (AFPs) protect cellular and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and by preventing ice recrystallization, thereby helping organisms survive. They occur in fish, insects, microorganisms, and fungi. However, the number of known AFPs is currently limited, so it is essential to construct a reliable dataset of AFPs and to develop a bioinformatics tool for predicting them. In this work, we first collected AFP sequences from UniProtKB, taking the reliability of annotations into account, and, based on these datasets, developed a prediction system using random forest. We achieved accuracies of 0.961 and 0.947 for non-redundant sequences with less than 90% and 30% identity, respectively, and an accuracy of 0.953 for representative sequences for each species. Using the random forest model, we identified the sequence features that contributed to the prediction. Some sequence features were common to AFPs from different species. These features include the Cys content, Ala-Ala content, Trp-Gly content, and the distribution of amino acids related to disorder propensity. The computer program and the dataset developed in this work are available from the GitHub site: https://github.com/ryomiya/Prediction-and-analysis-of-antifreeze-proteins.
http://www.sciencedirect.com/science/article/pii/S2405844021020569
Antifreeze proteins
Prediction
Protein sequences
Amino acids
Random forest
spellingShingle Ryosuke Miyata
Yoshitaka Moriwaki
Tohru Terada
Kentaro Shimizu
Prediction and analysis of antifreeze proteins
Heliyon
Antifreeze proteins
Prediction
Protein sequences
Amino acids
Random forest
title Prediction and analysis of antifreeze proteins
title_full Prediction and analysis of antifreeze proteins
title_fullStr Prediction and analysis of antifreeze proteins
title_full_unstemmed Prediction and analysis of antifreeze proteins
title_short Prediction and analysis of antifreeze proteins
title_sort prediction and analysis of antifreeze proteins
topic Antifreeze proteins
Prediction
Protein sequences
Amino acids
Random forest
url http://www.sciencedirect.com/science/article/pii/S2405844021020569
work_keys_str_mv AT ryosukemiyata predictionandanalysisofantifreezeproteins
AT yoshitakamoriwaki predictionandanalysisofantifreezeproteins
AT tohruterada predictionandanalysisofantifreezeproteins
AT kentaroshimizu predictionandanalysisofantifreezeproteins