A large-scale prediction of protein-protein interactions based on random forest and matrix of sequence
Protein-protein interactions (PPIs) are an important part of many life activities in organisms, and their prediction is closely related to protein function, disease occurrence, and disease treatment. In order to optimize the prediction performance of protein interactions,...
Main Authors: | Wang Kenan, Zhao Xiaoman, Wang Xue |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2022-01-01 |
Series: | BIO Web of Conferences |
Subjects: | random forest, matrix of sequence, protein-protein interaction |
Online Access: | https://www.bio-conferences.org/articles/bioconf/pdf/2022/14/bioconf_fbse2022_01017.pdf |
_version_ | 1811178285915176960 |
---|---|
author | Wang Kenan; Zhao Xiaoman; Wang Xue |
author_sort | Wang Kenan |
collection | DOAJ |
description | Protein-protein interactions (PPIs) are an important part of many life activities in organisms, and their prediction is closely related to protein function, disease occurrence, and disease treatment. To improve the prediction performance for protein interactions, an RT-MOS model was constructed based on Random Forest (RF) and Matrix of Sequence (MOS). First, MOS is used to encode the protein sequences into a 29-dimensional feature vector. Then, the prediction model RT-MOS is built based on random forest, and it is optimized and evaluated using the test set. Finally, the optimized RT-MOS model is used for prediction. The experimental results show that the accuracy of the RT-MOS model on the benchmark dataset and the non-redundant dataset is 97.18% and 91.34%, respectively, and that its accuracy on four external datasets (C. elegans, Drosophila, E. coli and H. sapiens) is 96.21%, 97.86%, 97.54% and 97.75%, respectively. The model is reported to outperform the existing methods it was compared with. The results also indicate that RT-MOS saves time, prevents overfitting and achieves high accuracy, and is suitable for large-scale PPI prediction. |
first_indexed | 2024-04-11T06:15:54Z |
format | Article |
id | doaj.art-03e68767900c4f379543afa5c00c11a5 |
institution | Directory Open Access Journal |
issn | 2117-4458 |
language | English |
last_indexed | 2024-04-11T06:15:54Z |
publishDate | 2022-01-01 |
publisher | EDP Sciences |
record_format | Article |
series | BIO Web of Conferences |
spelling | doaj.art-03e68767900c4f379543afa5c00c11a5; 2022-12-22T04:41:02Z; eng; EDP Sciences; BIO Web of Conferences; 2117-4458; 2022-01-01; 55; 01017; 10.1051/bioconf/20225501017; bioconf_fbse2022_01017; A large-scale prediction of protein-protein interactions based on random forest and matrix of sequence; Wang Kenan (0), Zhao Xiaoman (1), Wang Xue (2); University College London, Institute of Child Health; Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Protein-protein interactions (PPIs) are an important part of many life activities in organisms, and their prediction is closely related to protein function, disease occurrence, and disease treatment. To improve the prediction performance for protein interactions, an RT-MOS model was constructed based on Random Forest (RF) and Matrix of Sequence (MOS). First, MOS is used to encode the protein sequences into a 29-dimensional feature vector. Then, the prediction model RT-MOS is built based on random forest, and it is optimized and evaluated using the test set. Finally, the optimized RT-MOS model is used for prediction. The experimental results show that the accuracy of the RT-MOS model on the benchmark dataset and the non-redundant dataset is 97.18% and 91.34%, respectively, and that its accuracy on four external datasets (C. elegans, Drosophila, E. coli and H. sapiens) is 96.21%, 97.86%, 97.54% and 97.75%, respectively. The model is reported to outperform the existing methods it was compared with. The results also indicate that RT-MOS saves time, prevents overfitting and achieves high accuracy, and is suitable for large-scale PPI prediction.; https://www.bio-conferences.org/articles/bioconf/pdf/2022/14/bioconf_fbse2022_01017.pdf; random forest; matrix of sequence; protein-protein interaction |
title | A large-scale prediction of protein-protein interactions based on random forest and matrix of sequence |
title_sort | large scale prediction of protein protein interactions based on random forest and matrix of sequence |
topic | random forest; matrix of sequence; protein-protein interaction |
url | https://www.bio-conferences.org/articles/bioconf/pdf/2022/14/bioconf_fbse2022_01017.pdf |
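
The description above outlines a three-step workflow: encode sequences as 29-dimensional vectors, train a random forest, and measure accuracy on a held-out test set. The following is a minimal sketch of that workflow shape only, not the authors' RT-MOS implementation. The record does not define the MOS encoding, so the hypothetical `make_toy_data` helper generates synthetic 29-dimensional vectors in its place, and the forest is restricted to depth-one trees (stumps) to keep the example short. All function names, the tree count, and the toy data are assumptions made for illustration.

```python
import random

DIM = 29  # feature-vector length quoted in the abstract


def make_toy_data(n, rng):
    """Synthetic stand-in for MOS-encoded samples; NOT real PPI data."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate classes so the data stay balanced
        # Assumption: only the first 5 features carry signal, the rest are noise.
        X.append([rng.gauss(2.0 * label if j < 5 else 0.0, 1.0) for j in range(DIM)])
        y.append(label)
    return X, y


def fit_stump(X, y, features):
    """Return the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            # Errors made when predicting class 1 for values above the threshold.
            err = sum((row[f] > t) != bool(label) for row, label in zip(X, y))
            # Polarity 0 flips the rule, so its error count is the complement.
            for polarity, e in ((1, err), (0, len(y) - err)):
                if best is None or e < best[0]:
                    best = (e, f, t, polarity)
    return best[1:]


def predict_stump(stump, row):
    f, t, polarity = stump
    return polarity if row[f] > t else 1 - polarity


def fit_forest(X, y, n_trees, rng):
    """Bagged depth-one trees, each trained on a random feature subset."""
    k = max(1, int(DIM ** 0.5))  # features tried per tree, a common default
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(X)), k=len(X))  # bootstrap sample
        features = rng.sample(range(DIM), k)        # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes > len(forest) else 0


def accuracy(forest, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict_forest(forest, r) == label for r, label in zip(X, y)) / len(y)


rng = random.Random(0)
X, y = make_toy_data(200, rng)
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
forest = fit_forest(X_train, y_train, n_trees=31, rng=rng)
test_accuracy = accuracy(forest, X_test, y_test)
print(f"toy test accuracy: {test_accuracy:.2%}")
```

The printed figure reflects only the synthetic data and has no bearing on the accuracies reported in the record. In practice a library implementation with full-depth trees, such as scikit-learn's `RandomForestClassifier`, would likely replace the hand-written stumps.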