A large-scale prediction of protein-protein interactions based on random forest and matrix of sequence

Protein-protein interactions (PPIs) play an important part in many life activities in organisms, and predicting them is closely related to understanding protein function, disease occurrence, and disease treatment. To improve PPI prediction performance, the RT-MOS model was constructed here from Random Forest (RF) and Matrix of Sequence (MOS). First, MOS encodes each protein sequence into a 29-dimensional feature vector. Then, the RT-MOS prediction model is built on a random forest and is optimized and evaluated on the test set. Finally, the optimized RT-MOS model is used for prediction. RT-MOS achieves accuracies of 97.18% on the benchmark dataset and 91.34% on the non-redundant dataset, and 96.21%, 97.86%, 97.54%, and 97.75% on four external datasets (C. elegans, Drosophila, E. coli, and H. sapiens, respectively). RT-MOS also outperforms the existing methods it was compared with. The experimental results show that RT-MOS is time-efficient, resists overfitting, and is highly accurate, making it suitable for large-scale PPI prediction.

Bibliographic Details
Main Authors: Wang Kenan, Zhao Xiaoman, Wang Xue
Affiliations: Wang Kenan: University College London, Institute of Child Health; Zhao Xiaoman and Wang Xue: Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences
Format: Article
Language: English
Published: EDP Sciences, 2022-01-01
Series: BIO Web of Conferences
ISSN: 2117-4458
DOI: 10.1051/bioconf/20225501017
Subjects: random forest; matrix of sequence; protein-protein interaction
Collection: DOAJ (Directory of Open Access Journals)
Online Access: https://www.bio-conferences.org/articles/bioconf/pdf/2022/14/bioconf_fbse2022_01017.pdf
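The abstract outlines a three-step pipeline: encode each protein as a fixed-length vector, train a random forest on labelled protein pairs, then evaluate and predict. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' RT-MOS implementation. This record does not define the MOS encoding, so `composition` is a hypothetical stand-in (a 20-dimensional amino-acid composition, not the 29-dimensional MOS vector), and `pair_features` (element-wise sum and product of the two protein vectors, so pair order does not matter) is likewise an assumption. The forest itself is a minimal standard-library version of the usual technique (bootstrap samples, about sqrt(d) random features per split, Gini impurity, majority vote); `RandomForest`, `n_trees`, `max_depth`, and `seed` are illustrative names and values.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """HYPOTHETICAL stand-in for MOS: 20-dim amino-acid composition (fractions)."""
    counts = Counter(seq)
    n = len(seq) or 1
    return [counts[a] / n for a in AMINO_ACIDS]

def pair_features(seq_a, seq_b):
    """ASSUMED pair encoding: element-wise sum and product, symmetric in (a, b)."""
    a, b = composition(seq_a), composition(seq_b)
    return [x + y for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, max_depth, n_feats, rng, depth=0):
    """Grow a decision tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    # Leaf: depth limit reached, node is pure, or too few samples -> majority label.
    if depth >= max_depth or len(set(y)) == 1 or len(y) < 2:
        return Counter(y).most_common(1)[0][0]
    best = None
    # Random feature subset at each split is what makes this a random forest tree.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in set(row[f] for row in X):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            if not left or len(left) == len(X):
                continue
            left_set = set(left)
            right = [i for i in range(len(X)) if i not in left_set]
            # Weighted Gini impurity of the two children.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth, n_feats, rng, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth, n_feats, rng, depth + 1))

def predict_tree(node, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class RandomForest:
    """Minimal illustrative random forest classifier (hypothetical, not RT-MOS)."""

    def __init__(self, n_trees=25, max_depth=6, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n_feats = max(1, int(len(X[0]) ** 0.5))  # about sqrt(d) features per split
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw len(X) rows with replacement.
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, n_feats, self.rng))
        return self

    def predict(self, X):
        # Majority vote over all trees for each sample.
        return [Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0] for x in X]
```

In use, one would build `X = [pair_features(a, b) for a, b in pairs]` with labels 1 (interacting) and 0 (non-interacting), call `RandomForest().fit(X_train, y_train)`, and compare `predict(X_test)` with the held-out labels to obtain an accuracy figure of the same kind as those reported in the abstract. This pure-Python version only shows the mechanics; for large-scale PPI data a library implementation such as scikit-learn's `RandomForestClassifier` would be the practical choice.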