RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human

The identification of hepatitis C virus (HCV)-human protein interactions will not only help us understand the molecular mechanisms of related diseases but also be conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HC...

Bibliographic Details
Main Authors: Xin Liu, Yaping Lu, Liang Wang, Wei Geng, Xinyi Shi, Xiao Zhang
Format: Article
Language: English
Published: Tsinghua University Press 2023-03-01
Series: Big Data Mining and Analytics
Subjects:
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2022.9020031
_version_ 1811296780805996544
author Xin Liu
Yaping Lu
Liang Wang
Wei Geng
Xinyi Shi
Xiao Zhang
author_facet Xin Liu
Yaping Lu
Liang Wang
Wei Geng
Xinyi Shi
Xiao Zhang
author_sort Xin Liu
collection DOAJ
description The identification of hepatitis C virus (HCV)-human protein interactions will not only help us understand the molecular mechanisms of related diseases but also be conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HCV and human proteins have been documented in public databases, facilitating studies based on computational methods. In this study, we proposed a new computational approach, rotation forest position-specific scoring matrix (RF-PSSM), to predict interactions between HCV and human proteins. Specifically, a PSSM was used to characterize each protein, and two-dimensional principal component analysis (2DPCA) was then applied to the PSSM for feature extraction. Finally, rotation forest (RF) was used for classification. Ablation experiments show that, on independent datasets, RF-PSSM reaches an accuracy of 93.74% and an area under the curve (AUC) of 94.29%, outperforming almost all cutting-edge methods. In addition, we used RF-PSSM to predict 9 human proteins that may interact with HCV protein E1, which can provide theoretical guidance for future experimental studies.
first_indexed 2024-04-13T05:53:35Z
format Article
id doaj.art-eced14462cea4726bfda01e5cfc65ff2
institution Directory Open Access Journal
issn 2096-0654
language English
last_indexed 2024-04-13T05:53:35Z
publishDate 2023-03-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj.art-eced14462cea4726bfda01e5cfc65ff2
2022-12-22T02:59:41Z
eng
Tsinghua University Press
Big Data Mining and Analytics
2096-0654
2023-03-01
612131
10.26599/BDMA.2022.9020031
RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
Xin Liu (0)
Yaping Lu (1)
Liang Wang (2)
Wei Geng (3)
Xinyi Shi (4)
Xiao Zhang (5)
School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou 221000, China
College of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou 221000, China
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310005, China
School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou 221000, China
The identification of hepatitis C virus (HCV)-human protein interactions will not only help us understand the molecular mechanisms of related diseases but also be conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HCV and human proteins have been documented in public databases, facilitating studies based on computational methods. In this study, we proposed a new computational approach, rotation forest position-specific scoring matrix (RF-PSSM), to predict interactions between HCV and human proteins. Specifically, a PSSM was used to characterize each protein, and two-dimensional principal component analysis (2DPCA) was then applied to the PSSM for feature extraction. Finally, rotation forest (RF) was used for classification. Ablation experiments show that, on independent datasets, RF-PSSM reaches an accuracy of 93.74% and an area under the curve (AUC) of 94.29%, outperforming almost all cutting-edge methods. In addition, we used RF-PSSM to predict 9 human proteins that may interact with HCV protein E1, which can provide theoretical guidance for future experimental studies.
https://www.sciopen.com/article/10.26599/BDMA.2022.9020031
protein-protein interactions
hepatitis c virus
position specific scoring matrix
two-dimensional principal component analysis
rotation forest
spellingShingle Xin Liu
Yaping Lu
Liang Wang
Wei Geng
Xinyi Shi
Xiao Zhang
RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
Big Data Mining and Analytics
protein-protein interactions
hepatitis c virus
position specific scoring matrix
two-dimensional principal component analysis
rotation forest
title RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
title_full RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
title_fullStr RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
title_full_unstemmed RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
title_short RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
title_sort rf pssm a combination of rotation forest algorithm and position specific scoring matrix for improved prediction of protein protein interactions between hepatitis c virus and human
topic protein-protein interactions
hepatitis c virus
position specific scoring matrix
two-dimensional principal component analysis
rotation forest
url https://www.sciopen.com/article/10.26599/BDMA.2022.9020031
work_keys_str_mv AT xinliu rfpssmacombinationofrotationforestalgorithmandpositionspecificscoringmatrixforimprovedpredictionofproteinproteininteractionsbetweenhepatitiscvirusandhuman
AT yapinglu rfpssmacombinationofrotationforestalgorithmandpositionspecificscoringmatrixforimprovedpredictionofproteinproteininteractionsbetweenhepatitiscvirusandhuman
AT liangwang rfpssmacombinationofrotationforestalgorithmandpositionspecificscoringmatrixforimprovedpredictionofproteinproteininteractionsbetweenhepatitiscvirusandhuman
AT weigeng rfpssmacombinationofrotationforestalgorithmandpositionspecificscoringmatrixforimprovedpredictionofproteinproteininteractionsbetweenhepatitiscvirusandhuman
AT xinyishi rfpssmacombinationofrotationforestalgorithmandpositionspecificscoringmatrixforimprovedpredictionofproteinproteininteractionsbetweenhepatitiscvirusandhuman
AT xiaozhang rfpssmacombinationofrotationforestalgorithmandpositionspecificscoringmatrixforimprovedpredictionofproteinproteininteractionsbetweenhepatitiscvirusandhuman
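
The description field outlines a pipeline in which each protein's PSSM is reduced with 2DPCA before classification. The following is a minimal sketch of the 2DPCA step only, not the authors' implementation. It assumes the matrices have already been brought to a common shape (the record does not say how variable sequence lengths are handled) and uses random arrays as stand-ins for PSSMs. The function names `fit_2dpca` and `transform_2dpca`, the array sizes, and the number of components are illustrative assumptions.

```python
import numpy as np


def fit_2dpca(matrices: np.ndarray, n_components: int) -> np.ndarray:
    """Learn 2DPCA projection axes from a stack of equally sized matrices.

    matrices: array of shape (num_samples, rows, cols).
    Returns an array of shape (cols, n_components) with orthonormal columns.
    """
    centered = matrices - matrices.mean(axis=0)
    # Scatter matrix G = (1/M) * sum_k (A_k - mean)^T (A_k - mean), shape (cols, cols).
    scatter = np.einsum("kij,kil->jl", centered, centered) / matrices.shape[0]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top]


def transform_2dpca(matrices: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project every matrix onto the learned axes: Y_k = A_k @ axes."""
    return matrices @ axes


# Hypothetical stand-in data: 50 "proteins", each a 30 x 20 matrix
# (20 columns, one per standard amino acid, as in a PSSM).
rng = np.random.default_rng(0)
pssm_like = rng.normal(size=(50, 30, 20))

axes = fit_2dpca(pssm_like, n_components=5)
projected = transform_2dpca(pssm_like, axes)        # shape (50, 30, 5)
features = projected.reshape(len(pssm_like), -1)    # one flat vector per sample
```

Each row of `features` could then be passed to a classifier. Rotation forest itself is not sketched here, since the record gives no details of its configuration.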