ECMPride: prediction of human extracellular matrix proteins based on the ideal dataset using hybrid features with domain evidence

Bibliographic Details
Main Authors: Binghui Liu, Ling Leng, Xuer Sun, Yunfang Wang, Jie Ma, Yunping Zhu
Format: Article
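Language: English
Published: PeerJ Inc. 2020-04-01
Series: PeerJ
Subjects: Extracellular matrix proteins; Proteomics; Prediction tool; Random forest; Under-sampling ensemble method
Online Access: https://peerj.com/articles/9066.pdf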
author Binghui Liu
Ling Leng
Xuer Sun
Yunfang Wang
Jie Ma
Yunping Zhu
collection DOAJ
description Extracellular matrix (ECM) proteins play an essential role in various biological processes in multicellular organisms, and their abnormal regulation can lead to many diseases. For large-scale ECM protein identification, especially through proteomic-based techniques, a theoretical reference database of ECM proteins is required. In this study, based on the experimentally verified ECM datasets and by the integration of protein domain features and a machine learning model, we developed ECMPride, a flexible and scalable tool for predicting ECM proteins. ECMPride achieved excellent performance in predicting ECM proteins, with appropriate balanced accuracy and sensitivity, and the performance of ECMPride was shown to be superior to the previously developed tool. A new theoretical dataset of human ECM components was also established by applying ECMPride to all human entries in the SwissProt database, containing a significant number of putative ECM proteins as well as the abundant biological annotations. This dataset might serve as a valuable reference resource for ECM protein identification.
first_indexed 2024-03-09T06:30:09Z
format Article
id doaj.art-951e8a408e764f71a79cac7b3beece59
institution Directory of Open Access Journals
issn 2167-8359
language English
last_indexed 2024-03-09T06:30:09Z
publishDate 2020-04-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-951e8a408e764f71a79cac7b3beece59
2023-12-03T11:06:06Z
eng
PeerJ Inc.
PeerJ
2167-8359
2020-04-01
Volume 8, e9066
DOI 10.7717/peerj.9066
ECMPride: prediction of human extracellular matrix proteins based on the ideal dataset using hybrid features with domain evidence
Binghui Liu (0): State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
Ling Leng (1): Department of Central Laboratory, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
Xuer Sun (2): Tissue Engineering Lab, Institute of Health Service and Transfusion Medicine, Beijing, China
Yunfang Wang (3): Tissue Engineering Lab, Institute of Health Service and Transfusion Medicine, Beijing, China
Jie Ma (4): State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
Yunping Zhu (5): State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
title ECMPride: prediction of human extracellular matrix proteins based on the ideal dataset using hybrid features with domain evidence
topic Extracellular matrix proteins
Proteomics
Prediction tool
Random forest
Under-sampling ensemble method
url https://peerj.com/articles/9066.pdf