Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for pr...
Main Authors: | Yumeng Liu, Shengyu Chen, Xiaolong Wang, Bin Liu |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2019-09-01 |
Series: | Molecular Therapy: Nucleic Acids |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2162253119301672 |
_version_ | 1819182836346781696 |
---|---|
author | Yumeng Liu, Shengyu Chen, Xiaolong Wang, Bin Liu |
author_facet | Yumeng Liu, Shengyu Chen, Xiaolong Wang, Bin Liu |
author_sort | Yumeng Liu |
collection | DOAJ |
description | Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance for predicting IDPs/IDRs. Motivated by these methods, we propose a predictor called IDP-FSP, an ensemble of three CRF-based predictors: IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are designed to predict long, short, and generic disordered regions, respectively, and each is constructed from different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with IDRs of different lengths. Experimental results on two independent test datasets show that IDP-FSP achieves predictive performance better than, or at least comparable to, that of 26 existing state-of-the-art methods in this field, demonstrating the effectiveness of IDP-FSP. Keywords: intrinsically disordered proteins/regions, ensemble predictor, length-dependent predictors, conditional random fields, CRFs |
first_indexed | 2024-12-22T22:52:27Z |
format | Article |
id | doaj.art-0144a629eede4b6aa426f230ea9234cd |
institution | Directory Open Access Journal |
issn | 2162-2531 |
language | English |
last_indexed | 2024-12-22T22:52:27Z |
publishDate | 2019-09-01 |
publisher | Elsevier |
record_format | Article |
series | Molecular Therapy: Nucleic Acids |
spelling | Record: doaj.art-0144a629eede4b6aa426f230ea9234cd, 2022-12-21T18:09:54Z, eng. Source: Elsevier, Molecular Therapy: Nucleic Acids, ISSN 2162-2531, 2019-09-01, volume 17, pages 396-404. Title: Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. Authors: Yumeng Liu [0], Shengyu Chen [1], Xiaolong Wang [2], Bin Liu [3]. Affiliations: [0] School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China. [1] School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN 47408, USA. [2] School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; Corresponding author: Xiaolong Wang, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China. [3] School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China; Corresponding author: Bin Liu, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China. Online access: http://www.sciencedirect.com/science/article/pii/S2162253119301672 |
title | Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields |
url | http://www.sciencedirect.com/science/article/pii/S2162253119301672 |