Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields

Bibliographic Details
Main Authors: Yumeng Liu, Shengyu Chen, Xiaolong Wang, Bin Liu
Format: Article
Language: English
Published: Elsevier 2019-09-01
Series: Molecular Therapy: Nucleic Acids
Online Access:http://www.sciencedirect.com/science/article/pii/S2162253119301672
Volume: 17
Pages: 396-404
ISSN: 2162-2531
Collection: DOAJ (Directory of Open Access Journals)
Affiliations:
Yumeng Liu: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
Shengyu Chen: School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN 47408, USA
Xiaolong Wang (corresponding author): School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
Bin Liu (corresponding author): School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China (correspondence address); Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China

Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance for predicting IDPs/IDRs. Motivated by these methods, we propose IDP-FSP, an ensemble of three CRF-based predictors called IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are designed to predict long, short, and generic disordered regions, respectively, and each is constructed from a different set of features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with length-specific modeling of IDRs. Experimental results on two independent test datasets show that IDP-FSP achieves predictive performance better than or comparable to that of 26 existing state-of-the-art methods in this field, demonstrating its effectiveness.
Keywords: intrinsically disordered proteins/regions, ensemble predictor, length-dependent predictors, conditional random fields, CRFs
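The abstract describes the design only at a high level: three CRF sequence labelers, one each for long, short, and generic disordered regions, combined into one ensemble. The sketch below is a minimal illustration under stated assumptions, not the authors' IDP-FSP implementation. It assumes each component is a two-label linear-chain CRF (O = ordered, D = disordered) whose emission and transition log-potentials are already known, computes per-residue disorder probabilities with the forward-backward algorithm, and fuses the three components by averaging those probabilities. The names `crf_marginals` and `ensemble_predict`, the label set, the 0.5 threshold, the averaging rule, and all numbers are hypothetical; the abstract does not state how IDP-FSP combines its components, and training the potentials from features is omitted. In the paper the three predictors differ in their features, which would appear here as different emission scores; for brevity the toy run reuses one emission matrix and varies only the transitions.

```python
import math
from typing import List, Sequence

# Hypothetical label set: index 0 = ordered residue (O), index 1 = disordered residue (D).
LABELS = ("O", "D")


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_marginals(emissions: List[List[float]], transitions: List[List[float]]) -> List[float]:
    """Per-residue P(label == D) for a two-label linear-chain CRF via forward-backward.

    emissions[t][y]: log-potential of label y at residue t (assumed precomputed from features).
    transitions[a][b]: log-potential of label a at residue t followed by label b at t + 1.
    """
    n, k = len(emissions), len(LABELS)
    # Forward: alpha[t][y] = log-sum of scores of all label prefixes ending with y at residue t.
    alpha = [list(emissions[0])]
    for t in range(1, n):
        alpha.append([
            emissions[t][y]
            + logsumexp([alpha[t - 1][a] + transitions[a][y] for a in range(k)])
            for y in range(k)
        ])
    # Backward: beta[t][y] = log-sum of scores of all label suffixes after t, given y at t.
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            logsumexp([transitions[y][b] + emissions[t + 1][b] + beta[t + 1][b] for b in range(k)])
            for y in range(k)
        ]
    log_z = logsumexp(alpha[-1])  # log partition function over all label paths
    return [math.exp(alpha[t][1] + beta[t][1] - log_z) for t in range(n)]


def ensemble_predict(prob_sets: List[List[float]], threshold: float = 0.5) -> str:
    """Hypothetical combination rule: average the predictors' per-residue disorder
    probabilities and label a residue D when the mean reaches the threshold."""
    n = len(prob_sets[0])
    mean = [sum(p[t] for p in prob_sets) / len(prob_sets) for t in range(n)]
    return "".join("D" if m >= threshold else "O" for m in mean)


if __name__ == "__main__":
    # Toy emission scores for a 6-residue chain (made-up numbers, not real features).
    emissions = [[2.0, -2.0], [1.0, 0.0], [-1.0, 1.5], [-2.0, 2.0], [0.5, 0.0], [2.0, -1.0]]
    # Made-up transition matrices standing in for the long, short, and generic predictors.
    sticky = [[1.0, -1.0], [-1.0, 1.0]]  # rewards staying in one state: favors long segments
    loose = [[0.2, -0.2], [-0.2, 0.2]]   # weak preference: tolerates short segments
    neutral = [[0.0, 0.0], [0.0, 0.0]]   # no label-to-label preference
    probs = [crf_marginals(emissions, tr) for tr in (sticky, loose, neutral)]
    print(ensemble_predict(probs))
```

Running the script prints a six-character O/D string for the toy chain. Averaging probabilities is only one plausible fusion rule; a vote over decoded label paths or a learned weighting would fit the same ensemble description equally well.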