Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

Abstract Background DNA-binding proteins perform important functions in a great number of biological activities. DNA-binding proteins can interact with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA), and DNA-binding proteins can be categorized as single-stranded DNA-binding proteins (SSB...

Bibliographic Details
Main Authors: Wei Wang, Lin Sun, Shiguang Zhang, Hongjun Zhang, Jinling Shi, Tianhe Xu, Keliang Li
Format: Article
Language: English
Published: BMC 2017-06-01
Series: BMC Bioinformatics
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12859-017-1715-8
collection DOAJ
description Abstract
Background: DNA-binding proteins perform important functions in a great number of biological activities. They can interact with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA) and are accordingly categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes to represent protein sequences: OAAC (overall amino acid composition) features, dipeptide compositions, PSSM (position-specific scoring matrix) profiles and split amino acid composition (SAA). We then adopt SVM (support vector machine) and RF (random forest) classification models to distinguish SSBs from DSBs.
Results: Our results suggest that some sequence features can significantly differentiate DSBs and SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an AUC (area under the curve) of 0.919. Moreover, our method performs well in independent testing.
Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs and SSBs accurately. The method also explores novel features, which could be helpful in discovering the binding specificity of DNA-binding proteins.
id doaj.art-5793c1c2dfee4f2eba22861f902951c6
institution Directory Open Access Journal
issn 1471-2105
spelling doaj.art-5793c1c2dfee4f2eba22861f902951c6
2022-12-21T22:38:47Z
eng
BMC
BMC Bioinformatics, ISSN 1471-2105
2017-06-01, volume 18, issue 1, pages 1-10
DOI 10.1186/s12859-017-1715-8
Wei Wang (College of Computer and Information Engineering, Henan Normal University)
Lin Sun (College of Computer and Information Engineering, Henan Normal University)
Shiguang Zhang (College of Computer and Information Engineering, Henan Normal University)
Hongjun Zhang (School of Aviation Engineering, Anyang University)
Jinling Shi (School of International Education, Xuchang University)
Tianhe Xu (College of Computer and Information Engineering, Henan Normal University)
Keliang Li (College of Computer and Information Engineering, Henan Normal University)
title Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences
topic SSBs (Single-stranded DNA-binding proteins)
DSBs (Double-stranded DNA-binding proteins)
Binding specificity
Protein sequence
url http://link.springer.com/article/10.1186/s12859-017-1715-8