Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences
Abstract Background DNA-binding proteins perform important functions in a great number of biological activities. DNA-binding proteins can interact with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA), and DNA-binding proteins can be categorized as single-stranded DNA-binding proteins (SSB...
Main Authors: | Wei Wang, Lin Sun, Shiguang Zhang, Hongjun Zhang, Jinling Shi, Tianhe Xu, Keliang Li |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-06-01 |
Series: | BMC Bioinformatics |
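Subjects: | SSBs (Single-stranded DNA-binding proteins); DSBs (Double-stranded DNA-binding proteins); Binding specificity; Protein sequence |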
Online Access: | http://link.springer.com/article/10.1186/s12859-017-1715-8 |
_version_ | 1829497553978130432 |
---|---|
author | Wei Wang, Lin Sun, Shiguang Zhang, Hongjun Zhang, Jinling Shi, Tianhe Xu, Keliang Li |
author_facet | Wei Wang, Lin Sun, Shiguang Zhang, Hongjun Zhang, Jinling Shi, Tianhe Xu, Keliang Li |
author_sort | Wei Wang |
collection | DOAJ |
description | Abstract Background: DNA-binding proteins perform important functions in a great number of biological activities. They can interact with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA), and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes to represent protein sequences: OAAC (overall amino acid composition) features, dipeptide compositions, PSSM (position-specific scoring matrix) profiles and split amino acid composition (SAA), and then adopt SVM (support vector machine) and RF (random forest) classification models to distinguish SSBs from DSBs. Results: Our results suggest that some sequence features can significantly differentiate DSBs and SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an AUC (area under the curve) of 0.919. Moreover, our method performs well in independent testing. Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs from SSBs accurately. The method also explores novel features, which could help to uncover the binding specificity of DNA-binding proteins. |
first_indexed | 2024-12-16T07:53:48Z |
format | Article |
id | doaj.art-5793c1c2dfee4f2eba22861f902951c6 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-16T07:53:48Z |
publishDate | 2017-06-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-5793c1c2dfee4f2eba22861f902951c6; 2022-12-21T22:38:47Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-06-01; 181110; 10.1186/s12859-017-1715-8; Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences; Wei Wang (0), Lin Sun (1), Shiguang Zhang (2), Hongjun Zhang (3), Jinling Shi (4), Tianhe Xu (5), Keliang Li (6); (0) College of Computer and Information Engineering, Henan Normal University; (1) College of Computer and Information Engineering, Henan Normal University; (2) College of Computer and Information Engineering, Henan Normal University; (3) School of Aviation Engineering, Anyang University; (4) School of International Education, Xuchang University; (5) College of Computer and Information Engineering, Henan Normal University; (6) College of Computer and Information Engineering, Henan Normal University; Abstract Background: DNA-binding proteins perform important functions in a great number of biological activities. They can interact with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA), and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes to represent protein sequences: OAAC (overall amino acid composition) features, dipeptide compositions, PSSM (position-specific scoring matrix) profiles and split amino acid composition (SAA), and then adopt SVM (support vector machine) and RF (random forest) classification models to distinguish SSBs from DSBs. Results: Our results suggest that some sequence features can significantly differentiate DSBs and SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an AUC (area under the curve) of 0.919. Moreover, our method performs well in independent testing. Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs from SSBs accurately. The method also explores novel features, which could help to uncover the binding specificity of DNA-binding proteins.; http://link.springer.com/article/10.1186/s12859-017-1715-8; SSBs (Single-stranded DNA-binding proteins); DSBs (Double-stranded DNA-binding proteins); Binding specificity; Protein sequence |
spellingShingle | Wei Wang Lin Sun Shiguang Zhang Hongjun Zhang Jinling Shi Tianhe Xu Keliang Li Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences BMC Bioinformatics SSBs (Single-stranded DNA-binding proteins) DSBs (Double-stranded DNA-binding proteins) Binding specificity Protein sequence |
title | Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences |
title_full | Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences |
title_fullStr | Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences |
title_full_unstemmed | Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences |
title_short | Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences |
title_sort | analysis and prediction of single stranded and double stranded dna binding proteins based on protein sequences |
topic | SSBs (Single-stranded DNA-binding proteins); DSBs (Double-stranded DNA-binding proteins); Binding specificity; Protein sequence |
url | http://link.springer.com/article/10.1186/s12859-017-1715-8 |
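
Two of the sequence representations named in the abstract, OAAC (overall amino acid composition) and dipeptide composition, are commonly computed as normalized residue and residue-pair frequencies. The sketch below illustrates that general idea only; it is not taken from the article, and the function names, the fixed 20-letter alphabet ordering, and the choice to silently drop non-standard residues are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import List

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(sequence: str) -> str:
    """Upper-case the sequence and drop non-standard symbols (a simplification)."""
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def oaac(sequence: str) -> List[float]:
    """Hypothetical helper: fraction of each of the 20 residues (20 values)."""
    seq = _clean(sequence)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(sequence: str) -> List[float]:
    """Hypothetical helper: fraction of each adjacent residue pair (400 values)."""
    seq = _clean(sequence)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    if not pairs:
        return [0.0] * len(keys)
    counts = Counter(pairs)
    return [counts[k] / len(pairs) for k in keys]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence, for illustration only
    features = oaac(example) + dipeptide_composition(example)
    print(len(features))  # 20 + 400 feature values
```

Concatenating the two vectors gives a fixed-length representation regardless of sequence length, which is the property that makes such features usable as input to classifiers such as the SVM and RF models the abstract mentions.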