Prediction of novel mouse TLR9 agonists using a random forest approach

Abstract. Background: Toll-like receptor 9 (TLR9) is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to...


Bibliographic Details
Main Authors:	Varun Khanna, Lei Li, Johnson Fung, Shoba Ranganathan, Nikolai Petrovsky
Format:	Article
Language:	English
Published:	BMC 2019-12-01
Series:	BMC Molecular and Cell Biology
Subjects:	Toll-like receptor 9; CpG; Machine learning; Random Forest; Oligonucleotides
Online Access:	https://doi.org/10.1186/s12860-019-0241-0

_version_	1818920928062472192
author	Varun Khanna; Lei Li; Johnson Fung; Shoba Ranganathan; Nikolai Petrovsky
author_sort	Varun Khanna
collection	DOAJ
description	Abstract. Background: Toll-like receptor 9 (TLR9) is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Because ODNs contain a considerable number of rotatable bonds, high-throughput in silico screening of CpG ODNs for potential TLR9 activity via traditional structure-based virtual screening is challenging. In the current study, we present a machine-learning-based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including the count and position of motifs, the distance between motifs, and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house, experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling. Results: Using in-house experimental TLR9 activity data, we found that the random forest algorithm outperformed the other algorithms on our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples were 0.61 and 80.0%, respectively, with a maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed that common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked, and the top 100 ODNs were synthesized and experimentally tested for activity in an mTLR9 reporter cell assay; 91 of the 100 selected ODNs showed high activity, confirming the accuracy of the model in predicting mTLR9 activity. Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms, including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
first_indexed	2024-12-20T01:29:32Z
format	Article
id	doaj.art-a8f9200da25845a583a7250b6831c7a2
institution	Directory of Open Access Journals
issn	2661-8850
language	English
last_indexed	2024-12-20T01:29:32Z
publishDate	2019-12-01
publisher	BMC
record_format	Article
series	BMC Molecular and Cell Biology
spelling	doaj.art-a8f9200da25845a583a7250b6831c7a2 | 2022-12-21T19:58:08Z | eng | BMC | BMC Molecular and Cell Biology | 2661-8850 | 2019-12-01 | 20S2114 | 10.1186/s12860-019-0241-0 | Prediction of novel mouse TLR9 agonists using a random forest approach | Varun Khanna (College of Medicine and Public Health, Flinders University); Lei Li (College of Medicine and Public Health, Flinders University); Johnson Fung (Vaxine Pty Ltd); Shoba Ranganathan (Department of Molecular Sciences, Macquarie University); Nikolai Petrovsky (College of Medicine and Public Health, Flinders University)
title	Prediction of novel mouse TLR9 agonists using a random forest approach
topic	Toll-like receptor 9; CpG; Machine learning; Random Forest; Oligonucleotides
url	https://doi.org/10.1186/s12860-019-0241-0
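
Illustrative sketch (not part of the catalogued record): the abstract describes repeated random down-sampling to balance the classes and reports the Matthews correlation coefficient and balanced accuracy. The Python below is a minimal sketch of those two ideas, not the authors' code. The function names (down_sample, build_subsets, mcc, balanced_accuracy) and the toy label vector are assumptions made for illustration. The base learner is deliberately left out; the abstract only states that 20 random forest models were combined, so one model would be fit per balanced subset.

```python
import math
import random


def down_sample(labels, rng):
    """Return sorted indices of a balanced subset.

    Every class is reduced to the size of the smallest class by drawing
    without replacement, so the minority class is kept in full.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    return sorted(chosen)


def build_subsets(labels, n_models=20, seed=0):
    """Draw one balanced subset per ensemble member (20 as in the abstract)."""
    rng = random.Random(seed)
    return [down_sample(labels, rng) for _ in range(n_models)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy labels (hypothetical, not the study data): 10 positives, 90 negatives.
    toy_labels = [1] * 10 + [0] * 90
    subsets = build_subsets(toy_labels)
    print(len(subsets), "balanced subsets of size", len(subsets[0]))
    print("MCC:", mcc(8, 9, 1, 2))
    print("Balanced accuracy:", balanced_accuracy(8, 9, 1, 2))
```

Each subset returned by build_subsets keeps all minority-class examples and a fresh random draw from the majority class, so the ensemble members see different majority samples while every one of them trains on balanced data. Balanced accuracy and MCC are used for evaluation because plain accuracy can look high on an imbalanced test set even when the minority class is predicted poorly.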


Similar Items