A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction
The reprogrammable CRISPR/Cas9 genome editing tool’s growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity rema...
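Main Authors: | Dhvani Sandip Vora, Yugesh Verma, Durai Sundar |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-08-01 |
Series: | Biomolecules |
Subjects: | CRISPR/Cas9, genome editing, machine learning, SHAP values, binding energy, off-targets |
Online Access: | https://www.mdpi.com/2218-273X/12/8/1123 |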
_version_ | 1827618055377649664 |
---|---|
author | Dhvani Sandip Vora; Yugesh Verma; Durai Sundar |
collection | DOAJ |
description | The reprogrammable CRISPR/Cas9 genome editing tool’s growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. Previous machine learning models based on sequence features performed poorly on new datasets; thus, there is a need to incorporate novel features. Estimating the binding energy of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed the generation of better-performing machine learning models for the prediction of Cas9 activity. An analysis of feature contribution towards the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The plateau in the performance of current machine learning models on unseen datasets could be overcome by incorporating novel features such as binding energy. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA). |
first_indexed | 2024-03-09T10:00:14Z |
format | Article |
id | doaj.art-3493fa8fdd5242a8b8db8b1b4c805342 |
institution | Directory of Open Access Journals |
issn | 2218-273X |
language | English |
last_indexed | 2024-03-09T10:00:14Z |
publishDate | 2022-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Biomolecules |
spelling | doaj.art-3493fa8fdd5242a8b8db8b1b4c805342; 2023-12-01T23:29:16Z; eng; MDPI AG; Biomolecules; ISSN 2218-273X; 2022-08-01; vol. 12, no. 8, art. 1123; DOI 10.3390/biom12081123; A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction; Dhvani Sandip Vora, Yugesh Verma, Durai Sundar (all: Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India); https://www.mdpi.com/2218-273X/12/8/1123; CRISPR/Cas9; genome editing; machine learning; SHAP values; binding energy; off-targets |
title | A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction |
topic | CRISPR/Cas9; genome editing; machine learning; SHAP values; binding energy; off-targets |
url | https://www.mdpi.com/2218-273X/12/8/1123 |
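
The description refers to analysing how much each feature contributes to a model's output, and the record lists SHAP values among its subjects. As a rough illustration of that idea only, the sketch below computes exact Shapley values by brute force for a toy linear scoring function. Everything in it is an assumption made for illustration: the feature names, the weights, the all-zero baseline, and the example input are invented, and the code does not reproduce the article's models, data, or the SHAP library.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names: two sequence features and one binding-energy feature.
FEATURES = ["gc_content", "seed_mismatches", "binding_energy"]

# Toy linear scoring model with made-up weights (an assumption, not the article's model).
WEIGHTS = {"gc_content": 0.5, "seed_mismatches": -0.25, "binding_energy": -0.125}
BIAS = 0.25


def predict(x):
    """Return the toy activity score for a feature dict."""
    return BIAS + sum(WEIGHTS[name] * x[name] for name in FEATURES)


def shapley_values(x, baseline):
    """Exact Shapley values of predict(x) relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    """
    n = len(FEATURES)
    values = {}
    for target in FEATURES:
        others = [f for f in FEATURES if f != target]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (x[f] if f in coalition else baseline[f]) for f in FEATURES
                }
                with_target = dict(without)
                with_target[target] = x[target]
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


# Invented example input and an all-zero baseline (both assumptions).
guide = {"gc_content": 0.5, "seed_mismatches": 1.0, "binding_energy": -8.0}
baseline = {"gc_content": 0.0, "seed_mismatches": 0.0, "binding_energy": 0.0}
contributions = shapley_values(guide, baseline)

for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

For a linear model such as this one, each feature's contribution reduces to its weight times its offset from the baseline, and the contributions sum to the difference between the prediction for the input and the prediction for the baseline. Brute-force enumeration is only practical for a handful of features, since the number of coalitions doubles with each feature added.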