A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction

The reprogrammable CRISPR/Cas9 genome editing tool’s growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity rema...

Bibliographic Details
Main Authors: Dhvani Sandip Vora, Yugesh Verma, Durai Sundar
Format: Article
Language: English
Published: MDPI AG 2022-08-01
Series: Biomolecules
Subjects: CRISPR/Cas9; genome editing; machine learning; SHAP values; binding energy; off-targets
Online Access: https://www.mdpi.com/2218-273X/12/8/1123
author Dhvani Sandip Vora
Yugesh Verma
Durai Sundar
collection DOAJ
description The reprogrammable CRISPR/Cas9 genome editing tool’s growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. Previous machine learning models based on sequence features performed poorly on new datasets, so there is a need to incorporate novel features. Estimating the binding energy of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed the generation of better-performing machine learning models for the prediction of Cas9 activity. An analysis of feature contributions to the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The performance plateau of current machine learning models on unseen datasets could be overcome by incorporating novel features such as binding energy. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA).
first_indexed 2024-03-09T10:00:14Z
format Article
id doaj.art-3493fa8fdd5242a8b8db8b1b4c805342
institution Directory Open Access Journal
issn 2218-273X
language English
last_indexed 2024-03-09T10:00:14Z
publishDate 2022-08-01
publisher MDPI AG
record_format Article
series Biomolecules
spelling doaj.art-3493fa8fdd5242a8b8db8b1b4c805342
2023-12-01T23:29:16Z
eng
MDPI AG
Biomolecules, ISSN 2218-273X
2022-08-01, volume 12, issue 8, article 1123
doi:10.3390/biom12081123
A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction
Dhvani Sandip Vora, Yugesh Verma, Durai Sundar (all: Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India)
https://www.mdpi.com/2218-273X/12/8/1123
CRISPR/Cas9; genome editing; machine learning; SHAP values; binding energy; off-targets
title A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction
topic CRISPR/Cas9
genome editing
machine learning
SHAP values
binding energy
off-targets
url https://www.mdpi.com/2218-273X/12/8/1123