Synergizing Off-Target Predictions for In Silico Insights of CENH3 Knockout in Cannabis through CRISPR/Cas

Bibliographic Details
Main Authors: Mohsen Hesami, Mohsen Yoosefzadeh Najafabadi, Kristian Adamek, Davoud Torkamaneh, Andrew Maxwell Phineas Jones
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Molecules
Subjects: hemp, marijuana, machine learning algorithm, ensemble model, CENH3, sgRNA
Online Access: https://www.mdpi.com/1420-3049/26/7/2053
collection DOAJ
description The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas-mediated genome editing system has recently been used for haploid production in plants. Haploid induction using the CRISPR/Cas system is an attractive approach in cannabis, an economically important industrial, recreational, and medicinal plant. However, the CRISPR system requires the design of precise (on-target) single-guide RNAs (sgRNAs), so it is essential to predict the off-target activity of the designed sgRNAs to avoid unexpected outcomes. This study assessed the predictive ability of three machine learning (ML) algorithms (radial basis function (RBF), support vector machine (SVM), and random forest (RF)) alongside an ensemble-bagging (E-B) strategy that combines MIT and cutting frequency determination (CFD) scores to predict sgRNA off-target activity, using in silico targeting of a histone H3-like centromeric protein, HTR12, in cannabis. Among the individual algorithms tested, RF had the highest precision, recall, and F-measure, with values of 0.61, 0.64, and 0.62, respectively. RF was then used as the meta-classifier for the E-B method, which raised precision and F-measure to 0.62 and 0.66, respectively. The E-B algorithm also had the highest area under the precision-recall curve (AUC-PRC; 0.74) and area under the receiver operating characteristic curve (AUC-ROC; 0.71), demonstrating the effectiveness of E-B, a common ensemble strategy. This study provides a foundational resource for using ML models to predict gRNA off-target activities in cannabis.
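The precision, recall, and F-measure figures in the description are related by a simple formula, which the following minimal Python sketch illustrates. It assumes the F-measure is the standard F1 score (the harmonic mean of precision and recall); the record does not say how the authors averaged across classes, so reported values may differ slightly from this per-class formula. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts; empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall reported for the RF classifier in the description.
rf_f1 = f_measure(0.61, 0.64)
print(round(rf_f1, 2))  # 0.62, matching the reported F-measure

# Hypothetical counts (not from the study) showing the count-based route.
p, r = precision_recall(tp=64, fp=41, fn=36)
print(round(p, 3), round(r, 3), round(f_measure(p, r), 3))
```

Under this assumption the RF values are internally consistent: 2 × 0.61 × 0.64 / (0.61 + 0.64) ≈ 0.62.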
first_indexed 2024-03-10T12:37:57Z
format Article
id doaj.art-bce0a4077ac1472391f34560b6363a2c
institution Directory Open Access Journal
issn 1420-3049
doi 10.3390/molecules26072053
citation Molecules, volume 26, issue 7, article 2053 (2021-04-01)
affiliation Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada (all five authors)
topic hemp
marijuana
machine learning algorithm
ensemble model
CENH3
sgRNA