Synergizing Off-Target Predictions for In Silico Insights of CENH3 Knockout in Cannabis through CRISPR/Cas
The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas-mediated genome editing system has recently been used for haploid production in plants. Haploid induction using the CRISPR/Cas system represents an attractive approach in cannabis, an economically important industrial, recrea...
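Main Authors: | Mohsen Hesami, Mohsen Yoosefzadeh Najafabadi, Kristian Adamek, Davoud Torkamaneh, Andrew Maxwell Phineas Jones |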
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-04-01 |
Series: | Molecules |
Subjects: | hemp; marijuana; machine learning algorithm; ensemble model; CENH3; sgRNA |
Online Access: | https://www.mdpi.com/1420-3049/26/7/2053 |
_version_ | 1797538916152115200 |
---|---|
author | Mohsen Hesami; Mohsen Yoosefzadeh Najafabadi; Kristian Adamek; Davoud Torkamaneh; Andrew Maxwell Phineas Jones |
author_sort | Mohsen Hesami |
collection | DOAJ |
description | The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas-mediated genome editing system has recently been used for haploid production in plants. Haploid induction using the CRISPR/Cas system is an attractive approach in cannabis, an economically important industrial, recreational, and medicinal plant. However, the CRISPR system requires the design of precise (on-target) single-guide RNA (sgRNA), so it is essential to predict the off-target activity of the designed sgRNAs to avoid unexpected outcomes. This study assessed the predictive ability of three machine learning (ML) algorithms (radial basis function (RBF), support vector machine (SVM), and random forest (RF)) alongside an ensemble-bagging (E-B) strategy that combines MIT and cutting frequency determination (CFD) scores to predict sgRNA off-target activity, by targeting in silico a histone H3-like centromeric protein, HTR12, in cannabis. Among the individual algorithms, RF exhibited the highest precision, recall, and F-measure, with values of 0.61, 0.64, and 0.62, respectively. The RF algorithm was then used as a meta-classifier for the E-B method, which raised precision and F-measure to 0.62 and 0.66, respectively. The E-B algorithm also had the highest area under the precision-recall curve (AUC-PRC; 0.74) and area under the receiver operating characteristic (ROC) curve (AUC-ROC; 0.71), demonstrating the value of E-B, a common ensemble strategy. This study constitutes a foundational resource for using ML models to predict gRNA off-target activities in cannabis. |
first_indexed | 2024-03-10T12:37:57Z |
format | Article |
id | doaj.art-bce0a4077ac1472391f34560b6363a2c |
institution | Directory of Open Access Journals |
issn | 1420-3049 |
language | English |
last_indexed | 2024-03-10T12:37:57Z |
publishDate | 2021-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-bce0a4077ac1472391f34560b6363a2c; 2023-11-21T14:07:31Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2021-04-01; vol. 26, no. 7, art. 2053; DOI 10.3390/molecules26072053; author affiliation (all five authors): Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada |
title | Synergizing Off-Target Predictions for In Silico Insights of CENH3 Knockout in Cannabis through CRISPR/Cas |
topic | hemp; marijuana; machine learning algorithm; ensemble model; CENH3; sgRNA |
url | https://www.mdpi.com/1420-3049/26/7/2053 |
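
The precision, recall, and F-measure values quoted in the description field can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the record itself does not state which F variant was used. The function names and the confusion-matrix counts in the usage lines are hypothetical and serve only as illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the record does not say which F variant the study used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision and recall from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Values reported in the description for the random forest (RF) model.
rf_precision, rf_recall = 0.61, 0.64
print(round(f_measure(rf_precision, rf_recall), 2))  # 0.62

# Hypothetical counts, not taken from the study.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 2))
```

Under the F1 assumption, the RF precision (0.61) and recall (0.64) give 0.62 after rounding, which agrees with the F-measure reported for RF.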