Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features

Many bioinformatics tools for detecting structural variants from sequencing data have been released over the past decade. For a data analyst, a natural question is which tool best fits the data at hand. Thus, this study presents an automatic tool recommendation method to facilitate data a...


Bibliographic Details
Main Authors: Shenjie Wang, Yuqian Liu, Juan Wang, Xiaoyan Zhu, Yuzhi Shi, Xuwen Wang, Tao Liu, Xiao Xiao, Jiayin Wang
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-01-01
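Series: Frontiers in Genetics
Subjects: sequencing data analysis; bioinformatics tool; software recommendation; structural variant caller; meta-learning framework
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.1096797/full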
author Shenjie Wang
Yuqian Liu
Juan Wang
Xiaoyan Zhu
Yuzhi Shi
Xuwen Wang
Tao Liu
Xiao Xiao
Jiayin Wang
author_sort Shenjie Wang
collection DOAJ
description Many bioinformatics tools for detecting structural variants (SVs) from sequencing data have been released over the past decade. For a data analyst, a natural question is which tool best fits the data at hand. Thus, this study presents an automatic tool recommendation method to facilitate data analysis. Given a sequencing dataset, the optimal variant calling tool was recommended from a set of state-of-the-art bioinformatics tools. This recommendation method was implemented under a meta-learning framework that identifies the relationships between data features and tool performance. First, meta-features were extracted to characterize the sequencing data, and meta-targets were identified to pinpoint the optimal caller for those data. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made according to the evaluation from the meta-model. A series of experiments was conducted to validate this recommendation method on both simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random-pick and fixed-pick strategies. To further serve the research community, we incorporated the recommendation method into an online cloud service for genomic data analysis, which is available at https://c.solargenomics.com/ via a simple registration. In addition, the source code and a pre-trained model are available at https://github.com/hello-json/CallerRecommendation for academic use only.
first_indexed 2024-04-11T00:38:18Z
format Article
id doaj.art-9a9d8165e3034dcf97cbadccf9a19d6c
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-11T00:38:18Z
publishDate 2023-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-9a9d8165e3034dcf97cbadccf9a19d6c; 2023-01-06T14:32:52Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2023-01-01; 13; 10.3389/fgene.2022.1096797; 1096797
Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features
Shenjie Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China)
Yuqian Liu (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China)
Juan Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China; Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China)
Xiaoyan Zhu (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China)
Yuzhi Shi (Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China)
Xuwen Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China)
Tao Liu (Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China)
Xiao Xiao (Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China; Geneplus Shenzhen, Shenzhen, China)
Jiayin Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China)
https://www.frontiersin.org/articles/10.3389/fgene.2022.1096797/full
sequencing data analysis; bioinformatics tool; software recommendation; structural variant caller; meta-learning framework
title Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features
topic sequencing data analysis
bioinformatics tool
software recommendation
structural variant caller
meta-learning framework
url https://www.frontiersin.org/articles/10.3389/fgene.2022.1096797/full