Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features
Many bioinformatics tools have been released over the past decade to detect structural variants from sequencing data. For a data analyst, a natural question is which tool best fits the data. Thus, this study presents an automatic tool recommendation method to facilitate data a...
Main Authors: | Shenjie Wang, Yuqian Liu, Juan Wang, Xiaoyan Zhu, Yuzhi Shi, Xuwen Wang, Tao Liu, Xiao Xiao, Jiayin Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2023-01-01 |
Series: | Frontiers in Genetics |
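Subjects: | sequencing data analysis; bioinformatics tool; software recommendation; structural variant caller; meta-learning framework |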
Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2022.1096797/full |
_version_ | 1828070469689933824 |
---|---|
author | Shenjie Wang; Yuqian Liu; Juan Wang; Xiaoyan Zhu; Yuzhi Shi; Xuwen Wang; Tao Liu; Xiao Xiao; Jiayin Wang |
author_facet | Shenjie Wang; Yuqian Liu; Juan Wang; Xiaoyan Zhu; Yuzhi Shi; Xuwen Wang; Tao Liu; Xiao Xiao; Jiayin Wang |
author_sort | Shenjie Wang |
collection | DOAJ |
description | Many bioinformatics tools have been released over the past decade to detect structural variants (SVs) from sequencing data. For a data analyst, a natural question is which tool best fits the data at hand. This study therefore presents an automatic tool recommendation method to facilitate data analysis. Given a sequencing dataset, the method recommends the optimal variant caller from a set of state-of-the-art bioinformatics tools. The recommendation method was implemented under a meta-learning framework that identifies the relationships between data features and tool performance. First, meta-features were extracted to characterize the sequencing data, and meta-targets were identified to pinpoint the optimal caller for each dataset. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made according to the evaluation from the meta-model. A series of experiments was conducted to validate the recommendation method on both simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random-pick and fixed-pick strategies. To further support the research community, we incorporated the recommendation method into an online cloud service for genomic data analysis, which is available at https://c.solargenomics.com/ via a simple registration. In addition, the source code and a pre-trained model are available at https://github.com/hello-json/CallerRecommendation for academic use only. |
first_indexed | 2024-04-11T00:38:18Z |
format | Article |
id | doaj.art-9a9d8165e3034dcf97cbadccf9a19d6c |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-04-11T00:38:18Z |
publishDate | 2023-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-9a9d8165e3034dcf97cbadccf9a19d6c; 2023-01-06T14:32:52Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2023-01-01; 13; 10.3389/fgene.2022.1096797; 1096797; Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features; Shenjie Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China / Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China); Yuqian Liu (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China / Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China); Juan Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China / Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China / Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China); Xiaoyan Zhu (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China / Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China); Yuzhi Shi (Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China); Xuwen Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China / Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China); Tao Liu (Annoroad Gene Technology (Beijing) Co. Ltd, Beijing, China); Xiao Xiao (Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China / Geneplus Shenzhen, Shenzhen, China); Jiayin Wang (School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China / Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China); Many bioinformatics tools have been released over the past decade to detect structural variants (SVs) from sequencing data. For a data analyst, a natural question is which tool best fits the data at hand. This study therefore presents an automatic tool recommendation method to facilitate data analysis. Given a sequencing dataset, the method recommends the optimal variant caller from a set of state-of-the-art bioinformatics tools. The recommendation method was implemented under a meta-learning framework that identifies the relationships between data features and tool performance. First, meta-features were extracted to characterize the sequencing data, and meta-targets were identified to pinpoint the optimal caller for each dataset. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made according to the evaluation from the meta-model. A series of experiments was conducted to validate the recommendation method on both simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random-pick and fixed-pick strategies. To further support the research community, we incorporated the recommendation method into an online cloud service for genomic data analysis, which is available at https://c.solargenomics.com/ via a simple registration. In addition, the source code and a pre-trained model are available at https://github.com/hello-json/CallerRecommendation for academic use only.; https://www.frontiersin.org/articles/10.3389/fgene.2022.1096797/full; sequencing data analysis; bioinformatics tool; software recommendation; structural variant caller; meta-learning framework |
spellingShingle | Shenjie Wang; Yuqian Liu; Juan Wang; Xiaoyan Zhu; Yuzhi Shi; Xuwen Wang; Tao Liu; Xiao Xiao; Jiayin Wang; Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features; Frontiers in Genetics; sequencing data analysis; bioinformatics tool; software recommendation; structural variant caller; meta-learning framework |
title | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_full | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_fullStr | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_full_unstemmed | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_short | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_sort | is an sv caller compatible with sequencing data an online recommendation tool to automatically recommend the optimal caller based on data features |
topic | sequencing data analysis; bioinformatics tool; software recommendation; structural variant caller; meta-learning framework |
url | https://www.frontiersin.org/articles/10.3389/fgene.2022.1096797/full |
work_keys_str_mv | AT shenjiewang isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT yuqianliu isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT juanwang isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT xiaoyanzhu isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT yuzhishi isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT xuwenwang isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT taoliu isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT xiaoxiao isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT jiayinwang isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures |
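
The three steps named in the description (extract meta-features, pair them with meta-targets, fit a meta-model that recommends a caller) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three meta-features (coverage, read length, insert-size standard deviation), the caller labels `caller_A` to `caller_C`, the numeric values, and the use of a k-nearest-neighbour vote as the meta-model are hypothetical and are not taken from the article or its repository.

```python
from collections import Counter
from math import dist

# Hypothetical meta-dataset: each row pairs the meta-features of one
# sequencing dataset (coverage, read length, insert-size std) with the
# caller that performed best on it (the meta-target). Values are made up.
META_DATASET = [
    ((10.0, 100.0, 30.0), "caller_A"),
    ((12.0, 100.0, 35.0), "caller_A"),
    ((30.0, 150.0, 50.0), "caller_B"),
    ((35.0, 150.0, 45.0), "caller_B"),
    ((60.0, 250.0, 80.0), "caller_C"),
]


def fit_scaler(rows):
    """Return per-feature minimum and range for min-max scaling."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return lows, spans


def scale(features, scaler):
    """Scale one feature vector with a scaler returned by fit_scaler."""
    lows, spans = scaler
    return tuple((x - lo) / span for x, lo, span in zip(features, lows, spans))


def recommend(features, meta_dataset=META_DATASET, k=3):
    """Recommend a caller by majority vote among the k nearest meta-examples.

    This k-NN vote stands in for the meta-model; it is an illustrative
    substitute, not the model described in the article.
    """
    scaler = fit_scaler([f for f, _ in meta_dataset])
    query = scale(features, scaler)
    neighbours = sorted(
        meta_dataset, key=lambda row: dist(query, scale(row[0], scaler))
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Calling `recommend((11.0, 100.0, 32.0))` with this toy meta-dataset selects the caller whose training examples lie closest to the new dataset in scaled feature space. A real system would replace the toy rows with benchmarked caller performance on many datasets.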