CAMPAREE: a robust and configurable RNA expression simulator

Abstract. Background: The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, exist...

Bibliographic Details
Main Authors: Nicholas F. Lahens, Thomas G. Brooks, Dimitra Sarantopoulou, Soumyashant Nayak, Cris Lawrence, Antonijo Mrčela, Anand Srinivasan, Jonathan Schug, John B. Hogenesch, Yoseph Barash, Gregory R. Grant
Format: Article
Language: English
Published: BMC 2021-09-01
Series: BMC Genomics
Subjects: Simulation, Benchmarking, RNA-Seq
Online Access: https://doi.org/10.1186/s12864-021-07934-2
author Nicholas F. Lahens
Thomas G. Brooks
Dimitra Sarantopoulou
Soumyashant Nayak
Cris Lawrence
Antonijo Mrčela
Anand Srinivasan
Jonathan Schug
John B. Hogenesch
Yoseph Barash
Gregory R. Grant
collection DOAJ
description Abstract. Background: The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, existing RNA-Seq simulators focus on modeling the technical biases and artifacts of sequencing, rather than on simulating the original RNA samples. A first step in simulating RNA-Seq is to simulate RNA. Results: To fill this need, we developed the Configurable And Modular Program Allowing RNA Expression Emulation (CAMPAREE), a simulator using empirical data to simulate diploid RNA samples at the level of individual molecules. We demonstrated CAMPAREE’s use for generating idealized coverage plots from real data, and for adding the ability to generate allele-specific data to existing RNA-Seq simulators that do not natively support this feature. Conclusions: Separating input sample modeling from library preparation/sequencing offers added flexibility for both users and developers to mix-and-match different sample and sequencing simulators to suit their specific needs. Furthermore, the ability to maintain sample and sequencing simulators independently provides greater agility to incorporate new biological findings about transcriptomics and new developments in sequencing technologies. Additionally, by simulating at the level of individual molecules, CAMPAREE has the potential to model molecules transcribed from the same genes as a heterogeneous population of transcripts with different states of degradation and processing (splicing, editing, etc.). CAMPAREE was developed in Python, is open source, and freely available at https://github.com/itmat/CAMPAREE.
first_indexed 2024-12-14T18:42:52Z
format Article
id doaj.art-e652ef326a1f48d3a154408788505a58
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-14T18:42:52Z
publishDate 2021-09-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-e652ef326a1f48d3a154408788505a58 2022-12-21T22:51:27Z eng BMC BMC Genomics 1471-2164 2021-09-01 22 1 1 12 10.1186/s12864-021-07934-2
CAMPAREE: a robust and configurable RNA expression simulator
Nicholas F. Lahens: The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania
Thomas G. Brooks: The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania
Dimitra Sarantopoulou: The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania
Soumyashant Nayak: Statistics and Mathematics Unit, Indian Statistical Institute
Cris Lawrence: The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania
Antonijo Mrčela: The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania
Anand Srinivasan: Perelman School of Medicine, Enterprise Research Applications and High Performance Computing, Penn Medicine Academic Computing Services, University of Pennsylvania
Jonathan Schug: The Institute for Diabetes, Obesity and Metabolism, The Department of Genetics, Perelman School of Medicine, University of Pennsylvania
John B. Hogenesch: Division of Human Genetics, Department of Pediatrics, Center for Chronobiology, Cincinnati Children’s Hospital Medical Center
Yoseph Barash: The Department of Genetics, Perelman School of Medicine, University of Pennsylvania
Gregory R. Grant: The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania
https://doi.org/10.1186/s12864-021-07934-2
Simulation
Benchmarking
RNA-Seq
title CAMPAREE: a robust and configurable RNA expression simulator
topic Simulation
Benchmarking
RNA-Seq
url https://doi.org/10.1186/s12864-021-07934-2