Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders

Bibliographic Details
Main Authors: Quentin Ferré, Jeanne Chèneby, Denis Puthier, Cécile Capponi, Benoît Ballester
Format: Article
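Language: English
Published: BMC 2021-09-01
Series: BMC Bioinformatics
Subjects: Genomic assay; Anomaly detection; Cis regulatory element; Unsupervised curation; Convolutional autoencoder; ChIP-seq peak quality
Online Access: https://doi.org/10.1186/s12859-021-04359-2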
author Quentin Ferré
Jeanne Chèneby
Denis Puthier
Cécile Capponi
Benoît Ballester
collection DOAJ
description Abstract. Background: Accurate identification of Transcriptional Regulator binding locations is essential for analysis of genomic regions, including Cis Regulatory Elements. The customary NGS approaches, predominantly ChIP-Seq, can be obscured by data anomalies and biases which are difficult to detect without supervision. Results: Here, we develop a method to leverage the usual combinations between many experimental series to mark such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions’ representations with multiview convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CRE according to group completeness. It is then applied to the ReMap database’s large volume of curated ChIP-seq data. We show that peaks lacking known biological correlators are singled out and less confirmed in real data. We propose normalization approaches useful in interpreting black-box models. Conclusion: Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems, and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak.
first_indexed 2024-12-22T00:36:02Z
format Article
id doaj.art-177889c5f407484fb6fe69b6f268bdec
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-22T00:36:02Z
publishDate 2021-09-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-177889c5f407484fb6fe69b6f268bdec 2022-12-21T18:44:49Z eng BMC BMC Bioinformatics 1471-2105 2021-09-01 22 1 1 26 10.1186/s12859-021-04359-2
Quentin Ferré (INSERM, TAGC, Aix Marseille University)
Jeanne Chèneby (INSERM, TAGC, Aix Marseille University)
Denis Puthier (INSERM, TAGC, Aix Marseille University)
Cécile Capponi (Université de Toulon, CNRS, LIS, Aix Marseille University)
Benoît Ballester (INSERM, TAGC, Aix Marseille University)
title Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders
topic Genomic assay
Anomaly detection
Cis regulatory element
Unsupervised curation
Convolutional autoencoder
ChIP-seq peak quality
url https://doi.org/10.1186/s12859-021-04359-2