Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals

Due to advances in NGS technologies, whole-genome maps of various functional genomic elements have been generated for a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods have become the state-of-the-art computational methods to a...

Bibliographic Details
Main Authors:	Pavel Latyshev, Fedor Pavlov, Alan Herbert, Maria Poptsova
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-03-01
Series:	Frontiers in Big Data
Subjects:	transfer learning; domain adaptation; domain adversarial networks; versatile domain adaptation; Minimum Class Confusion; histone marks
Online Access:	https://www.frontiersin.org/articles/10.3389/fdata.2023.1140663/full

collection	DOAJ
description	Due to advances in NGS technologies, whole-genome maps of various functional genomic elements have been generated for a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods have become the state-of-the-art computational methods for analyzing the available data, but the focus is often only on the species studied. Here we take advantage of progress in transfer learning in the area of Unsupervised Domain Adaptation (UDA) and test nine UDA methods for predicting regulatory code signals in the genomes of other species. We tested each deep learning implementation by training the model on experimental data from one species and then refining the model using the genome sequence of the target species for which we wanted to make predictions. Among the nine tested domain adaptation architectures, the non-adversarial methods Minimum Class Confusion (MCC) and Deep Adaptation Network (DAN) significantly outperformed the others. Conditional Domain Adversarial Network (CDAN) was the third-best architecture. Here we provide an empirical assessment of each approach using real-world data. The approaches were tested on ChIP-seq data for transcription factor binding sites and histone marks in human and mouse genomes, but they are generalizable to any cross-species transfer of interest. We tested the efficiency of each method using species for which experimental data were available for both. The results allow us to assess how well each implementation will work for species for which only limited experimental data are available and will inform the design of future experiments in these understudied organisms. Overall, our results proved the validity of UDA methods for generating missing experimental data for histone marks and transcription factor binding sites in various genomes and highlight how robust the various approaches are to data that are incomplete, noisy and susceptible to analytic bias.
first_indexed	2024-04-09T20:38:02Z
id	doaj.art-81d011e5fcbc48e9a7972c01433b928f
institution	Directory Open Access Journal
issn	2624-909X
last_indexed	2024-04-09T20:38:02Z
spelling	doaj.art-81d011e5fcbc48e9a7972c01433b928f; 2023-03-30T06:28:29Z; eng; Frontiers Media S.A.; Frontiers in Big Data; 2624-909X; 2023-03-01; 6; 10.3389/fdata.2023.1140663; 1140663; Pavel Latyshev (0), Fedor Pavlov (1), Alan Herbert (2, 3), Maria Poptsova (4); 0, 1, 2, 4: Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia; 3: InsideOutBio, Charlestown, MA, United States

Similar Items