Network Embedding Across Multiple Tissues and Data Modalities Elucidates the Context of Host Factors Important for COVID-19 Infection

Bibliographic Details
Main Authors: Yue Hu, Ghalia Rehawi, Lambert Moyon, Nathalie Gerstner, Christoph Ogris, Janine Knauer-Arloth, Florian Bittner, Annalisa Marsico, Nikola S. Mueller
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-07-01
Series: Frontiers in Genetics
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.909714/full
collection DOAJ
description COVID-19 is a heterogeneous disease caused by SARS-CoV-2. Aside from infections of the lungs, the disease can spread throughout the body and damage many other tissues, leading to multiorgan failure in severe cases. The highly variable symptom severity is influenced by genetic predispositions and preexisting diseases, which have not been investigated in a large-scale multimodal manner. We present a holistic analysis framework that sets previously reported COVID-19 genes in context with prepandemic data, such as gene expression patterns across multiple tissues, polygenic predispositions, and patient diseases that are putative comorbidities of COVID-19. First, we generate a multimodal network using the prior-based network inference method KiMONo. We then embed the network to generate a meaningful lower-dimensional representation of the data. The input data are obtained from the Genotype-Tissue Expression project (GTEx), which contains expression data from a range of tissues together with genomic and phenotypic information, covering over 900 patients and 50 tissues. The nodes of the generated network are genes and polygenic risk scores (PRS) for several diseases/phenotypes, as well as for COVID-19 severity and hospitalization. Two nodes are linked if they are statistically associated, as determined by feature selection in a regularized linear model. Applying network embedding to the generated multimodal network allows us to perform efficient network analysis: nodes that lie close together in the lower-dimensional space correspond to entities that are statistically linked. By determining the similarity between COVID-19 genes and other nodes through embedding, we identify disease associations with tissues such as the brain and gut. We also find strong associations between COVID-19 genes and various diseases such as ischemic heart disease, cerebrovascular disease, and hypertension. Moreover, we find evidence linking PTPN6 to a range of comorbidities along with the genetic predisposition to COVID-19, suggesting that this phosphatase is a central player in severe cases of COVID-19. In conclusion, our holistic network inference coupled with network embedding of multimodal data enables the contextualization of COVID-19-associated genes with respect to tissues, disease states, and genetic risk factors. Such contextualization can be exploited to further elucidate the biological importance of known and novel genes for the severity of the disease in patients.
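The step of "determining the similarity between COVID-19 genes and other nodes through embedding" can be illustrated with a minimal sketch. It assumes that each network node already has an embedding vector and that closeness is measured by cosine similarity; the record does not state which embedding method or similarity measure the authors used, and the node names and vectors below are hypothetical toy values, not data from the article.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_nodes(embeddings, query, top_k=3):
    """Rank all nodes other than `query` by cosine similarity to the query node's vector."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    # Most similar nodes first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings for four made-up nodes (assumption, not article data).
toy_embeddings = {
    "gene_A": [0.9, 0.1, 0.0],
    "gene_B": [0.8, 0.2, 0.1],
    "prs_X": [0.1, 0.9, 0.2],
    "tissue_gene_C": [-0.7, 0.1, 0.6],
}

ranked = nearest_nodes(toy_embeddings, "gene_A", top_k=2)
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

In this toy setting, a gene of interest is the query node, and the highest-ranked neighbours would be the genes or PRS nodes to inspect first.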
first_indexed 2024-04-13T04:47:53Z
format Article
id doaj.art-61cf86274b1a4c7a937948675af97a9b
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-13T04:47:53Z
publishDate 2022-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-61cf86274b1a4c7a937948675af97a9b; 2022-12-22T03:01:48Z; eng
Frontiers Media S.A.; Frontiers in Genetics; ISSN 1664-8021; 2022-07-01; volume 13; DOI 10.3389/fgene.2022.909714; article number 909714
Author affiliations:
Yue Hu: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany; Informatics 12 Chair of Bioinformatics, Technical University Munich, Garching, Germany
Ghalia Rehawi: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany; Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
Lambert Moyon: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany
Nathalie Gerstner: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany; Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
Christoph Ogris: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany
Janine Knauer-Arloth: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany; Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
Florian Bittner: knowing01 GmbH, Munich, Germany
Annalisa Marsico: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany
Nikola S. Mueller: Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany; knowing01 GmbH, Munich, Germany
title Network Embedding Across Multiple Tissues and Data Modalities Elucidates the Context of Host Factors Important for COVID-19 Infection
topic multi-omic integration
network inference
network embedding
COVID-19
machine learning
polygenic risk score (PRS)
url https://www.frontiersin.org/articles/10.3389/fgene.2022.909714/full