Network Embedding Across Multiple Tissues and Data Modalities Elucidates the Context of Host Factors Important for COVID-19 Infection
COVID-19 is a heterogeneous disease caused by SARS-CoV-2. Aside from infections of the lungs, the disease can spread throughout the body and damage many other tissues, leading to multiorgan failure in severe cases. The highly variable symptom severity is influenced by genetic predispositions and pre...
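Main Authors: | Yue Hu, Ghalia Rehawi, Lambert Moyon, Nathalie Gerstner, Christoph Ogris, Janine Knauer-Arloth, Florian Bittner, Annalisa Marsico, Nikola S. Mueller |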
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2022-07-01 |
Series: | Frontiers in Genetics |
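Subjects: | multi-omic integration; network inference; network embedding; COVID-19; machine learning; polygenic risk score (PRS) |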
Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2022.909714/full |
_version_ | 1811292533224898560 |
---|---|
author | Yue Hu; Ghalia Rehawi; Lambert Moyon; Nathalie Gerstner; Christoph Ogris; Janine Knauer-Arloth; Florian Bittner; Annalisa Marsico; Nikola S. Mueller |
author_sort | Yue Hu |
collection | DOAJ |
description | COVID-19 is a heterogeneous disease caused by SARS-CoV-2. Beyond infecting the lungs, the disease can spread throughout the body and damage many other tissues, leading to multiorgan failure in severe cases. Symptom severity varies widely and is influenced by genetic predispositions and preexisting diseases, which have not yet been investigated in a large-scale multimodal manner. We present a holistic analysis framework that sets previously reported COVID-19 genes in context with prepandemic data, such as gene expression patterns across multiple tissues, polygenic predispositions, and patient diseases that are putative comorbidities of COVID-19. First, we generate a multimodal network using the prior-based network inference method KiMONo. We then embed the network to obtain a meaningful lower-dimensional representation of the data. The input data come from the Genotype-Tissue Expression project (GTEx), which provides expression data together with genomic and phenotypic information for over 900 patients and 50 tissues. The nodes of the generated network are genes and polygenic risk scores (PRS) for several diseases/phenotypes, as well as for COVID-19 severity and hospitalization; two nodes are linked if feature selection in a regularized linear model finds them statistically associated. Embedding this multimodal network enables efficient network analysis: nodes that lie close together in the lower-dimensional space correspond to entities that are statistically linked. By determining the similarity between COVID-19 genes and other nodes in the embedding, we identify disease associations with tissues such as the brain and gut. We also find strong associations between COVID-19 genes and various diseases such as ischemic heart disease, cerebrovascular disease, and hypertension. Moreover, we find evidence linking PTPN6 to a range of comorbidities along with the genetic predisposition to COVID-19, suggesting that this phosphatase is a central player in severe cases of COVID-19. In conclusion, our holistic network inference coupled with network embedding of multimodal data enables the contextualization of COVID-19-associated genes with respect to tissues, disease states, and genetic risk factors. Such contextualization can be exploited to further elucidate the biological importance of known and novel genes for the severity of the disease in patients. |
first_indexed | 2024-04-13T04:47:53Z |
format | Article |
id | doaj.art-61cf86274b1a4c7a937948675af97a9b |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-04-13T04:47:53Z |
publishDate | 2022-07-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-61cf86274b1a4c7a937948675af97a9b; 2022-12-22T03:01:48Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2022-07-01; 13; 10.3389/fgene.2022.909714; 909714; Network Embedding Across Multiple Tissues and Data Modalities Elucidates the Context of Host Factors Important for COVID-19 Infection; Yue Hu [1,2]; Ghalia Rehawi [1,3]; Lambert Moyon [1]; Nathalie Gerstner [1,3]; Christoph Ogris [1]; Janine Knauer-Arloth [1,3]; Florian Bittner [4]; Annalisa Marsico [1]; Nikola S. Mueller [1,4]; [1] Computational Health Department, Helmholtz Center Munich, Neuherberg, Germany; [2] Informatics 12 Chair of Bioinformatics, Technical University Munich, Garching, Germany; [3] Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany; [4] knowing01 GmbH, Munich, Germany; https://www.frontiersin.org/articles/10.3389/fgene.2022.909714/full; multi-omic integration; network inference; network embedding; COVID-19; machine learning; polygenic risk score (PRS) |
title | Network Embedding Across Multiple Tissues and Data Modalities Elucidates the Context of Host Factors Important for COVID-19 Infection |
topic | multi-omic integration; network inference; network embedding; COVID-19; machine learning; polygenic risk score (PRS) |
url | https://www.frontiersin.org/articles/10.3389/fgene.2022.909714/full |
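
The description in this record states that two nodes are linked if feature selection in a regularized linear model finds them statistically associated. The sketch below illustrates only that edge-selection idea, and it rests on several assumptions: a plain lasso fitted by coordinate descent is swapped in for KiMONo's own prior-based model, the node names `GENE_A`, `GENE_B`, and `PRS_X` are hypothetical, and the eight-sample data set is a hand-made toy rather than GTEx data. The embedding step is not covered.

```python
import math


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; this is the lasso update step."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def standardize(column):
    """Center to mean 0 and scale to variance 1 (assumes a non-constant column)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


def lasso(predictors, target, penalty, sweeps=100):
    """Coordinate-descent lasso on standardized columns; returns coefficients."""
    n = len(target)
    coefs = [0.0] * len(predictors)
    residual = list(target)
    for _ in range(sweeps):
        for j, col in enumerate(predictors):
            # Covariance of column j with the residual that excludes column j.
            rho = sum(c * r for c, r in zip(col, residual)) / n + coefs[j]
            new = soft_threshold(rho, penalty)
            if new != coefs[j]:
                delta = new - coefs[j]
                residual = [r - delta * c for r, c in zip(residual, col)]
                coefs[j] = new
    return coefs


def infer_edges(data, penalty):
    """Regress each node on all others; a nonzero coefficient becomes an edge."""
    names = list(data)
    cols = {name: standardize(values) for name, values in data.items()}
    edges = set()
    for target in names:
        others = [m for m in names if m != target]
        coefs = lasso([cols[m] for m in others], cols[target], penalty)
        for other, coef in zip(others, coefs):
            if coef != 0.0:
                edges.add(frozenset((target, other)))
    return edges


# Hypothetical toy data: GENE_B tracks GENE_A, PRS_X is unrelated to both.
toy = {
    "GENE_A": [1, -1, 1, -1, 1, -1, 1, -1],
    "GENE_B": [1.5, -0.5, 1.5, -0.5, 0.5, -1.5, 0.5, -1.5],
    "PRS_X": [1, 1, -1, -1, 1, 1, -1, -1],
}
edges = infer_edges(toy, penalty=0.1)
print(sorted(sorted(edge) for edge in edges))
```

In this toy setting the penalty decides how strong an association must be before a link is kept; a larger `penalty` yields a sparser network. A real analysis would choose it by cross-validation or a similar criterion rather than fixing it by hand.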