Accessing the Variability of Multicopy Genes in Complex Genomes using Unassembled Next-Generation Sequencing Reads: The Case of Trypanosoma cruzi Multigene Families

Bibliographic Details
Main Authors: João Luís Reis-Cunha, Anderson Coqueiro-dos-Santos, Samuel Alexandre Pimenta-Carvalho, Larissa Pinheiro Marques, Gabriela F. Rodrigues-Luiz, Rodrigo P. Baptista, Laila Viana de Almeida, Nathan Ravi Medeiros Honorato, Francisco Pereira Lobo, Vanessa Gomes Fraga, Lucia Maria da Cunha Galvão, Lilian Lacerda Bueno, Ricardo Toshio Fujiwara, Mariana Santos Cardoso, Gustavo Coutinho Cerqueira, Daniella C. Bartholomeu
Format: Article
Language: English
Published: American Society for Microbiology 2022-12-01
Series: mBio
Subjects: multicopy genes; variability; copy number variation; complex genomes; T. cruzi; MASP
Online Access: https://journals.asm.org/doi/10.1128/mbio.02319-22
collection DOAJ
description ABSTRACT Repetitive elements cause assembly fragmentation in complex eukaryotic genomes, limiting the study of their variability. The genome of Trypanosoma cruzi, the parasite that causes Chagas disease, has a high repetitive content, including multigene families. Although many T. cruzi multigene families encode surface proteins that play pivotal roles in host-parasite interactions, their variability is currently underestimated, as their high repetitive content results in collapsed gene variants. To estimate sequence variability and copy number variation of multigene families, we developed a read-based approach that is independent of gene-specific read mapping and de novo assembly. This methodology was used to estimate the copy number and variability of MASP, TcMUC, and trans-sialidase (TS), the three largest T. cruzi multigene families, in 36 strains, including members of all six parasite discrete typing units (DTUs). We found that these three families present a specific pattern of variability and copy number among the distinct parasite DTUs. Inter-DTU hybrid strains presented a higher variability of these families, suggesting that maintaining a larger content of their members could be advantageous. In addition, in a chronic murine model and chronic Chagasic human patients, the immune response was focused on TS antigens, suggesting that targeting TS conserved sequences could be a potential avenue to improve diagnosis and vaccine design against Chagas disease. Finally, the proposed approach can be applied to study multicopy genes in any organism, opening new avenues to access sequence variability in complex genomes.
IMPORTANCE Sequences that have several copies in a genome, such as multicopy-gene families, mobile elements, and microsatellites, are among the most challenging genomic segments to study. They are frequently underestimated in genome assemblies, hampering the correct assessment of these important players in genome evolution and adaptation. Here, we developed a new methodology to estimate variability and copy numbers of repetitive genomic regions and employed it to characterize the T. cruzi multigene families MASP, TcMUC, and trans-sialidase (TS), which are important virulence factors in this parasite. We showed that multigene families vary in sequence and content among the parasite's lineages, whereas hybrid strains have a higher sequence variability that could be advantageous to the parasite's survival. By identifying conserved sequences within multigene families, we showed that the mammalian host immune response toward these multigene families is usually focused on the TS multigene family. These conserved and immunogenic TS peptides can be explored in future work as diagnostic targets or vaccine candidates for Chagas disease. Finally, this methodology can be easily applied to any organism of interest, which will aid in our understanding of complex genomic regions.
first_indexed 2024-04-11T05:46:02Z
id doaj.art-ba5075b9a21e493ead3788e470e0b902
institution Directory of Open Access Journals
issn 2150-7511
last_indexed 2024-04-11T05:46:02Z
spelling doaj.art-ba5075b9a21e493ead3788e470e0b902
Timestamp: 2022-12-22T04:42:14Z
Language: eng
Publisher: American Society for Microbiology
Journal: mBio
ISSN: 2150-7511
Published: 2022-12-01
Volume: 13
Issue: 6
DOI: 10.1128/mbio.02319-22
Author affiliations:
João Luís Reis-Cunha: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Anderson Coqueiro-dos-Santos: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Samuel Alexandre Pimenta-Carvalho: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Larissa Pinheiro Marques: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Gabriela F. Rodrigues-Luiz: Experimental Medicine Research Cluster (EMRC), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
Rodrigo P. Baptista: Center for Tropical and Emerging Global Diseases and Institute of Bioinformatics, The University of Georgia, Athens, Georgia, USA
Laila Viana de Almeida: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Nathan Ravi Medeiros Honorato: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Francisco Pereira Lobo: Departamento de Genética e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Vanessa Gomes Fraga: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Lucia Maria da Cunha Galvão: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Lilian Lacerda Bueno: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Ricardo Toshio Fujiwara: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Mariana Santos Cardoso: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Gustavo Coutinho Cerqueira: Personal Genome Diagnostics, Baltimore, Maryland, USA
Daniella C. Bartholomeu: Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
title Accessing the Variability of Multicopy Genes in Complex Genomes using Unassembled Next-Generation Sequencing Reads: The Case of Trypanosoma cruzi Multigene Families
topic multicopy genes
variability
copy number variation
complex genomes
T. cruzi
MASP
url https://journals.asm.org/doi/10.1128/mbio.02319-22