Using multiple reference genomes to identify and resolve annotation inconsistencies

Abstract Background Advances in sequencing technologies have led to the release of reference genomes and annotations for multiple individuals within more well-studied systems. While each of these new genome assemblies shares significant portions of synteny between each other, the annotated structure...

Bibliographic Details
Main Authors: Patrick J. Monnahan, Jean-Michel Michno, Christine O’Connor, Alex B. Brohammer, Nathan M. Springer, Suzanne E. McGaugh, Candice N. Hirsch
Format: Article
Language: English
Published: BMC 2020-04-01
Series: BMC Genomics
Subjects: Annotation, Genome assembly, Maize, Split-gene
Online Access: http://link.springer.com/article/10.1186/s12864-020-6696-8
author Patrick J. Monnahan
Jean-Michel Michno
Christine O’Connor
Alex B. Brohammer
Nathan M. Springer
Suzanne E. McGaugh
Candice N. Hirsch
collection DOAJ
description Abstract. Background: Advances in sequencing technologies have led to the release of reference genomes and annotations for multiple individuals within the more well-studied systems. Although these genome assemblies share substantial regions of synteny with one another, the annotated structure of gene models within these regions can differ. Of particular concern are split-gene misannotations, in which a single gene is incorrectly annotated as two distinct genes or two genes are incorrectly annotated as a single gene. These misannotations can have major impacts on functional prediction, estimates of expression, and many downstream analyses. Results: We developed a high-throughput method based on pairwise comparisons of annotations that detects potential split-gene misannotations and quantifies support for whether the genes should be merged into a single gene model. We demonstrated the utility of our method using gene annotations of three reference genomes from maize (B73, PH207, and W22), a difficult system from an annotation perspective due to the size and complexity of the genome. On average, we found several hundred of these potential split-gene misannotations in each pairwise comparison, corresponding to 3–5% of gene models across annotations. To determine which state (i.e., one gene or multiple genes) is biologically supported, we utilized RNAseq data from 10 tissues throughout development along with a novel metric and simulation framework. The methods we have developed require minimal human interaction and can be applied to future assemblies to aid in annotation efforts. Conclusions: Split-gene misannotations occur at appreciable frequency in maize annotations. We have developed a method to easily identify and correct these misannotations. Importantly, this method is generic in that it can utilize any type of short-read expression data. Failure to account for split-gene misannotations has serious consequences for biological inference, particularly for expression-based analyses.
first_indexed 2024-12-23T05:39:44Z
format Article
id doaj.art-b4690602578a4b4f8511ec86c218a3b3
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-23T05:39:44Z
publishDate 2020-04-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-b4690602578a4b4f8511ec86c218a3b3
2022-12-21T17:58:14Z
eng
BMC
BMC Genomics
1471-2164
2020-04-01
Volume 21, issue 1, pages 1–13
10.1186/s12864-020-6696-8
Using multiple reference genomes to identify and resolve annotation inconsistencies
Patrick J. Monnahan, Department of Agronomy and Plant Genetics, University of Minnesota
Jean-Michel Michno, Department of Agronomy and Plant Genetics, University of Minnesota
Christine O’Connor, Department of Agronomy and Plant Genetics, University of Minnesota
Alex B. Brohammer, Department of Agronomy and Plant Genetics, University of Minnesota
Nathan M. Springer, Department of Plant and Microbial Biology, University of Minnesota
Suzanne E. McGaugh, Department of Ecology, Evolution, and Behavior, University of Minnesota
Candice N. Hirsch, Department of Agronomy and Plant Genetics, University of Minnesota
http://link.springer.com/article/10.1186/s12864-020-6696-8
Annotation
Genome assembly
Maize
Split-gene
title Using multiple reference genomes to identify and resolve annotation inconsistencies
topic Annotation
Genome assembly
Maize
Split-gene
url http://link.springer.com/article/10.1186/s12864-020-6696-8