HGNChelper: identification and correction of invalid gene symbols for human and mouse [version 2; peer review: 3 approved]
Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome In...
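Main Authors: | Marcel Ramos, Ragheed Al-Dulaimi, Ayush Aggarwal, Sean Davis, Levi Waldron, Sehyun Oh, Jasmine Abdelnabi, Markus Riester |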
---|---|
Format: | Article |
Language: | English |
Published: | F1000 Research Ltd, 2022-06-01 |
Series: | F1000Research |
Subjects: | gene symbols, molecular biology, HGNC, MGI |
Online Access: | https://f1000research.com/articles/9-1493/v2 |
_version_ | 1811344732193816576 |
---|---|
author | Marcel Ramos Ragheed Al-Dulaimi Ayush Aggarwal Sean Davis Levi Waldron Sehyun Oh Jasmine Abdelnabi Markus Riester |
collection | DOAJ |
description | Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN. |
first_indexed | 2024-04-13T19:52:29Z |
format | Article |
id | doaj.art-1751749a14ab4423ba9337547cd33f78 |
institution | Directory Open Access Journal |
issn | 2046-1402 |
language | English |
last_indexed | 2024-04-13T19:52:29Z |
publishDate | 2022-06-01 |
publisher | F1000 Research Ltd |
record_format | Article |
series | F1000Research |
spelling | doaj.art-1751749a14ab4423ba9337547cd33f78; 2022-12-22T02:32:28Z; eng; F1000 Research Ltd; F1000Research; 2046-1402; 2022-06-01; 9133588; HGNChelper: identification and correction of invalid gene symbols for human and mouse [version 2; peer review: 3 approved]; Marcel Ramos (0); Ragheed Al-Dulaimi (1); Ayush Aggarwal (2, https://orcid.org/0000-0002-6587-3393); Sean Davis (3, https://orcid.org/0000-0002-8991-6458); Levi Waldron (4, https://orcid.org/0000-0003-2725-0694); Sehyun Oh (5); Jasmine Abdelnabi (6); Markus Riester (7, https://orcid.org/0000-0002-4759-8332); Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA; Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA; CSIR-Institute of Genomics and Integrative Biology, New Delhi, 110025, India; Center for Cancer Research, National Cancer Institute, Maryland, 20892, USA; Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA; Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA; Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA; Novartis Institutes for BioMedical Research Incorporation, Massachusetts, 02139, USA; Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN.; https://f1000research.com/articles/9-1493/v2; gene symbols molecular biology HGNC MGI; eng |
title | HGNChelper: identification and correction of invalid gene symbols for human and mouse [version 2; peer review: 3 approved] |
topic | gene symbols molecular biology HGNC MGI eng |
url | https://f1000research.com/articles/9-1493/v2 |
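
The abstract above describes two checks: comparing symbols against an approved list, and suggesting corrections for symbols that a spreadsheet has turned into dates. The following is a minimal Python sketch of that idea only. It is not HGNChelper's implementation or API (HGNChelper is an R package); the function names, the three-entry month table, and the tiny approved set are all hypothetical. The symbols it recovers (for example MARCH1, SEPT7) are legacy spellings that HGNC has since renamed, so a real tool would still need to map them to current symbols from the HGNC database.

```python
import re

# Hypothetical, illustrative mapping from the month abbreviation a spreadsheet
# emits to the legacy gene-symbol prefix it may have come from.
# This is NOT HGNChelper's correction table.
MONTH_TO_PREFIX = {
    "Mar": "MARCH",
    "Sep": "SEPT",
    "Dec": "DEC",
}

# Matches spreadsheet-style dates such as "1-Mar" or "07-SEP".
DATE_LIKE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")


def undo_date_conversion(value):
    """Return the legacy symbol a date-like string may have come from, else None."""
    match = DATE_LIKE.match(value.strip())
    if match is None:
        return None
    number, month = match.groups()
    prefix = MONTH_TO_PREFIX.get(month.capitalize())
    if prefix is None:
        return None
    # int() drops any leading zero, so "07-Sep" becomes "SEPT7".
    return f"{prefix}{int(number)}"


def check_symbols(values, approved):
    """Return (input, is_approved, suggestion) tuples for each input value.

    `approved` is a caller-supplied set of valid symbols (an assumption of
    this sketch; a real tool would load it from the HGNC or MGI database).
    """
    rows = []
    for value in values:
        if value in approved:
            rows.append((value, True, value))
        else:
            rows.append((value, False, undo_date_conversion(value)))
    return rows
```

Called as `check_symbols(["TP53", "1-Mar", "4-Oct"], {"TP53"})`, the sketch marks `TP53` as approved, suggests `MARCH1` for `1-Mar`, and offers no suggestion for `4-Oct` because `Oct` is not in the illustrative table.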