HGNChelper: identification and correction of invalid gene symbols for human and mouse [version 2; peer review: 3 approved]

Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome In...

Bibliographic Details
Main Authors: Marcel Ramos, Ragheed Al-Dulaimi, Ayush Aggarwal, Sean Davis, Levi Waldron, Sehyun Oh, Jasmine Abdelnabi, Markus Riester
Format: Article
Language: English
Published: F1000 Research Ltd 2022-06-01
Series: F1000Research
Subjects: gene symbols, molecular biology, HGNC, MGI
Online Access: https://f1000research.com/articles/9-1493/v2
author Marcel Ramos
Ragheed Al-Dulaimi
Ayush Aggarwal
Sean Davis
Levi Waldron
Sehyun Oh
Jasmine Abdelnabi
Markus Riester
collection DOAJ
description Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN.
first_indexed 2024-04-13T19:52:29Z
format Article
id doaj.art-1751749a14ab4423ba9337547cd33f78
institution Directory Open Access Journal
issn 2046-1402
language English
last_indexed 2024-04-13T19:52:29Z
publishDate 2022-06-01
publisher F1000 Research Ltd
record_format Article
series F1000Research
spelling doaj.art-1751749a14ab4423ba9337547cd33f78; 2022-12-22T02:32:28Z; eng; F1000 Research Ltd; F1000Research; 2046-1402; 2022-06-01; 9133588
Marcel Ramos: Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA
Ragheed Al-Dulaimi: Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA
Ayush Aggarwal (https://orcid.org/0000-0002-6587-3393): CSIR-Institute of Genomics and Integrative Biology, New Delhi, 110025, India
Sean Davis (https://orcid.org/0000-0002-8991-6458): Center for Cancer Research, National Cancer Institute, Maryland, 20892, USA
Levi Waldron (https://orcid.org/0000-0003-2725-0694): Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA
Sehyun Oh: Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA
Jasmine Abdelnabi: Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA
Markus Riester (https://orcid.org/0000-0002-4759-8332): Novartis Institutes for BioMedical Research Incorporation, Massachusetts, 02139, USA
title HGNChelper: identification and correction of invalid gene symbols for human and mouse [version 2; peer review: 3 approved]
topic gene symbols
molecular biology
HGNC
MGI
eng
url https://f1000research.com/articles/9-1493/v2
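
The description above says the package flags known aliases, outdated symbols, and symbols that spreadsheets converted to dates, and suggests corrections where possible. As a rough illustration of that idea only, the Python sketch below applies it to a tiny hand-written table. It is not HGNChelper's code or API: the function name `check_symbol` and the tables `APPROVED`, `ALIASES`, and `MONTH_PREFIX` are hypothetical, and a real tool would load its mappings from the HGNC or MGI databases.

```python
import re

# Hypothetical, hand-written tables for illustration only.
# A real tool would load these mappings from the HGNC or MGI databases.
APPROVED = {"TP53", "ERBB2", "SEPTIN1", "MARCHF1"}
ALIASES = {
    "P53": "TP53",
    "HER2": "ERBB2",
    "SEPT1": "SEPTIN1",   # outdated symbol
    "MARCH1": "MARCHF1",  # outdated symbol
}
# Month abbreviations produced by spreadsheet date conversion,
# mapped back to a gene symbol prefix (assumed subset).
MONTH_PREFIX = {"SEP": "SEPT", "MAR": "MARCH"}

# Matches date-like strings such as "1-Sep" or "10-Mar".
DATE_LIKE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")


def check_symbol(symbol):
    """Return (approved, suggestion); suggestion is None when nothing is known."""
    cleaned = symbol.strip()
    if cleaned in APPROVED:
        return True, cleaned

    # Undo a spreadsheet date conversion, e.g. "1-Sep" -> "SEPT1".
    match = DATE_LIKE.match(cleaned)
    if match:
        day, month = match.groups()
        prefix = MONTH_PREFIX.get(month.upper())
        if prefix is None:
            return False, None
        cleaned = f"{prefix}{int(day)}"

    # Compare case-insensitively against approved symbols, then aliases.
    key = cleaned.upper()
    if key in APPROVED:
        return False, key
    return False, ALIASES.get(key)


if __name__ == "__main__":
    for raw in ["TP53", "tp53", "HER2", "1-Sep", "1-Mar", "NOTAGENE"]:
        print(raw, check_symbol(raw))
```

Under these assumed tables, `check_symbol("1-Sep")` returns `(False, "SEPTIN1")`: the date-like string is first restored to `SEPT1`, which is then mapped to its current symbol. A symbol with no known mapping returns `(False, None)` and is left uncorrected, in line with the description's "corrections where possible".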