Text mining and portal development for gene-specific publications on Alzheimer’s disease and other neurodegenerative diseases

Abstract Background: Tremendous research efforts have been made in the Alzheimer’s disease (AD) field to understand disease etiology and progression and to discover treatments for AD. Many mechanistic hypotheses, therapeutic targets and treatment strategies have been proposed in the last few decades. R...

Bibliographic Details
Main Authors: Jiannan Liu, Huanmei Wu, Daniel H. Robertson, Jie Zhang
Format: Article
Language: English
Published: BMC 2024-04-01
Series: BMC Medical Informatics and Decision Making
Subjects: Alzheimer’s disease; Text mining; Natural language processing; Web portal
Online Access: https://doi.org/10.1186/s12911-024-02501-7
author Jiannan Liu
Huanmei Wu
Daniel H. Robertson
Jie Zhang
collection DOAJ
description Abstract
Background: Tremendous research efforts have been made in the Alzheimer’s disease (AD) field to understand disease etiology and progression and to discover treatments for AD. Many mechanistic hypotheses, therapeutic targets and treatment strategies have been proposed in the last few decades. Reviewing previous work and staying current on this ever-growing body of AD publications is an essential yet difficult task for AD researchers.
Methods: In this study, we designed and implemented a natural language processing (NLP) pipeline to extract gene-specific, neurodegenerative disease (ND)-focused information from the PubMed database. The collected publication information was filtered and cleaned to construct AD-related gene-specific publication profiles. Six categories of AD-related information are extracted from the processed publication data: publication trend by year, dementia type occurrence, brain region occurrence, mouse model information, keyword occurrence, and co-occurring genes. A user-friendly web portal was then developed with the Django framework to provide gene query functions and data visualizations for the generalized and summarized publication information.
Results: By implementing the NLP pipeline, we extracted gene-specific ND-related publication information from the abstracts of publications in the PubMed database. The results are summarized and visualized through an interactive web query portal. Multiple visualization windows display the ND publication trends, mouse models used, dementia types, involved brain regions, keywords for major AD-related biological processes, and co-occurring genes. Direct links to PubMed are provided for all recorded publications on the query result page of the web portal.
Conclusion: The resulting portal is a valuable tool and data source for quickly querying and displaying AD publications tailored to users’ research areas and gene targets of interest, which is especially convenient for users without informatics or text-mining skills. Our study will not only keep AD researchers updated on the progress of AD research and help them conduct preliminary examinations efficiently, but also offer additional support for hypothesis generation and validation, which will contribute significantly to the communication, dissemination, and progress of AD research.
first_indexed 2024-04-24T07:15:44Z
format Article
id doaj.art-e9b137248802406383495a23a70b3fe3
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-04-24T07:15:44Z
publishDate 2024-04-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-e9b137248802406383495a23a70b3fe3; 2024-04-21T11:21:07Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2024-04-01; 24S319; 10.1186/s12911-024-02501-7
Jiannan Liu (Department of BioHealth Informatics, Indiana University School of Informatics & Computing)
Huanmei Wu (Department of BioHealth Informatics, Indiana University School of Informatics & Computing)
Daniel H. Robertson (Integrated Data Sciences, Indiana Biosciences Research Institute)
Jie Zhang (Dept of Medical and Molecular Genetics, Indiana University School of Medicine)
title Text mining and portal development for gene-specific publications on Alzheimer’s disease and other neurodegenerative diseases
topic Alzheimer’s disease
Text mining
Natural language processing
Web portal
url https://doi.org/10.1186/s12911-024-02501-7
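
The abstract in this record names several per-gene tallies (publication trend by year, brain region occurrence, co-occurring genes) without giving implementation details. The following is only a minimal sketch of how such tallies could be computed from abstracts, not the authors' pipeline. The toy records, the term lists, and the names `mentions` and `gene_profile` are all hypothetical, and simple whole-word matching stands in for whatever NLP steps the article actually uses.

```python
import re
from collections import Counter

# Hypothetical toy records; a real pipeline would retrieve these from PubMed.
RECORDS = [
    {"pmid": "1", "year": 2019,
     "abstract": "APOE and TREM2 variants alter microglia in the hippocampus."},
    {"pmid": "2", "year": 2021,
     "abstract": "APOE genotype modifies tau pathology in the entorhinal cortex."},
    {"pmid": "3", "year": 2021,
     "abstract": "TREM2 signaling in frontotemporal dementia."},
]

# Hypothetical term lists, chosen only for illustration.
GENES = ["APOE", "TREM2", "MAPT"]
BRAIN_REGIONS = ["hippocampus", "entorhinal cortex"]


def mentions(term, text, ignore_case=False):
    """Return True if `term` occurs in `text` as a whole word or phrase."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(r"\b" + re.escape(term) + r"\b", text, flags) is not None


def gene_profile(gene, records, genes, regions):
    """Tally a few of the categories named in the abstract for one gene."""
    # Gene symbols are matched case-sensitively; region names are not.
    hits = [r for r in records if mentions(gene, r["abstract"])]
    trend = Counter(r["year"] for r in hits)
    co_genes = Counter(
        g for r in hits for g in genes
        if g != gene and mentions(g, r["abstract"])
    )
    region_counts = Counter(
        reg for r in hits for reg in regions
        if mentions(reg, r["abstract"], ignore_case=True)
    )
    return {
        "pmids": [r["pmid"] for r in hits],
        "trend": dict(trend),
        "co_genes": dict(co_genes),
        "regions": dict(region_counts),
    }


profile = gene_profile("APOE", RECORDS, GENES, BRAIN_REGIONS)
```

Keeping the PMIDs alongside the counts mirrors the record's note that each result links back to PubMed; a portal view could render `trend` as a per-year chart and `co_genes` as a ranked list.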