Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA


Bibliographic Details
Main Authors: Akash Gupta, Shrey Aeron, Anjali Agrawal, Himanshu Gupta
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-07-01
Series: Frontiers in Digital Health
Online Access: https://www.frontiersin.org/articles/10.3389/fdgth.2021.686720/full

Abstract
Background: Research publications related to the novel coronavirus disease COVID-19 are rapidly increasing. However, current online literature hubs, even with artificial intelligence, are limited in identifying the complexity of COVID-19 research topics. We developed a comprehensive Latent Dirichlet Allocation (LDA) model with 25 topics using natural language processing (NLP) techniques on PubMed® research articles about “COVID.” We propose a novel methodology to develop and visualise temporal trends, and to improve existing online literature hubs. Our results for temporal evolution demonstrate interesting trends; for example, the prominence of “Mental Health” and “Socioeconomic Impact” increased, “Genome Sequence” decreased, and “Epidemiology” remained relatively constant. Applying our methodology to LitCovid, a literature hub from the National Center for Biotechnology Information, we improved the breadth and depth of research topics by subdividing their pre-existing categories. Our topic model demonstrates that research on “masks” and “Personal Protective Equipment (PPE)” is skewed toward clinical applications, with a lack of population-based epidemiological research.

Additional Details
Volume: 3
Article Number: 686720
DOI: 10.3389/fdgth.2021.686720
ISSN: 2673-253X
Keywords: natural language processing; latent Dirichlet allocation; COVID-19; trends; LitCovid; topic model
Indexed in: Directory of Open Access Journals (DOAJ)
Author Affiliations:
Akash Gupta: Department of Engineering, University of Cambridge, Cambridge, United Kingdom
Shrey Aeron: Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, United States
Anjali Agrawal: Harmony School of Innovation – Sugar Land (High School), Sugar Land, TX, United States
Himanshu Gupta: Valley Health System, Ridgewood, NJ, United States