Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA
Background: Research publications related to the novel coronavirus disease COVID-19 are rapidly increasing. However, current online literature hubs, even with artificial intelligence, are limited in identifying the complexity of COVID-19 research topics. We developed a comprehensive Latent Dirichlet...
Main Authors: | Akash Gupta, Shrey Aeron, Anjali Agrawal, Himanshu Gupta |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-07-01 |
Series: | Frontiers in Digital Health |
Subjects: | natural language processing, latent dirichlet allocation, COVID-19, trends, LitCovid, topic model |
Online Access: | https://www.frontiersin.org/articles/10.3389/fdgth.2021.686720/full |
_version_ | 1830342584492359680 |
---|---|
author | Akash Gupta Shrey Aeron Anjali Agrawal Himanshu Gupta |
author_facet | Akash Gupta Shrey Aeron Anjali Agrawal Himanshu Gupta |
author_sort | Akash Gupta |
collection | DOAJ |
description | Background: Research publications related to the novel coronavirus disease COVID-19 are rapidly increasing. However, current online literature hubs, even with artificial intelligence, are limited in identifying the complexity of COVID-19 research topics. We developed a comprehensive Latent Dirichlet Allocation (LDA) model with 25 topics using natural language processing (NLP) techniques on PubMed® research articles about “COVID.” We propose a novel methodology to develop and visualise temporal trends, and improve existing online literature hubs. Our results for temporal evolution demonstrate interesting trends, for example, the prominence of “Mental Health” and “Socioeconomic Impact” increased, “Genome Sequence” decreased, and “Epidemiology” remained relatively constant. Applying our methodology to LitCovid, a literature hub from the National Center for Biotechnology Information, we improved the breadth and depth of research topics by subdividing their pre-existing categories. Our topic model demonstrates that research on “masks” and “Personal Protective Equipment (PPE)” is skewed toward clinical applications with a lack of population-based epidemiological research. |
first_indexed | 2024-12-19T21:53:14Z |
format | Article |
id | doaj.art-73147e469dee4497bc7c966279f200ca |
institution | Directory Open Access Journal |
issn | 2673-253X |
language | English |
last_indexed | 2024-12-19T21:53:14Z |
publishDate | 2021-07-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Digital Health |
spelling | doaj.art-73147e469dee4497bc7c966279f200ca; 2022-12-21T20:04:20Z; eng; Frontiers Media S.A.; Frontiers in Digital Health; 2673-253X; 2021-07-01; 3; 10.3389/fdgth.2021.686720; 686720; Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA; Akash Gupta 0; Shrey Aeron 1; Anjali Agrawal 2; Himanshu Gupta 3; Department of Engineering, University of Cambridge, Cambridge, United Kingdom; Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, United States; Harmony School of Innovation – Sugar Land (High School), Sugar Land, TX, United States; Valley Health System, Ridgewood, NJ, United States; Background: Research publications related to the novel coronavirus disease COVID-19 are rapidly increasing. However, current online literature hubs, even with artificial intelligence, are limited in identifying the complexity of COVID-19 research topics. We developed a comprehensive Latent Dirichlet Allocation (LDA) model with 25 topics using natural language processing (NLP) techniques on PubMed® research articles about “COVID.” We propose a novel methodology to develop and visualise temporal trends, and improve existing online literature hubs. Our results for temporal evolution demonstrate interesting trends, for example, the prominence of “Mental Health” and “Socioeconomic Impact” increased, “Genome Sequence” decreased, and “Epidemiology” remained relatively constant. Applying our methodology to LitCovid, a literature hub from the National Center for Biotechnology Information, we improved the breadth and depth of research topics by subdividing their pre-existing categories. Our topic model demonstrates that research on “masks” and “Personal Protective Equipment (PPE)” is skewed toward clinical applications with a lack of population-based epidemiological research.; https://www.frontiersin.org/articles/10.3389/fdgth.2021.686720/full; natural language processing; latent dirichlet allocation; COVID-19; trends; LitCovid; topic model |
spellingShingle | Akash Gupta Shrey Aeron Anjali Agrawal Himanshu Gupta Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA Frontiers in Digital Health natural language processing latent dirichlet allocation COVID-19 trends LitCovid topic model |
title | Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA |
title_full | Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA |
title_fullStr | Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA |
title_full_unstemmed | Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA |
title_short | Trends in COVID-19 Publications: Streamlining Research Using NLP and LDA |
title_sort | trends in covid 19 publications streamlining research using nlp and lda |
topic | natural language processing latent dirichlet allocation COVID-19 trends LitCovid topic model |
url | https://www.frontiersin.org/articles/10.3389/fdgth.2021.686720/full |
work_keys_str_mv | AT akashgupta trendsincovid19publicationsstreamliningresearchusingnlpandlda AT shreyaeron trendsincovid19publicationsstreamliningresearchusingnlpandlda AT anjaliagrawal trendsincovid19publicationsstreamliningresearchusingnlpandlda AT himanshugupta trendsincovid19publicationsstreamliningresearchusingnlpandlda |