Two-stage topic modelling of scientific publications: A case study of University of Nairobi, Kenya.

Bibliographic Details
Main Authors: Leacky Muchene, Wende Safari
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0243208

Abstract
Unsupervised statistical analysis of unstructured data has gained wide acceptance, especially in the natural language processing and text mining domains. Topic modelling with Latent Dirichlet Allocation (LDA) is one such statistical tool; it has been successfully applied to synthesize collections of legal and biomedical documents and of journalistic topics. We applied a novel two-stage topic modelling approach and illustrated the methodology with data from a collection of published abstracts from the University of Nairobi, Kenya. In the first stage, LDA topic modelling was applied to derive the per-document topic probabilities. In the second stage, to present the topics more succinctly, hierarchical clustering with Hellinger distance was applied to derive the final clusters of topics. The analysis showed that the dominant research themes at the university include HIV and malaria research, research on agricultural and veterinary services, and cross-cutting themes in the humanities and social sciences. Further, hierarchical clustering in the second stage reduces the discovered latent topics to clusters of homogeneous topics.
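The second stage described above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each LDA topic from stage 1 has already been summarised as a probability vector (the abstract does not say whether topics were compared by their word distributions or by their per-document probabilities). It also assumes average linkage and a fixed number of clusters; the abstract specifies neither. The functions `hellinger`, `normalise` and `average_linkage_clusters` and the toy counts are hypothetical. The toy counts are made-up numbers, not data from the study. Stage 1 output could come from any LDA implementation; some, such as scikit-learn's `LatentDirichletAllocation`, return unnormalised topic-word weights, so a normalisation step is included.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    0 means identical distributions; 1 means disjoint support.
    """
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def normalise(row):
    """Rescale non-negative weights (e.g. LDA pseudo-counts) to sum to 1."""
    total = sum(row)
    return [x / total for x in row]


def average_linkage_clusters(topics, n_clusters):
    """Agglomerative (bottom-up) clustering of topics, assuming average linkage.

    topics: list of probability vectors, one per topic (hypothetical input).
    Returns a sorted list of clusters, each a sorted list of topic indices.
    """
    # Precompute all pairwise Hellinger distances between topics.
    dist = {(i, j): hellinger(topics[i], topics[j])
            for i, j in combinations(range(len(topics)), 2)}

    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    # Start with every topic in its own cluster.
    clusters = [[i] for i in range(len(topics))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = len(clusters[a]) * len(clusters[b])
            avg = sum(d(i, j) for i in clusters[a] for j in clusters[b]) / pairs
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        # Merge the closest pair and drop the two originals.
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Toy example: four made-up topics over a four-word vocabulary.
raw_weights = [
    [14, 4, 1, 1],
    [12, 6, 1, 1],
    [1, 1, 10, 8],
    [1, 2, 9, 8],
]
topics = [normalise(r) for r in raw_weights]
print(average_linkage_clusters(topics, 2))  # [[0, 1], [2, 3]]
```

Hellinger distance is symmetric and bounded in [0, 1], which makes it a convenient dissimilarity for probability vectors. Average linkage merges, at each step, the two clusters whose members are closest on average. In practice, a library routine for hierarchical clustering would replace the hand-written loop.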

ISSN: 1932-6203
Citation: PLoS ONE 16(1): e0243208
Indexed in: Directory of Open Access Journals (DOAJ)