ADD: Academic Disciplines Detector Based on Wikipedia

The academic disciplines and their interrelationships represent a backbone that organizes the enormous amount of documented human knowledge available today. Having an up-to-date overview of the established disciplines, the emerging ones, and their mutual interactions is essential to the academic ins...

Bibliographic Details
Main Authors: Ana Gjorgjevikj, Kostadin Mishev, Dimitar Trajanov
Format: Article
Language: English
Published: IEEE 2020-01-01
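Series: IEEE Access
Subjects: Machine learning algorithms; natural language processing; academic discipline; text analysis; Wikipedia
Online Access: https://ieeexplore.ieee.org/document/8948031/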
author Ana Gjorgjevikj
Kostadin Mishev
Dimitar Trajanov
collection DOAJ
description The academic disciplines and their interrelationships represent a backbone that organizes the enormous amount of documented human knowledge available today. Having an up-to-date overview of the established disciplines, the emerging ones, and their mutual interactions is essential to the academic institutions, publishers, and many other actors involved in today's knowledge-based society, even though no precise definition of the term “academic discipline” itself exists. Discipline classification schemes are crucial resources for this purpose, and because the rate of knowledge production demands that changes in their structure be discovered very frequently, data-driven methodologies that facilitate their revision become essential. Analyzing the worldwide community's opinion on what constitutes a discipline, available through Wikipedia, can be very informative for this purpose, considering Wikipedia's comprehensiveness, continuous updates, and the availability of historical exports. This paper proposes a data-driven methodology for identifying the concepts that the worldwide community defines as disciplines at a particular moment by analyzing the information available in Wikipedia at that same moment. It also discusses Wikipedia's strengths and challenges on the task and compares a variety of Machine Learning and Natural Language Processing methodologies. The trained models achieve high accuracy on datasets created specifically for this task, and only small changes in model accuracy are observed across four Wikipedia exports from 2015 to 2018.
first_indexed 2024-12-19T13:34:00Z
format Article
id doaj.art-465586320f404b3282be393e52af4b5c
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T13:34:00Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-465586320f404b3282be393e52af4b5c
2022-12-21T20:19:16Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2020-01-01
Volume 8, pages 7005-7019
DOI 10.1109/ACCESS.2019.2963674
Document 8948031
ADD: Academic Disciplines Detector Based on Wikipedia
Ana Gjorgjevikj, https://orcid.org/0000-0002-5135-7718
Kostadin Mishev, https://orcid.org/0000-0003-3982-3330
Dimitar Trajanov, https://orcid.org/0000-0002-3105-6010
Affiliation (all three authors): Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, Macedonia
title ADD: Academic Disciplines Detector Based on Wikipedia
topic Machine learning algorithms
natural language processing
academic discipline
text analysis
Wikipedia
url https://ieeexplore.ieee.org/document/8948031/