Topic Detection and Tracking Based on Windowed DBSCAN and Parallel KNN

Topic detection and tracking (TDT) is commonly used to identify hot topics in the huge volume of Internet news and to keep up with hot news. However, traditional topic detection and tracking methods have shown low accuracy and low efficiency. In this paper, a topic...

Bibliographic Details
Main Authors: Chuanzhen Li, Minqiao Liu, Juanjuan Cai, Yang Yu, Hui Wang
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Big data, DBSCAN, parallelized, TDT
Online Access: https://ieeexplore.ieee.org/document/9308948/
collection DOAJ
description Topic detection and tracking (TDT) is commonly used to identify hot topics in the huge volume of Internet news and to keep up with hot news. However, traditional topic detection and tracking methods have shown low accuracy and low efficiency. In this paper, a topic detection system driven by big data is built on the Spark platform, with the aim of improving the efficiency of collecting news from the Internet and the accuracy and efficiency of topic detection and tracking. The system can be readily deployed in a distributed architecture, where it works as a parallelized news collection and topic detection system. An improved density-based spatial clustering of applications with noise (DBSCAN) algorithm based on a time window is proposed to achieve accurate topic detection, with the added benefit of reducing time complexity. A parallel KNN-based algorithm is proposed for the topic tracking task. Experiments, including comparisons with baseline algorithms and quantitative and qualitative analyses, are conducted on a pseudo-distributed Spark platform and demonstrate the effectiveness and efficiency of the parallelized topic detection system.
first_indexed 2024-12-22T22:20:53Z
id doaj.art-dcdc68d411dd47f9a744d0d387b524a9
institution Directory Open Access Journal
issn 2169-3536
last_indexed 2024-12-22T22:20:53Z
spelling doaj.art-dcdc68d411dd47f9a744d0d387b524a9
2022-12-21T18:10:40Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2021-01-01, vol. 9, pp. 3858-3870
DOI: 10.1109/ACCESS.2020.3047458
IEEE Xplore document: 9308948
Topic Detection and Tracking Based on Windowed DBSCAN and Parallel KNN
Chuanzhen Li (School of Information and Communication Engineering, Communication University of China, Beijing, China)
Minqiao Liu (School of Information and Communication Engineering, Communication University of China, Beijing, China; https://orcid.org/0000-0003-0897-6510)
Juanjuan Cai (Key Laboratory of Media Audio and Video (Communication University of China), Ministry of Education, Communication University of China, Beijing, China)
Yang Yu (IQIYI Inc., Beijing, China)
Hui Wang (State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China)
https://ieeexplore.ieee.org/document/9308948/
Keywords: Big data, DBSCAN, parallelized, TDT
topic Big data
DBSCAN
parallelized
TDT