Data Pipeline Architecture with Near Real-Time Streaming Multiple Source Indonesian Online News Data Lake

The rapid growth of information has increased the demand for online news. Online news attracts readers by presenting news from many fields quickly and conveniently. However, the large amount (volume) of online news that spreads in a short time (velocity) and the pub...

Bibliographic Details
Main Author: Angelina Pramana Thenata
Format: Article
Language: English
Published: Program Studi Teknik Informatika Universitas Trilogi 2020-09-01
Series: JISA (Jurnal Informatika dan Sains)
Subjects: data lake, online news, real-time
Online Access: https://trilogi.ac.id/journal/ks/index.php/JISA/article/view/657
_version_ 1811264469948432384
author Angelina Pramana Thenata
collection DOAJ
description The rapid growth of information has increased the demand for online news. Online news attracts readers by presenting news from many fields quickly and conveniently. However, the large amount (volume) of online news that spreads in a short time (velocity), together with the public's need to consume news from varied sources (variety), can affect people's lives. The government, as regulator, and news agencies therefore need to monitor the online news in circulation. To address this, the researcher proposes a data lake architecture that is suited to online news and can run in real time. Data lakes can address the main problems of Big Data (volume, velocity, variety). To design the architecture, the researcher conducted a literature study and analyzed how the data lake flow fits online news. The researcher will then use this architecture to combine and standardize the structure of online news data from several online news channels and stream it in real time into the data lake. The resulting data will be stored in MongoDB, which serves as the database for all data over both the short and long term. Finally, the data lake will provide a means to collect, explore, and analyze circulating online news data. Keywords: Data Lake, Online News, Real-Time
first_indexed 2024-04-12T20:04:49Z
format Article
id doaj.art-af044013a0b04774b635131d24900640
institution Directory Open Access Journal
issn 2776-3234
2614-8404
language English
last_indexed 2024-04-12T20:04:49Z
publishDate 2020-09-01
publisher Program Studi Teknik Informatika Universitas Trilogi
record_format Article
series JISA (Jurnal Informatika dan Sains)
spelling doaj.art-af044013a0b04774b635131d24900640 | 2022-12-22T03:18:26Z | eng | Program Studi Teknik Informatika Universitas Trilogi | JISA (Jurnal Informatika dan Sains) | 2776-3234 | 2614-8404 | 2020-09-01 | 31323710.31326/jisa.v3i1.657439 | Data Pipeline Architecture with Near Real-Time Streaming Multiple Source Indonesian Online News Data Lake | Angelina Pramana Thenata | 0 | Universitas Bunda Mulia | https://trilogi.ac.id/journal/ks/index.php/JISA/article/view/657 | data lake | online news | real-time
title Data Pipeline Architecture with Near Real-Time Streaming Multiple Source Indonesian Online News Data Lake
topic data lake
online news
real-time
url https://trilogi.ac.id/journal/ks/index.php/JISA/article/view/657