Data Pipeline Architecture with Near Real-Time Streaming Multiple Source Indonesian Online News Data Lake
The rapid development of information has made online news increasingly needed. Online news attracts readers' attention by providing convenience and speed in presenting news from various fields. However, the large amount (volume) of online news that spreads in a short time (velocity) and the pub...
Main Author: | Angelina Pramana Thenata |
---|---|
Format: | Article |
Language: | English |
Published: | Program Studi Teknik Informatika Universitas Trilogi, 2020-09-01 |
Series: | JISA (Jurnal Informatika dan Sains) |
Subjects: | data lake, online news, real-time |
Online Access: | https://trilogi.ac.id/journal/ks/index.php/JISA/article/view/657 |
_version_ | 1811264469948432384 |
---|---|
author | Angelina Pramana Thenata |
collection | DOAJ |
description | The rapid growth of information has increased the demand for online news. Online news attracts readers by presenting news from many fields conveniently and quickly. However, the large amount of online news (volume) that spreads in a short time (velocity), together with the public's need to consume news from various references (variety), can affect people's lives. Therefore, the government, as the regulator, and news agencies need to monitor the online news in circulation. Based on these problems, the researcher proposes a data lake architecture design that is suited to online news and can run in real time. Data lakes can address the main problems of Big Data (volume, velocity, variety). To propose this architecture, the researcher conducted a literature study and analyzed the data lake architecture flow as it applies to online news. The researcher will then use this architecture to combine and standardize the structure of online news data from several online news channels and stream it in real time to fill the data lake. The resulting data will be stored in MongoDB, which serves as the database for all data in both the short and long term. Finally, this data lake will serve as a means to store, explore, and analyze the circulating online news data. Keywords – Data Lake, Online News, Real-Time |
first_indexed | 2024-04-12T20:04:49Z |
format | Article |
id | doaj.art-af044013a0b04774b635131d24900640 |
institution | Directory Open Access Journal |
issn | 2776-3234 2614-8404 |
language | English |
last_indexed | 2024-04-12T20:04:49Z |
publishDate | 2020-09-01 |
publisher | Program Studi Teknik Informatika Universitas Trilogi |
record_format | Article |
series | JISA (Jurnal Informatika dan Sains) |
spelling | doaj.art-af044013a0b04774b635131d24900640; 2022-12-22T03:18:26Z; eng; Program Studi Teknik Informatika Universitas Trilogi; JISA (Jurnal Informatika dan Sains); 2776-3234; 2614-8404; 2020-09-01; 31323710.31326/jisa.v3i1.657439; Data Pipeline Architecture with Near Real-Time Streaming Multiple Source Indonesian Online News Data Lake; Angelina Pramana Thenata; 0; Universitas Bunda Mulia; The rapid development of information has made online news increasingly needed. Online news attracts readers' attention by providing convenience and speed in presenting news from various fields. However, the large amount (volume) of online news that spreads in a short time (velocity) and the public's need to consume news in various references (variety) can affect people's lives. Therefore, the government as the regulator and news agencies need to monitor online news circulating. Based on these problems, the researcher proposes a data lake architectural design that is suitable for online news and can run in real-time. Data lakes can solve the main problems of Big Data (volume, velocity, variety). In proposing this data lake architecture, the researcher conducted a literature study and analyzed the flow of the data lake architecture according to online news. Furthermore, the researcher will use this architecture to combine and uniform the online news data structure from several online news channels and then stream it in real-time to fill the data lake. The results of using the data lake architecture for online news will be stored on MongoDB which functions as a database to store all data for both the short and long term. Finally, this data lake will be a means to accommodate, dive into, and analyze the circulating online news data. Keywords – Data Lake, Online News, Real-Time; https://trilogi.ac.id/journal/ks/index.php/JISA/article/view/657; data lake; online news; real-time |
title | Data Pipeline Architecture with Near Real-Time Streaming Multiple Source Indonesian Online News Data Lake |
topic | data lake online news real-time |
url | https://trilogi.ac.id/journal/ks/index.php/JISA/article/view/657 |
work_keys_str_mv | AT angelinapramanathenata datapipelinearchitecturewithnearrealtimestreamingmultiplesourceindonesianonlinenewsdatalake |
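
The abstract describes combining news from several channels into one uniform structure before streaming it into the data lake. The following is only a minimal sketch of that unification step, not the author's implementation: the channel names, the raw field names (`headline`, `judul`, and so on), and the `NewsRecord` schema are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


@dataclass
class NewsRecord:
    """Hypothetical uniform schema shared by every news channel."""
    source: str
    title: str
    url: str
    published_at: str  # ISO 8601 timestamp
    body: str


def from_channel_a(raw: Dict[str, Any]) -> NewsRecord:
    # Assumed layout for channel A: epoch-seconds timestamp, English keys.
    published = datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat()
    return NewsRecord("channel_a", raw["headline"], raw["link"], published, raw["content"])


def from_channel_b(raw: Dict[str, Any]) -> NewsRecord:
    # Assumed layout for channel B: ISO timestamp already, Indonesian keys.
    return NewsRecord("channel_b", raw["judul"], raw["tautan"], raw["tanggal"], raw["isi"])


# One adapter per channel; adding a new source means adding one entry here.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], NewsRecord]] = {
    "channel_a": from_channel_a,
    "channel_b": from_channel_b,
}


def normalize_stream(items: Iterable[Tuple[str, Dict[str, Any]]]) -> Iterator[Dict[str, Any]]:
    """Lazily convert (source, raw_article) pairs into uniform dictionaries."""
    for source, raw in items:
        adapter = ADAPTERS.get(source)
        if adapter is None:
            continue  # skip channels without a known adapter
        yield asdict(adapter(raw))


sample = [
    ("channel_a", {"headline": "A", "link": "https://a.example/1", "ts": 0, "content": "text a"}),
    ("channel_b", {"judul": "B", "tautan": "https://b.example/1",
                   "tanggal": "2020-09-01T00:00:00+00:00", "isi": "text b"}),
    ("unknown", {}),
]
docs = list(normalize_stream(sample))
```

Each resulting dictionary has the same keys regardless of its source, so it could then be written to a MongoDB collection (for example with pymongo's `insert_one`). That storage step is left out here to keep the sketch free of external dependencies.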