Distributed storage optimization using multi-agent systems in Hadoop

Understanding data and extracting information from it are the main objectives of data science, especially when it comes to big data. To achieve these goals, it is necessary to collect and process massive data sets, arriving at the system in different formats at great velocity. The Big Data era has b...

Bibliographic Details
Main Authors: Sais Manar, Rafalia Najat, Mahdaoui Rabie, Abouchabaka Jaafar
Format: Article
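Language: English
Published: EDP Sciences 2023-01-01
Series: E3S Web of Conferences
Subjects: big data, energy consumption, environmental protection, storage, hadoop, hdfs, multi-agent system
Online Access: https://www.e3s-conferences.org/articles/e3sconf/pdf/2023/49/e3sconf_icies2023_01091.pdf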
collection DOAJ
description Understanding data and extracting information from it are the main objectives of data science, especially when it comes to big data. To achieve these goals, it is necessary to collect and process massive data sets that arrive at the system in different formats and at great velocity. The Big Data era has brought new challenges in data storage and management, and existing state-of-the-art data storage and processing tools are poised to meet these challenges while raising new ones for the next generation of data. Storage optimization is essential for improving the overall efficiency of Big Data systems because it maximizes the use of storage resources. It also reduces the energy consumption of Big Data systems, resulting in financial savings, environmental protection, and improved system performance. Hadoop provides a solution for storing and analysing large quantities of data. However, Hadoop can encounter storage management problems because of its distributed nature and the large volumes of data it handles. To meet future challenges, the system needs to manage its storage intelligently. A multi-agent system is a promising approach to managing hot and cold data efficiently in HDFS. Such systems offer a flexible, distributed solution to complex problems. This work proposes an approach based on a multi-agent system that gathers information on data access activity in the HDFS cluster. Using this information, it classifies data by temperature (hot or cold) and makes replication decisions based on that classification. In addition, it compresses unused data to manage resources efficiently and reduce storage space usage.
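The classify-then-decide step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no thresholds, replication values, or agent design, so the names `FileActivity`, `Decision`, `classify`, `decide`, the access threshold, and the cold replication factor are all assumptions made for illustration. Only the hot value of 3 reflects a known fact, namely the HDFS default replication factor.

```python
from dataclasses import dataclass

# Assumed values; the abstract does not specify any of them.
HOT_MIN_ACCESSES = 10   # hypothetical: accesses in the observation window
HOT_REPLICATION = 3     # HDFS default replication factor
COLD_REPLICATION = 2    # hypothetical reduced factor for cold data


@dataclass
class FileActivity:
    """Access statistics an agent might collect for one HDFS file."""
    path: str
    accesses_in_window: int


@dataclass
class Decision:
    """Outcome of the classification for one file."""
    path: str
    temperature: str   # "hot" or "cold"
    replication: int
    compress: bool


def classify(activity: FileActivity) -> str:
    """Label a file hot or cold from its recent access count."""
    if activity.accesses_in_window >= HOT_MIN_ACCESSES:
        return "hot"
    return "cold"


def decide(activity: FileActivity) -> Decision:
    """Pick a replication factor and a compression flag for a file."""
    if classify(activity) == "hot":
        return Decision(activity.path, "hot", HOT_REPLICATION, compress=False)
    # Treat "unused" as zero accesses in the window (an assumption).
    unused = activity.accesses_in_window == 0
    return Decision(activity.path, "cold", COLD_REPLICATION, compress=unused)


if __name__ == "__main__":
    for stats in (FileActivity("/data/a", 25), FileActivity("/data/c", 0)):
        print(decide(stats))
```

On a real cluster, a decision like this could be applied with the `hdfs dfs -setrep` shell command; how the paper's agents actually apply it is not stated in this record.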
first_indexed 2024-03-12T13:32:09Z
format Article
id doaj.art-25ff477c6d384a1ab76c4a5a5faf90ea
institution Directory Open Access Journal
issn 2267-1242
language English
last_indexed 2024-03-12T13:32:09Z
publishDate 2023-01-01
publisher EDP Sciences
record_format Article
series E3S Web of Conferences
spelling doaj.art-25ff477c6d384a1ab76c4a5a5faf90ea 2023-08-24T08:21:06Z eng EDP Sciences, E3S Web of Conferences, ISSN 2267-1242, 2023-01-01, vol. 412, article 01091, doi:10.1051/e3sconf/202341201091, e3sconf_icies2023_01091
Sais Manar, Rafalia Najat, Mahdaoui Rabie, Abouchabaka Jaafar (all: Department of Computer Science, Computer Research Laboratory LaRI, Faculty of Sciences, Ibn Tofail University)
title Distributed storage optimization using multi-agent systems in Hadoop
topic big data
energy consumption
environmental protection
storage
hadoop
hdfs
multi-agent system
url https://www.e3s-conferences.org/articles/e3sconf/pdf/2023/49/e3sconf_icies2023_01091.pdf