Distributed Machine Learning using HDFS and Apache Spark for Big Data Challenges
Hadoop and Apache Spark have become popular frameworks for distributed big data processing. This research aims to configure Hadoop and Spark for conducting training and testing on big data using distributed machine learning methods with MLlib, including linear regression and multi-linear regression....
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
EDP Sciences
2023-01-01
|
Series: | E3S Web of Conferences |
Online Access: | https://www.e3s-conferences.org/articles/e3sconf/pdf/2023/102/e3sconf_icimece2023_02058.pdf |