Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market

The development of technology increases data traffic and data size day by day, so collecting and interpreting data has become very important. This study aims to analyze car sales data collected with web scraping techniques using machine learning algorithms and to create a pri...


Bibliographic Details
Main Authors: İhsan Hakan Selvi, Seda Yılmaz
Format: Article
Language: English
Published: Sakarya University 2023-08-01
Series: Sakarya University Journal of Computer and Information Sciences
Subjects: web scraping, machine learning, price prediction
Online Access: https://dergipark.org.tr/tr/download/article-file/3185615
_version_ 1797351839235047424
author İhsan Hakan Selvi
Seda Yılmaz
author_facet İhsan Hakan Selvi
Seda Yılmaz
author_sort İhsan Hakan Selvi
collection DOAJ
description The development of technology increases data traffic and data size day by day, so collecting and interpreting data has become very important. This study aims to analyze car sales data collected with web scraping techniques using machine learning algorithms and to create a price estimation model. The data needed for the analysis was collected using Selenium and BeautifulSoup and prepared by applying various data preprocessing steps. Lasso regression and PCA were used for feature selection and dimensionality reduction, and the GridSearchCV method was used for hyperparameter tuning. Random Forest, K-Nearest Neighbor, Gradient Boost, AdaBoost, Support Vector and XGBoost regression algorithms were used in the analysis. The results were evaluated using Mean Square Error (MSE), Root Mean Square Error (RMSE) and the Coefficient of Determination (R-square). For data set 1, the best-performing model was XGBoost Regression, with an R2 of 0.973, an MSE of 0.026 and an RMSE of 0.161. For data set 2, the best-performing model was K-Nearest Neighbor Regression, with an R2 of 0.978, an MSE of 0.021 and an RMSE of 0.145.
first_indexed 2024-03-08T13:06:19Z
format Article
id doaj.art-90aad749a0ec4259ad9d8efafd554619
institution Directory Open Access Journal
issn 2636-8129
language English
last_indexed 2024-03-08T13:06:19Z
publishDate 2023-08-01
publisher Sakarya University
record_format Article
series Sakarya University Journal of Computer and Information Sciences
spelling doaj.art-90aad749a0ec4259ad9d8efafd554619 | 2024-01-18T16:44:35Z | eng | Sakarya University | Sakarya University Journal of Computer and Information Sciences | 2636-8129 | 2023-08-01 | 62140148 | 10.35377/saucis...130910328 | Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market | İhsan Hakan Selvi 0 | Seda Yılmaz 1 | SAKARYA UNIVERSITY | SAKARYA UNIVERSITY | The development of technology increases data traffic and data size day by day, so collecting and interpreting data has become very important. This study aims to analyze car sales data collected with web scraping techniques using machine learning algorithms and to create a price estimation model. The data needed for the analysis was collected using Selenium and BeautifulSoup and prepared by applying various data preprocessing steps. Lasso regression and PCA were used for feature selection and dimensionality reduction, and the GridSearchCV method was used for hyperparameter tuning. Random Forest, K-Nearest Neighbor, Gradient Boost, AdaBoost, Support Vector and XGBoost regression algorithms were used in the analysis. The results were evaluated using Mean Square Error (MSE), Root Mean Square Error (RMSE) and the Coefficient of Determination (R-square). For data set 1, the best-performing model was XGBoost Regression, with an R2 of 0.973, an MSE of 0.026 and an RMSE of 0.161. For data set 2, the best-performing model was K-Nearest Neighbor Regression, with an R2 of 0.978, an MSE of 0.021 and an RMSE of 0.145. | https://dergipark.org.tr/tr/download/article-file/3185615 | web scraping | machine learning | price prediction
spellingShingle İhsan Hakan Selvi
Seda Yılmaz
Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
Sakarya University Journal of Computer and Information Sciences
web scraping
machine learning
price prediction
title Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
title_full Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
title_fullStr Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
title_full_unstemmed Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
title_short Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
title_sort price prediction using web scraping and machine learning algorithms in the used car market
topic web scraping
machine learning
price prediction
url https://dergipark.org.tr/tr/download/article-file/3185615
work_keys_str_mv AT ihsanhakanselvi pricepredictionusingwebscrapingandmachinelearningalgorithmsintheusedcarmarket
AT sedayılmaz pricepredictionusingwebscrapingandmachinelearningalgorithmsintheusedcarmarket
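
Note on the evaluation metrics named in the description: the record reports MSE, RMSE and R-square for each best model. The following is a minimal sketch in plain Python of how these three metrics are commonly defined, together with a check that the reported RMSE values match the square roots of the reported MSE values. The function names and the toy price lists are illustrative assumptions and are not taken from the article; the article's own implementation is not shown in this record.

```python
import math

def mse(y_true, y_pred):
    """Mean Square Error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values, for illustration only (not data from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))

# Consistency check on the (MSE, RMSE) pairs reported in the description.
reported = {
    "XGBoost, data set 1": (0.026, 0.161),
    "K-Nearest Neighbor, data set 2": (0.021, 0.145),
}
for name, (reported_mse, reported_rmse) in reported.items():
    print(name, round(math.sqrt(reported_mse), 3), reported_rmse)
```

For both reported models, the square root of the MSE rounds to the stated RMSE at three decimal places, so the figures in the description are internally consistent.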