Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market
The development of technology increases data traffic and data size day by day, so collecting and interpreting data has become very important. This study aims to analyze car sales data collected with web scraping techniques using machine learning algorithms and to create a pri...
Main Authors: | İhsan Hakan Selvi, Seda Yılmaz |
---|---|
Format: | Article |
Language: | English |
Published: | Sakarya University, 2023-08-01 |
Series: | Sakarya University Journal of Computer and Information Sciences |
Subjects: | web scraping, machine learning, price prediction |
Online Access: | https://dergipark.org.tr/tr/download/article-file/3185615 |
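
The abstract reports three evaluation metrics per model: MSE, RMSE and R². The following is a minimal sketch of how these are conventionally defined, not the authors' code; the function names are hypothetical. Because RMSE is the square root of MSE, the reported pairs can also be checked against each other, which the last lines do using only the figures quoted in the abstract.

```python
import math


def mse(y_true, y_pred):
    """Mean Square Error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# (MSE, RMSE) pairs exactly as quoted in the abstract.
reported = {
    "XGBoost, data set 1": (0.026, 0.161),
    "K-Nearest Neighbor, data set 2": (0.021, 0.145),
}

for model, (reported_mse, reported_rmse) in reported.items():
    # The MSE is rounded to three decimals, so agreement is only approximate.
    implied_rmse = math.sqrt(reported_mse)
    print(f"{model}: sqrt(MSE) = {implied_rmse:.3f}, reported RMSE = {reported_rmse}")
```

Both reported RMSE values agree with the square root of the corresponding MSE to within rounding, so the figures are internally consistent.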
_version_ | 1797351839235047424 |
---|---|
author | İhsan Hakan Selvi Seda Yılmaz |
author_facet | İhsan Hakan Selvi Seda Yılmaz |
author_sort | İhsan Hakan Selvi |
collection | DOAJ |
description | The development of technology increases data traffic and data size day by day, so collecting and interpreting data has become very important. This study aims to analyze car sales data collected with web scraping techniques using machine learning algorithms and to create a price estimation model. The data were collected using Selenium and BeautifulSoup and prepared for analysis through several preprocessing steps. Lasso regression and principal component analysis (PCA) were used for feature selection and dimensionality reduction, and the GridSearchCV method was used for hyperparameter tuning. The prepared data were then analyzed with the Random Forest, K-Nearest Neighbor, Gradient Boost, AdaBoost, Support Vector and XGBoost regression algorithms. The results were evaluated using Mean Square Error (MSE), Root Mean Square Error (RMSE) and the Coefficient of Determination (R²). For data set 1, the best model was XGBoost Regression, with an R² of 0.973, an MSE of 0.026 and an RMSE of 0.161. For data set 2, the best model was K-Nearest Neighbor Regression, with an R² of 0.978, an MSE of 0.021 and an RMSE of 0.145. |
first_indexed | 2024-03-08T13:06:19Z |
format | Article |
id | doaj.art-90aad749a0ec4259ad9d8efafd554619 |
institution | Directory Open Access Journal |
issn | 2636-8129 |
language | English |
last_indexed | 2024-03-08T13:06:19Z |
publishDate | 2023-08-01 |
publisher | Sakarya University |
record_format | Article |
series | Sakarya University Journal of Computer and Information Sciences |
spelling | doaj.art-90aad749a0ec4259ad9d8efafd5546192024-01-18T16:44:35ZengSakarya UniversitySakarya University Journal of Computer and Information Sciences2636-81292023-08-016214014810.35377/saucis...130910328Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Marketİhsan Hakan Selvi0Seda Yılmaz1SAKARYA UNIVERSITYSAKARYA UNIVERSITYThe development of technology increases data traffic and data size day by day. Therefore, it has become very important to collect and interpret data. This study, it is aimed to analyze the car sales data collected using web scraping techniques by using machine learning algorithms and to create a price estimation model. The data needed for analysis was collected using Selenium and BeautifulSoup and prepared for analysis by applying various data preprocessing steps. Lasso regression and PCA analysis were used for feature selection and size reduction, and the GridSearchCV method was used for hyperparameter tuning. The results were evaluated with machine learning algorithms. Random Forest, K-Nearest Neighbor, Gradient Boost, AdaBoost, Support Vector and XGBoost regression algorithms were used in the analysis. The obtained analysis results were evaluated together with Mean Square Error (MSE), Root Mean Square Error (RMSE) and Coefficient of Determination (R-square). When the results for data set 1 were examined, the model that gave the best results was XGBoost Regression with 0.973 R2, 0.026 MSE and 0.161 RMSE values. When the results for data set 2 were examined, the model that gave the best results was K-Nearest Neighbor Regression with 0.978 R2, 0.021 MSE and 0.145 RMSE values.https://dergipark.org.tr/tr/download/article-file/3185615web scrapingmachine learningprice prediction |
spellingShingle | İhsan Hakan Selvi Seda Yılmaz Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market Sakarya University Journal of Computer and Information Sciences web scraping machine learning price prediction |
title | Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market |
title_sort | price prediction using web scraping and machine learning algorithms in the used car market |
topic | web scraping machine learning price prediction |
url | https://dergipark.org.tr/tr/download/article-file/3185615 |
work_keys_str_mv | AT ihsanhakanselvi pricepredictionusingwebscrapingandmachinelearningalgorithmsintheusedcarmarket AT sedayılmaz pricepredictionusingwebscrapingandmachinelearningalgorithmsintheusedcarmarket |
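
The description names GridSearchCV for hyperparameter tuning and K-Nearest Neighbor regression as the best model on data set 2. As a rough illustration of the idea behind that kind of tuning (an exhaustive search over a parameter grid, scored by k-fold cross-validation), here is a standard-library sketch. It is not scikit-learn's GridSearchCV and not the authors' pipeline: the data are synthetic, the two features (age, mileage) and all function names are assumptions made for the example, and features are left unscaled for brevity.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_mse(X, y, k, n_folds=5):
    """Mean squared error of k-NN, averaged over n_folds held-out folds."""
    fold_errors = []
    for fold in range(n_folds):
        # Assign every n_folds-th sample to the held-out fold.
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        errors = [(y[i] - knn_predict(tr_X, tr_y, X[i], k)) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds


def grid_search(X, y, grid, n_folds=5):
    """Score every candidate k and return the one with the lowest CV error."""
    scores = {k: cv_mse(X, y, k, n_folds) for k in grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for scraped listings: (age in years, mileage in 1000 km).
rng = random.Random(0)
X = [(rng.uniform(0, 15), rng.uniform(0, 300)) for _ in range(200)]
# Hypothetical price relation with noise; not derived from the paper's data.
y = [10.0 - 0.3 * age - 0.01 * km + rng.gauss(0, 0.2) for age, km in X]

best_k, scores = grid_search(X, y, grid=[1, 3, 5, 9, 15])
print("best k:", best_k, "cross-validated MSE:", round(scores[best_k], 4))
```

With scikit-learn available, the same search would normally be expressed through `GridSearchCV` with a `KNeighborsRegressor` estimator and a `param_grid` over `n_neighbors`.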