Leveraging Web Scraping to Gather Tourism Information Data
The influence of Information and Communication Technologies (ICT) on both individuals' daily lives and the economy is of significant importance. In this context, the tourism industry plays a crucial role, and it is essential to recognise the contributions of tourists in terms of sharing their e...
Main Authors: | Kamarazaman, Nadzirah; Mohamad Ali, Nazlena; Arshad, Haslina |
---|---|
Format: | Article |
Language: | English |
Published: | UUM PRESS 2024 |
Subjects: | HV Social pathology. Social and public welfare |
Online Access: | https://repo.uum.edu.my/id/eprint/32088/1/JETH%2004%202024%2016-29.pdf |
_version_ | 1825806464301662208 |
---|---|
author | Kamarazaman, Nadzirah; Mohamad Ali, Nazlena; Arshad, Haslina |
author_facet | Kamarazaman, Nadzirah; Mohamad Ali, Nazlena; Arshad, Haslina |
author_sort | Kamarazaman, Nadzirah |
collection | UUM |
description | The influence of Information and Communication Technologies (ICT) on both individuals' daily lives and the economy is significant. In this context, the tourism industry plays a crucial role, and it is essential to recognise the contributions tourists make by sharing their experiences through tourism websites. Analysing this data is key to improving future tourists' experiences. Therefore, the objective of this study is to employ web scraping to gather data on places of interest (POI) and user attributes in the state of Melaka via the TripAdvisor website. Melaka is chosen as it is one of the places recognised by the United Nations Educational, Scientific and Cultural Organization (UNESCO). The study focuses on 200 POI locations on the UNESCO map, encompassing both Melaka's core and buffer zones. These POIs are categorised into four heritage types: built heritage, natural heritage, personal heritage, and living heritage, with some belonging to more than one category. For the data collection process, this study used the TripAdvisor website and extracted a total of 14 attributes. Specifically, 27,282 user data entries were collected from 163 POIs in the core zone area, and 8,305 data entries from 37 POIs in the buffer zone area. The data is managed and stored in various formats, including CSV, JSON, and Excel files in the repository. The data supports the development of a tourism application. Furthermore, the tourism industry can benefit from this study by enhancing its services and conserving cultural heritage. |
first_indexed | 2025-03-06T01:32:11Z |
format | Article |
id | uum-32088 |
institution | Universiti Utara Malaysia |
language | English |
last_indexed | 2025-03-06T01:32:11Z |
publishDate | 2024 |
publisher | UUM PRESS |
record_format | eprints |
spelling | uum-32088 2025-02-20T11:57:21Z https://repo.uum.edu.my/id/eprint/32088/ Leveraging Web Scraping to Gather Tourism Information Data Kamarazaman, Nadzirah; Mohamad Ali, Nazlena; Arshad, Haslina HV Social pathology. Social and public welfare The influence of Information and Communication Technologies (ICT) on both individuals' daily lives and the economy is significant. In this context, the tourism industry plays a crucial role, and it is essential to recognise the contributions tourists make by sharing their experiences through tourism websites. Analysing this data is key to improving future tourists' experiences. Therefore, the objective of this study is to employ web scraping to gather data on places of interest (POI) and user attributes in the state of Melaka via the TripAdvisor website. Melaka is chosen as it is one of the places recognised by the United Nations Educational, Scientific and Cultural Organization (UNESCO). The study focuses on 200 POI locations on the UNESCO map, encompassing both Melaka's core and buffer zones. These POIs are categorised into four heritage types: built heritage, natural heritage, personal heritage, and living heritage, with some belonging to more than one category. For the data collection process, this study used the TripAdvisor website and extracted a total of 14 attributes. Specifically, 27,282 user data entries were collected from 163 POIs in the core zone area, and 8,305 data entries from 37 POIs in the buffer zone area. The data is managed and stored in various formats, including CSV, JSON, and Excel files in the repository. The data supports the development of a tourism application. Furthermore, the tourism industry can benefit from this study by enhancing its services and conserving cultural heritage. UUM PRESS 2024-07 Article PeerReviewed application/pdf en cc4_by https://repo.uum.edu.my/id/eprint/32088/1/JETH%2004%202024%2016-29.pdf Kamarazaman, Nadzirah and Mohamad Ali, Nazlena and Arshad, Haslina (2024) Leveraging Web Scraping to Gather Tourism Information Data. Journal of Event, Tourism and Hospitality Studies (JETH), 4. pp. 16-29. ISSN eISSN 2805-4423 https://e-journal.uum.edu.my/index.php/jeth/ |
spellingShingle | HV Social pathology. Social and public welfare Kamarazaman, Nadzirah Mohamad Ali, Nazlena Arshad, Haslina Leveraging Web Scraping to Gather Tourism Information Data |
title | Leveraging Web Scraping to Gather Tourism Information Data |
title_full | Leveraging Web Scraping to Gather Tourism Information Data |
title_fullStr | Leveraging Web Scraping to Gather Tourism Information Data |
title_full_unstemmed | Leveraging Web Scraping to Gather Tourism Information Data |
title_short | Leveraging Web Scraping to Gather Tourism Information Data |
title_sort | leveraging web scraping to gather tourism information data |
topic | HV Social pathology. Social and public welfare |
url | https://repo.uum.edu.my/id/eprint/32088/1/JETH%2004%202024%2016-29.pdf |
work_keys_str_mv | AT kamarazamannadzirah leveragingwebscrapingtogathertourisminformationdata AT mohamadalinazlena leveragingwebscrapingtogathertourisminformationdata AT arshadhaslina leveragingwebscrapingtogathertourisminformationdata |
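
The description above states that the collected entries are stored as CSV, JSON, and Excel files. The following is a minimal sketch of how scraped review records might be serialised to the first two of those formats with the Python standard library. It is not the authors' pipeline: the field names (`poi`, `zone`, `rating`, `review`), the sample records, and the helper functions are hypothetical and do not represent the study's 14 attributes. Excel output is omitted because it needs a third-party library. Only the entry and POI counts come from the record itself.

```python
import csv
import io
import json

# Hypothetical review records; field names are illustrative assumptions,
# not the 14 attributes extracted in the study.
records = [
    {"poi": "Example POI A", "zone": "core", "rating": 5, "review": "Lovely visit"},
    {"poi": "Example POI B", "zone": "buffer", "rating": 4, "review": "Worth a stop"},
]


def to_csv(rows):
    """Serialise a list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise the same rows to a JSON array, keeping non-ASCII text readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def count_by_zone(rows):
    """Count how many records fall in each zone (core or buffer)."""
    counts = {}
    for row in rows:
        counts[row["zone"]] = counts.get(row["zone"], 0) + 1
    return counts


csv_text = to_csv(records)
json_text = to_json(records)
zone_counts = count_by_zone(records)

# Figures reported in the description: entries and POIs per zone.
reported_entries = {"core": 27282, "buffer": 8305}
reported_pois = {"core": 163, "buffer": 37}
total_entries = sum(reported_entries.values())
total_pois = sum(reported_pois.values())

print(csv_text)
print(zone_counts)
print(total_entries, total_pois)
```

Summing the reported figures gives 35,587 entries across 200 POIs, which matches the 200 POI locations named in the description.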