Leveraging Web Scraping to Gather Tourism Information Data

Information and Communication Technologies (ICT) significantly influence both individuals' daily lives and the economy. In this context, the tourism industry plays a crucial role, and it is essential to recognise the contributions tourists make by sharing their e...

Bibliographic Details
Main Authors: Kamarazaman, Nadzirah, Mohamad Ali, Nazlena, Arshad, Haslina
Format: Article
Language: English
Published: UUM PRESS 2024
Subjects: HV Social pathology. Social and public welfare
Online Access: https://repo.uum.edu.my/id/eprint/32088/1/JETH%2004%202024%2016-29.pdf
_version_ 1825806464301662208
author Kamarazaman, Nadzirah
Mohamad Ali, Nazlena
Arshad, Haslina
author_facet Kamarazaman, Nadzirah
Mohamad Ali, Nazlena
Arshad, Haslina
author_sort Kamarazaman, Nadzirah
collection UUM
description Information and Communication Technologies (ICT) significantly influence both individuals' daily lives and the economy. In this context, the tourism industry plays a crucial role, and it is essential to recognise the contributions tourists make by sharing their experiences on tourism websites. Analysing this data is key to improving future tourists' experiences. The objective of this study is therefore to use web scraping to gather data on places of interest (POIs) and user attributes from the TripAdvisor website, specifically for the state of Melaka. Melaka was chosen because it is one of the places recognised by the United Nations Educational, Scientific and Cultural Organization (UNESCO). The study focuses on the 200 POI locations on the UNESCO map, encompassing both Melaka's core and buffer zones. These POIs are categorised into four heritage types: built heritage, natural heritage, personal heritage, and living heritage, with some belonging to more than one category. A total of 14 attributes were extracted from the TripAdvisor website. Specifically, 27,282 user data entries were collected from 163 POIs in the core zone and 8,305 data entries from 37 POIs in the buffer zone. The data is stored in the repository in several formats, including CSV, JSON, and Excel files. The data supports the development of a tourism application. Furthermore, the tourism industry can benefit from this study by enhancing its services and conserving cultural heritage.
first_indexed 2025-03-06T01:32:11Z
format Article
id uum-32088
institution Universiti Utara Malaysia
language English
last_indexed 2025-03-06T01:32:11Z
publishDate 2024
publisher UUM PRESS
record_format eprints
spelling uum-32088 2025-02-20T11:57:21Z https://repo.uum.edu.my/id/eprint/32088/ Leveraging Web Scraping to Gather Tourism Information Data Kamarazaman, Nadzirah Mohamad Ali, Nazlena Arshad, Haslina HV Social pathology. Social and public welfare UUM PRESS 2024-07 Article PeerReviewed application/pdf en cc4_by https://repo.uum.edu.my/id/eprint/32088/1/JETH%2004%202024%2016-29.pdf Kamarazaman, Nadzirah and Mohamad Ali, Nazlena and Arshad, Haslina (2024) Leveraging Web Scraping to Gather Tourism Information Data. Journal of Event, Tourism and Hospitality Studies (JETH), 4. pp. 16-29. ISSN eISSN 2805-4423 https://e-journal.uum.edu.my/index.php/jeth/
spellingShingle HV Social pathology. Social and public welfare
Kamarazaman, Nadzirah
Mohamad Ali, Nazlena
Arshad, Haslina
Leveraging Web Scraping to Gather Tourism Information Data
title Leveraging Web Scraping to Gather Tourism Information Data
title_full Leveraging Web Scraping to Gather Tourism Information Data
title_fullStr Leveraging Web Scraping to Gather Tourism Information Data
title_full_unstemmed Leveraging Web Scraping to Gather Tourism Information Data
title_short Leveraging Web Scraping to Gather Tourism Information Data
title_sort leveraging web scraping to gather tourism information data
topic HV Social pathology. Social and public welfare
url https://repo.uum.edu.my/id/eprint/32088/1/JETH%2004%202024%2016-29.pdf
work_keys_str_mv AT kamarazamannadzirah leveragingwebscrapingtogathertourisminformationdata
AT mohamadalinazlena leveragingwebscrapingtogathertourisminformationdata
AT arshadhaslina leveragingwebscrapingtogathertourisminformationdata
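
The description in this record states that the collected entries are stored as CSV, JSON, and Excel files, and gives per-zone counts. The sketch below is only an illustration of that storage step using the Python standard library; it is not the authors' code. The field names and the two rows are hypothetical placeholders (the record does not list the 14 attributes), and Excel output is left out because it would need a third-party package. The only figures taken from the record are the per-zone POI and entry counts.

```python
import csv
import io
import json

# Hypothetical attribute names: the record does not name the 14 attributes.
FIELDS = ["poi_name", "zone", "rating", "review_text"]

# Placeholder rows for illustration only; these are not data from the study.
records = [
    {"poi_name": "Example POI A", "zone": "core", "rating": 5, "review_text": "placeholder"},
    {"poi_name": "Example POI B", "zone": "buffer", "rating": 4, "review_text": "placeholder"},
]


def to_csv(rows, fields):
    """Serialise a list of dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialise the same rows to a JSON array."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Counts as stated in the description field of this record.
ZONE_COUNTS = {
    "core": {"pois": 163, "entries": 27282},
    "buffer": {"pois": 37, "entries": 8305},
}
total_pois = sum(zone["pois"] for zone in ZONE_COUNTS.values())
total_entries = sum(zone["entries"] for zone in ZONE_COUNTS.values())

csv_text = to_csv(records, FIELDS)
json_text = to_json(records)
print(total_pois, total_entries)
```

Summing the stated counts is a quick consistency check: the two zones together account for the 200 POIs mentioned in the description.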