A Semi-Automatic Data-Scraping Method for the Public Transport Domain

Bibliographic Details
Main Authors: Belen Vela, Jose Maria Cavero, Paloma Caceres, Carlos E. Cuesta
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8782469/
Description
Summary: The growing amount of data on the Internet has made it essential to process these data in order to generate new services aimed specifically at improving people's daily living conditions. Transport data are of the utmost importance, since people have to move around every day to perform daily tasks, such as going to work, studying, and shopping, and this means that the number of journeys by public transport grows daily. People with special needs make a large number of these trips, but they do not have sufficient information about the accessibility of the routes they want to take. Although numerous websites and applications provide information on public transport services, most do not provide detailed information on the accessibility of the routes. We are, therefore, developing a technological framework for the processing, management, and exploitation of open data to promote accessibility to urban public transport. This work is taking place within the Access@City project. This paper focuses specifically on extracting and processing the information that already exists on the web concerning public transport and its accessibility, in order to generate an open data repository in which to store it. We, therefore, propose a method for the semi-automatic generation of a data scraper for the public transport domain. This method allows the extraction of public transport data and the existing accessibility information from a selected website. We have additionally developed a web tool that employs this method to generate a data scraper for the public transport domain.
ISSN: 2169-3536