Inferring sensitive user information from tap-on tap-off public transport data

Bibliographic Details
Main Author: Cheng, Kelly Wen Xin
Other Authors: Cai Wentong
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/137829
_version_ 1826114643020480512
author Cheng, Kelly Wen Xin
author2 Cai Wentong
author_facet Cai Wentong
Cheng, Kelly Wen Xin
author_sort Cheng, Kelly Wen Xin
collection NTU
description An EZ-Link card is a contactless stored-value card used for public transport fare payment in Singapore. This project aims to infer sensitive information about users from mobility data and to identify the privacy risks involved. Examples of the sensitive information inferred are the duration and probable area of a user's home of residence, work, and activities, as well as the social relationship between pairs of users. As a proof of concept, a stratified sample of 50,000 Card IDs was drawn based on passenger type. Only 4 journey records, exact to the second, are required for a user to be unique. Hence, targeted de-anonymisation could be performed easily with 4 or fewer known data points. A rule-based approach was implemented to estimate the home of residence, the activity location, and the purpose of the activity. The rule-based approach could estimate the home of residence for 28.47% of the users. The closeness of the social relationship between a pair of users is measured using cosine similarity. Use cases are plotted for pairs of users with various degrees of closeness, and these pairs were found to have the same estimated home of residence. This suggests that members of a family or household tend to commute together. The privacy risks involve de-anonymisation of mobility data using auxiliary information. De-anonymisation exposes users' sensitive information. It also allows the identification of vulnerable groups, such as Child/Student cardholders travelling unaccompanied. It is further possible to correlate the de-anonymised mobility data against other leaked databases for malicious intent.
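The uniqueness check and the cosine-similarity measure mentioned in the description can be illustrated with a minimal Python sketch. It is not the project's implementation: the record layout (card ID, stop ID, tap time), the toy data, the function names, and the choice of per-stop visit counts as the feature vector for cosine similarity are all assumptions made for illustration, since the record does not state them.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy records: (card_id, stop_id, tap_time exact to the second).
RECORDS = [
    ("A", "S1", "2020-01-06 08:01:10"),
    ("A", "S2", "2020-01-06 18:30:05"),
    ("B", "S1", "2020-01-06 08:01:10"),
    ("B", "S3", "2020-01-06 19:02:44"),
    ("C", "S4", "2020-01-06 07:45:00"),
]


def matching_cards(records, known_points):
    """Return the card IDs whose records contain every known (stop, time) point.

    A single match means the known points are enough to single out one card.
    """
    points_by_card = {}
    for card_id, stop_id, tap_time in records:
        points_by_card.setdefault(card_id, set()).add((stop_id, tap_time))
    wanted = set(known_points)
    return sorted(c for c, pts in points_by_card.items() if wanted <= pts)


def visit_profile(records, card_id):
    """Count how often a card appears at each stop (assumed feature vector)."""
    return Counter(stop for card, stop, _ in records if card == card_id)


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty profile is treated as unrelated to everything
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    one_point = [("S1", "2020-01-06 08:01:10")]
    two_points = one_point + [("S2", "2020-01-06 18:30:05")]
    print(matching_cards(RECORDS, one_point))   # two cards still match
    print(matching_cards(RECORDS, two_points))  # narrowed to a single card
    print(cosine_similarity(visit_profile(RECORDS, "A"),
                            visit_profile(RECORDS, "B")))
```

In this toy data one known point leaves two candidate cards and a second point narrows the set to one, which mirrors the idea that a handful of exact records can make a user unique. A real analysis would likely weight time as well as location when comparing users.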
first_indexed 2024-10-01T03:42:28Z
format Final Year Project (FYP)
id ntu-10356/137829
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:42:28Z
publishDate 2020
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/137829 2020-04-15T11:56:14Z Inferring sensitive user information from tap-on tap-off public transport data Cheng, Kelly Wen Xin Cai Wentong School of Computer Science and Engineering TUMCREATE aswtcai@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2020-04-15T11:56:13Z 2020-04-15T11:56:13Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/137829 en SCSE19-0444 application/pdf Nanyang Technological University
spellingShingle Engineering::Computer science and engineering
Cheng, Kelly Wen Xin
Inferring sensitive user information from tap-on tap-off public transport data
title Inferring sensitive user information from tap-on tap-off public transport data
title_sort inferring sensitive user information from tap on tap off public transport data
topic Engineering::Computer science and engineering
url https://hdl.handle.net/10356/137829
work_keys_str_mv AT chengkellywenxin inferringsensitiveuserinformationfromtapontapoffpublictransportdata