Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China

Bibliographic Details
Main Authors: Shichang Ding, Xin Gao, Yufan Dong, Yiwei Tong, Xiaoming Fu
Format: Article
Language: English
Published: Tsinghua University Press 2021-03-01
Series: Journal of Social Computing
Subjects: personal income; family income; occupation; education; multi-task learning
Online Access: https://www.sciopen.com/article/10.23919/JSC.2021.0003
_version_ 1798028690102157312
author Shichang Ding
Xin Gao
Yufan Dong
Yiwei Tong
Xiaoming Fu
author_sort Shichang Ding
collection DOAJ
description Inferring people’s Socioeconomic Attributes (SEAs), including income, occupation, and education level, is an important problem for both the social sciences and many networked applications such as targeted advertising and personalized recommendation. Previous work mainly focuses on estimating SEAs from people’s cyberspace behaviors and relationships, such as the content of tweets or the social networks between online users. Besides cyberspace data, alternative data sources about users’ physical behavior, like their home location, may offer new insights. More specifically, in this paper, we study how to predict a person’s income level, family income level, occupation type, and education level from their home location. As a case study, we collect people’s home locations and socioeconomic attributes through a survey involving 9 provinces and 85 cities in China. We further enrich each home location with knowledge from real estate websites, government statistics websites, online map services, etc. To learn a shared representation from input features as well as attribute-specific representations for different SEAs, we propose H2SEA, a factorization machine-based multi-task learning method with an attention mechanism. Extensive experimental results show that: (1) Home location can clearly improve the estimation accuracy for all SEA prediction tasks (e.g., an 80.2% improvement in F1-score in estimating personal income level); (2) The proposed H2SEA model outperforms alternative models for SEA inference in terms of various evaluation metrics, such as Area Under Curve (AUC), F-measure, and specificity; (3) The performance of specific SEA prediction tasks (e.g., personal income) can be further improved if H2SEA focuses only on cities or only on villages, owing to the urban-rural gap in China; (4) Compared with housing price data crawled online, area-level average income and Points Of Interest (POI) are more important features for SEA inference in China.
first_indexed 2024-04-11T19:12:03Z
format Article
id doaj.art-ab2675baaef945648399961dd5942a7c
institution Directory Open Access Journal
issn 2688-5255
language English
last_indexed 2024-04-11T19:12:03Z
publishDate 2021-03-01
publisher Tsinghua University Press
record_format Article
series Journal of Social Computing
spelling doaj.art-ab2675baaef945648399961dd5942a7c
2022-12-22T04:07:34Z
eng
Tsinghua University Press
Journal of Social Computing
2688-5255
2021-03-01
Volume 2, Issue 1, pp. 71-88
10.23919/JSC.2021.0003
Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China
Shichang Ding (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 276800, China)
Xin Gao (Department of Sociology, Tsinghua University, Beijing 100085, China)
Yufan Dong (Institute of Computer Science, University of Göttingen, Göttingen 37077, Germany)
Yiwei Tong (Shanghai Hejin Information Technology Company, Shanghai 200100, China)
Xiaoming Fu (Institute of Computer Science, University of Göttingen, Göttingen 37077, Germany)
https://www.sciopen.com/article/10.23919/JSC.2021.0003
personal income
family income
occupation
education
multi-task learning
title Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China
topic personal income
family income
occupation
education
multi-task learning
url https://www.sciopen.com/article/10.23919/JSC.2021.0003