Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China
Inferring people’s Socioeconomic Attributes (SEAs), including income, occupation, and education level, is an important problem for both social sciences and many networked applications like targeted advertising and personalized recommendation. Previous works mainly focus on estimating SEAs from people’s cyberspace behaviors and relationships, such as the content of tweets or the social networks between online users.
Main Authors: | Shichang Ding, Xin Gao, Yufan Dong, Yiwei Tong, Xiaoming Fu |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2021-03-01 |
Series: | Journal of Social Computing |
Subjects: | personal income, family income, occupation, education, multi-task learning |
Online Access: | https://www.sciopen.com/article/10.23919/JSC.2021.0003 |
_version_ | 1798028690102157312 |
---|---|
author | Shichang Ding; Xin Gao; Yufan Dong; Yiwei Tong; Xiaoming Fu |
author_sort | Shichang Ding |
collection | DOAJ |
description | Inferring people’s Socioeconomic Attributes (SEAs), including income, occupation, and education level, is an important problem for both social sciences and many networked applications like targeted advertising and personalized recommendation. Previous works mainly focus on estimating SEAs from people’s cyberspace behaviors and relationships, such as the content of tweets or the social networks between online users. Besides cyberspace data, alternative data sources about users’ physical behavior, like their home location, may offer new insights. More specifically, in this paper, we study how to predict a person’s income level, family income level, occupation type, and education level from his/her home location. As a case study, we collect people’s home locations and socioeconomic attributes through a survey involving 9 provinces and 85 cities in China. We further enrich home location with knowledge from real estate websites, government statistics websites, online map services, etc. To learn a shared representation from input features as well as attribute-specific representations for different SEAs, we propose H2SEA, a factorization machine-based multi-task learning method with an attention mechanism. Extensive experimental results show that: (1) Home location can clearly improve the estimation accuracy for all SEA prediction tasks (e.g., 80.2% improvement in terms of F1-score in estimating personal income level); (2) The proposed H2SEA model outperforms alternative models for SEA inference in terms of various evaluation metrics, such as Area Under Curve (AUC), F-measure, and specificity; (3) The performance of specific SEA prediction tasks (e.g., personal income) can be further improved if H2SEA focuses only on cities or only on villages, due to the urban-rural gap in China; (4) Compared with housing price data crawled online, the area-level average income and Points Of Interest (POI) are more important features for SEA inference in China. |
first_indexed | 2024-04-11T19:12:03Z |
format | Article |
id | doaj.art-ab2675baaef945648399961dd5942a7c |
institution | Directory Open Access Journal |
issn | 2688-5255 |
language | English |
last_indexed | 2024-04-11T19:12:03Z |
publishDate | 2021-03-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Journal of Social Computing |
spelling | doaj.art-ab2675baaef945648399961dd5942a7c; 2022-12-22T04:07:34Z; eng; Tsinghua University Press; Journal of Social Computing; ISSN 2688-5255; 2021-03-01; vol. 2, no. 1, pp. 71-88; DOI 10.23919/JSC.2021.0003; Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China; Shichang Ding (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 276800, China); Xin Gao (Department of Sociology, Tsinghua University, Beijing 100085, China); Yufan Dong (Institute of Computer Science, University of Göttingen, Göttingen 37077, Germany); Yiwei Tong (Shanghai Hejin Information Technology Company, Shanghai 200100, China); Xiaoming Fu (Institute of Computer Science, University of Göttingen, Göttingen 37077, Germany); https://www.sciopen.com/article/10.23919/JSC.2021.0003; personal income; family income; occupation; education; multi-task learning |
title | Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China |
topic | personal income; family income; occupation; education; multi-task learning |
url | https://www.sciopen.com/article/10.23919/JSC.2021.0003 |