Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality?

Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet,...

Bibliographic Details
Main Authors:	Gaël Le Mens, Balázs Kovács, Michael T. Hannan, Guillem Pros
Format:	Article
Language:	English
Published:	Society for Sociological Science 2023-03-01
Series:	Sociological Science
Subjects:	categories, concepts, deep learning, typicality, bert, transformer models
Online Access:	https://sociologicalscience.com/articles-v10-3-82/

collection	DOAJ
description	Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combination and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.
id	doaj.art-f682530256a64d20b4bc46f65476b2da
institution	Directory Open Access Journal
issn	2330-6696
spelling	doaj.art-f682530256a64d20b4bc46f65476b2da | 2023-02-28T03:09:16Z | eng | Society for Sociological Science | Sociological Science | 2330-6696 | 2023-03-01 | 10 | 3 | 82 | 117 | 10.15195/v10.a3 | Gaël Le Mens (Universitat Pompeu Fabra (UPF)); Balázs Kovács (Yale University); Michael T. Hannan (Stanford University); Guillem Pros (Universitat Pompeu Fabra) | https://sociologicalscience.com/articles-v10-3-82/ | categories; concepts; deep learning; typicality; bert; transformer models
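The description above defines the typicality of an object in a concept as the categorization probability produced by a trained classifier, and validates the measure by correlating it with average human ratings. The following is a minimal sketch of those two steps only, not the authors' implementation. It assumes the classifier exposes one raw score (logit) per genre and that a softmax turns those scores into probabilities; the genre names, logits, and human ratings are made-up illustrative values, not data from the article.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def typicality(logits, genres):
    """Typicality of one description in each genre, taken to be the
    classifier's categorization probability for that genre (assumption:
    the probabilities come from a softmax over per-genre logits)."""
    return dict(zip(genres, softmax(logits)))


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical inputs: per-genre logits for four book descriptions and
# average human typicality ratings of the same books in "mystery".
GENRES = ["mystery", "romance", "science fiction"]
toy_logits = [
    [2.1, 0.3, -1.0],
    [0.2, 1.9, -0.5],
    [-0.8, 0.1, 2.4],
    [0.9, 0.8, 0.1],
]
toy_human_ratings = [0.9, 0.2, 0.1, 0.5]

# Model-based typicality of each book in the "mystery" genre.
mystery_typicality = [typicality(row, GENRES)["mystery"] for row in toy_logits]

# Agreement between the model-based measure and the human ratings.
r = pearson(mystery_typicality, toy_human_ratings)
print(f"correlation with human ratings: {r:.2f}")
```

In practice the logits would come from a text classifier fine-tuned on labeled descriptions, and the correlation would be computed over many rated items rather than four.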

Similar Items