Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality?

Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combination and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.

Bibliographic Details
Main Authors: Gaël Le Mens, Balázs Kovács, Michael T. Hannan, Guillem Pros
Affiliations: Universitat Pompeu Fabra (Le Mens, Pros); Yale University (Kovács); Stanford University (Hannan)
Format: Article
Language: English
Published: Society for Sociological Science, 2023-03-01
Series: Sociological Science, vol. 10
ISSN: 2330-6696
DOI: 10.15195/v10.a3
Subjects: categories; concepts; deep learning; typicality; BERT; transformer models
Online Access: https://sociologicalscience.com/articles-v10-3-82/
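The abstract defines an object's typicality in a concept as the categorization probability produced by a trained BERT classifier. It validates that measure by correlating it with average human typicality ratings. The Python sketch below illustrates only those last two steps. It is not the authors' implementation and rests on several assumptions. The logits are made-up stand-ins for the output of a fine-tuned BERT model. Genres are treated as mutually exclusive, so a softmax is used; a multi-label setup would apply a per-genre sigmoid instead. Pearson correlation is assumed as the agreement measure. The helper names (`genre_typicality`, `pearson`) and all numbers are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw classifier scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the largest logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def genre_typicality(logits: Sequence[float], genres: Sequence[str], genre: str) -> float:
    """Hypothetical helper: typicality of one text in `genre`, read off as the
    classifier's predicted probability for that genre (softmax is an assumption)."""
    return softmax(logits)[list(genres).index(genre)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (sd_x * sd_y)


if __name__ == "__main__":
    genres = ["mystery", "romance", "science fiction"]
    # Made-up logits for three book descriptions; in practice they would come
    # from a BERT model fine-tuned to predict genre from the description text.
    logits_per_book = [[3.0, 0.5, 0.1], [1.2, 1.0, 0.8], [0.2, 2.5, 0.3]]
    # Made-up average human ratings of how typical each book is of "mystery".
    human_mystery_ratings = [6.5, 4.0, 1.5]
    model_scores = [genre_typicality(logits, genres, "mystery") for logits in logits_per_book]
    print("model typicality:", [round(s, 3) for s in model_scores])
    print("correlation with human ratings:", round(pearson(model_scores, human_mystery_ratings), 3))
```

Pearson's r is sensitive to outliers and to nonlinear but monotone relationships. A rank correlation such as Spearman's is a common alternative when only the ordering of typicality matters.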