Style over substance: A psychologically informed approach to feature selection and generalisability for author classification


Bibliographic Details
Main Authors: Isabel Holmes, Timothy Cribbin, Nelli Ferenczi
Format: Article
Language: English
Published: Elsevier 2023-03-01
Series: Computers in Human Behavior Reports
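Subjects: Author profiling; Political affiliation classification; Stylistic feature sets; Model generalisability; Political psychology; Feature development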
Online Access: http://www.sciencedirect.com/science/article/pii/S2451958822001014
author Isabel Holmes
Timothy Cribbin
Nelli Ferenczi
collection DOAJ
description Author profiling, or classifying user-generated content based on demographic or other personal attributes, is a key task in social media-based research. Whilst high accuracy has been achieved on many attributes, most studies train and test models on a single domain only, ignoring cross-domain performance. Research shows that models often transfer poorly to new domains because they tend to depend heavily on topic-specific (i.e., lexical) features. Knowledge specific to the field (e.g., Psychology, Political Science) is often ignored in favour of data-driven algorithms for feature development and selection. Focusing on political affiliation, we evaluate an approach that selects stylistic features according to known psychological correlates (personality traits) of this attribute. Training data was collected from Reddit posts made by regular users of the political subreddits r/republican and r/democrat. A second, non-political dataset was created by collecting posts by the same users in other subreddits. Our results show that introducing domain-specific knowledge in the form of psychologically informed stylistic features resulted in better performance outside the training domain than lexical or more commonly used stylistic features.
first_indexed 2024-04-10T06:34:24Z
format Article
id doaj.art-c1ee8e8a3ae44f28a4f9959e53d9d9df
institution Directory Open Access Journal
issn 2451-9588
language English
last_indexed 2024-04-10T06:34:24Z
publishDate 2023-03-01
publisher Elsevier
record_format Article
series Computers in Human Behavior Reports
spelling doaj.art-c1ee8e8a3ae44f28a4f9959e53d9d9df
2023-03-01T04:32:40Z
eng
Elsevier
Computers in Human Behavior Reports
2451-9588
2023-03-01
Volume 9, article 100267
Style over substance: A psychologically informed approach to feature selection and generalisability for author classification
Isabel Holmes (Department of Computer Science, Brunel University London, UK; corresponding author)
Timothy Cribbin (Department of Computer Science, Brunel University London, UK)
Nelli Ferenczi (Department of Psychology, Brunel University London, UK)
title Style over substance: A psychologically informed approach to feature selection and generalisability for author classification
topic Author profiling
Political affiliation classification
Stylistic feature sets
Model generalisability
Political psychology
Feature development
url http://www.sciencedirect.com/science/article/pii/S2451958822001014