Analysis of Categorical Data with the R Package <i>confreq</i>


Bibliographic Details
Main Authors: Jörg-Henrik Heine, Mark Stemmler
Format: Article
Language: English
Published: MDPI AG 2021-09-01
Series: Psych
Online Access: https://www.mdpi.com/2624-8611/3/3/34
Description
Summary: The person-centered approach to categorical data analysis is introduced as a complement to the variable-centered approach. The former groups persons, animals, or objects on the basis of their combination of characteristics, which can be displayed in multiway contingency tables. Configural Frequency Analysis (CFA) and log-linear modeling (LLM) are its two most prominent (and related) statistical methods. Both compare observed frequencies (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>f</mi><msub><mi>o</mi><mrow><mi>i</mi><mo>…</mo><mi>k</mi></mrow></msub></msub></semantics></math></inline-formula>) with expected frequencies (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>f</mi><msub><mi>e</mi><mrow><mi>i</mi><mo>…</mo><mi>k</mi></mrow></msub></msub></semantics></math></inline-formula>). While LLM primarily takes a model-fitting approach, CFA analyzes the residuals of non-fitting models. Residuals with significantly more observed than expected frequencies (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>f</mi><msub><mi>o</mi><mrow><mi>i</mi><mo>…</mo><mi>k</mi></mrow></msub></msub><mo>&gt;</mo><msub><mi>f</mi><msub><mi>e</mi><mrow><mi>i</mi><mo>…</mo><mi>k</mi></mrow></msub></msub></mrow></semantics></math></inline-formula>) are called <i>types</i>, while residuals with significantly fewer observed than expected frequencies (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>f</mi><msub><mi>o</mi><mrow><mi>i</mi><mo>…</mo><mi>k</mi></mrow></msub></msub><mo>&lt;</mo><msub><mi>f</mi><msub><mi>e</mi><mrow><mi>i</mi><mo>…</mo><mi>k</mi></mrow></msub></msub></mrow></semantics></math></inline-formula>) are called <i>antitypes</i>. The R package <i>confreq</i> is presented and its use is demonstrated with several data examples.
Results of contingency table analyses can be displayed in tables and also in graphics that represent the size and type of each residual. The expected frequencies represent the null hypothesis, and different null hypotheses result in different expected frequencies. Different kinds of CFA are presented: the first-order CFA based on the null hypothesis of independence, CFA with covariates, and the two-sample CFA. The calculation of the expected frequencies can be controlled through the design matrix, which can be easily handled in <i>confreq</i>.
ISSN: 2624-8611
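
The first-order CFA mentioned in the summary can be illustrated with a minimal sketch. It is written in plain Python and does not use or reproduce the <i>confreq</i> API. The function name `first_order_cfa`, the z-approximation, the Bonferroni adjustment, and the example counts are all assumptions chosen for illustration; the article itself may use different tests and adjustments. Expected frequencies are derived from the null hypothesis of independence, that is, the sample size multiplied by the product of the marginal proportions.

```python
import math
from collections import defaultdict


def first_order_cfa(observed, alpha=0.05):
    """Illustrative first-order CFA (not the confreq implementation).

    observed: dict mapping each configuration (a tuple of category labels)
    to its observed count. Every cell of the table must be listed, including
    cells with a count of zero, and every marginal total must be positive.
    """
    n = sum(observed.values())
    n_vars = len(next(iter(observed)))

    # Marginal totals for each variable.
    margins = [defaultdict(float) for _ in range(n_vars)]
    for config, count in observed.items():
        for v, level in enumerate(config):
            margins[v][level] += count

    # Assumption: Bonferroni adjustment over all cells of the table.
    local_alpha = alpha / len(observed)

    results = {}
    for config, f_o in observed.items():
        # Expected frequency under independence: n * product of marginal proportions.
        f_e = n
        for v, level in enumerate(config):
            f_e *= margins[v][level] / n

        # Standardized residual and two-sided p-value (normal approximation).
        z = (f_o - f_e) / math.sqrt(f_e)
        p = math.erfc(abs(z) / math.sqrt(2))

        if p < local_alpha:
            label = "type" if f_o > f_e else "antitype"
        else:
            label = "none"
        results[config] = {"f_o": f_o, "f_e": f_e, "z": z, "p": p, "label": label}
    return results


# Hypothetical 2 x 2 table, invented purely for illustration.
table = {
    ("a1", "b1"): 40,
    ("a1", "b2"): 10,
    ("a2", "b1"): 10,
    ("a2", "b2"): 40,
}
result = first_order_cfa(table)
for config, row in result.items():
    print(config, row["f_e"], round(row["z"], 2), row["label"])
```

In this invented table every expected frequency is 25, so the two diagonal cells (observed 40) are flagged as types and the two off-diagonal cells (observed 10) as antitypes. A different null hypothesis would change only how `f_e` is computed, which is the role the summary attributes to the design matrix.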