The benefits, risks and bounds of personalizing the alignment of large language models to individuals
Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...
| Main Authors: | Kirk, HR, Vidgen, B, Röttger, P, Hale, SA |
|---|---|
| Format: | Journal article |
| Language: | English |
| Published: | Springer Nature, 2024 |
Similar Items
- Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
  by: Kirk, HR, et al.
  Published: (2022)
- Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
  by: Kirk, HR, et al.
  Published: (2022)
- Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
  by: Kirk, H, et al.
  Published: (2021)
- Two contrasting data annotation paradigms for subjective NLP tasks
  by: Röttger, P, et al.
  Published: (2022)
- Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
  by: Ahmed, Z, et al.
  Published: (2022)