The benefits, risks and bounds of personalizing the alignment of large language models to individuals
Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...
| Main Authors: | Kirk, HR, Vidgen, B, Röttger, P, Hale, SA |
|---|---|
| Format: | Journal article |
| Language: | English |
| Published: | Springer Nature, 2024 |
Similar Items
- Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
  by: Kirk, HR, et al.
  Published: (2022)
- Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
  by: Kirk, HR, et al.
  Published: (2022)
- Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
  by: Kirk, H, et al.
  Published: (2021)
- Two contrasting data annotation paradigms for subjective NLP tasks
  by: Röttger, P, et al.
  Published: (2022)
- Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
  by: Ahmed, Z, et al.
  Published: (2022)