The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic Details
Main Authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar Items

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
by: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
by: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
by: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
by: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
by: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
by: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
by: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
by: Botelho, A, et al.
Published: (2021)

Lower bounds on multiple sequence alignment using exact 3-way alignment
by: Colbourn Charles J, et al.
Published: (2007-04-01)

Improvements in viral gene annotation using large language models and soft alignments
by: William L. Harrigan, et al.
Published: (2024-04-01)

The Health Star Rating system – is its reductionist (nutrient) approach a benefit or risk for tackling dietary risk factors?
by: Mark A Lawrence, et al.
Published: (2019-03-01)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB)
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Evaluating the ability of large language models to emulate personality
by: Yilei Wang, et al.
Published: (2025-01-01)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
by: Bertie Vidgen, et al.
Published: (2020-01-01)

Individualizing the risks and benefits of postmenopausal hormone therapy.
by: van Staa, T, et al.
Published: (2008)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
by: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Reflection in Speech of the Individual-Typological Features of Language Personality
by: Наталія Фоміна
Published: (2019-11-01)

Benefit-Risk Analysis of Buprenorphine for Pain Management
by: Hale M, et al.
Published: (2021-05-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
by: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
by: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
by: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
by: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
by: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
by: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
by: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
by: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Zo Ahmed, et al.
Published: (2022-02-01)

Cross-language Wikipedia editing of Okinawa, Japan
by: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
by: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
by: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
by: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource]
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
by: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
by: K. S. Golikov
Published: (2020-11-01)