The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic details
Main Authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar items

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
by: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
by: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
by: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
by: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
by: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
by: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
by: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
by: Botelho, A, et al.
Published: (2021)

Improvements in viral gene annotation using large language models and soft alignments
by: William L. Harrigan, et al.
Published: (2024-04-01)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB) /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Individualizing the risks and benefits of postmenopausal hormone therapy.
by: van Staa, T, et al.
Published: (2008)

Evaluating the ability of large language models to emulate personality
by: Yilei Wang, et al.
Published: (2025-01-01)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
by: Bertie Vidgen, et al.
Published: (2020-01-01)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
by: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
by: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
by: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
by: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
by: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
by: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
by: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
by: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
by: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Cross-language Wikipedia editing of Okinawa, Japan
by: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
by: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
by: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
by: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource] /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
by: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
by: K. S. Golikov
Published: (2020-11-01)

The Dengue Vaccine Dilemma: Balancing the Individual and Population Risks and Benefits.
by: Jacqueline Deen
Published: (2016-11-01)

Aligning subtitles in sign language videos
by: Bull, H, et al.
Published: (2022)

Morphological Alignment in Khorramabad Lori Language
by: Fatemeh Akoondi
Published: (2023-03-01)

Aligning English Language Testing With Curriculum
by: Marcela Palacio, et al.
Published: (2016-07-01)

The Opportunities and Risks of Large Language Models in Mental Health
by: Hannah R Lawrence, et al.
Published: (2024-07-01)