The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Description

Bibliographic Details
Main Authors: Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format: Journal article
Language: English
Published: Springer Nature 2024
