The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic details
Main authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar items

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
By: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
By: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
By: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
By: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
By: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
By: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
By: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
By: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
By: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
By: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
By: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
By: Botelho, A, et al.
Published: (2021)

Improvements in viral gene annotation using large language models and soft alignments
By: William L. Harrigan, et al.
Published: (2024-04-01)

Personality prediction based on large language models
By: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB) /
By: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Individualizing the risks and benefits of postmenopausal hormone therapy.
By: van Staa, T, et al.
Published: (2008)

Evaluating the ability of large language models to emulate personality
By: Yilei Wang, et al.
Published: (2025-01-01)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
By: Bertie Vidgen, et al.
Published: (2020-01-01)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
By: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
By: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
By: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
By: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
By: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
By: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
By: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
By: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
By: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Cross-language Wikipedia editing of Okinawa, Japan
By: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
By: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
By: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
By: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource] /
By: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
By: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
By: K. S. Golikov
Published: (2020-11-01)

The Dengue Vaccine Dilemma: Balancing the Individual and Population Risks and Benefits.
By: Jacqueline Deen
Published: (2016-11-01)

Aligning subtitles in sign language videos
By: Bull, H, et al.
Published: (2022)

Morphological Alignment in Khorramabad Lori Language
By: Fatemeh Akoondi
Published: (2023-03-01)

Aligning English Language Testing With Curriculum
By: Marcela Palacio, et al.
Published: (2016-07-01)

The Opportunities and Risks of Large Language Models in Mental Health
By: Hannah R Lawrence, et al.
Published: (2024-07-01)