Model merging and safety alignment: one bad model spoils the bunch

Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligne...

Bibliographic Details
Main Authors:	Hammoud, HAAK, Michieli, U, Pizzati, F, Torr, P, Bibi, A, Ghanem, B, Ozay, M
Format:	Conference item
Language:	English
Published:	Association for Computational Linguistics 2024
