Inference acceleration of large language models

This dissertation delves into the challenges and bottlenecks faced by current large language models during inference from three core perspectives: data, model, and system. Through meticulous research, key factors impacting inference speed are identified, encompassing data processing efficiency, m...


Bibliographic Details
Main Author:	Zhang, Boyu
Other Authors:	Mao, Kezhi
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Large language model; Quantization; Approximate computation; Self-attention; Transformer
Online Access:	https://hdl.handle.net/10356/181660
