A bilingual benchmark for evaluating large language models
This work introduces a new benchmark for the bilingual evaluation of large language models (LLMs) in English and Arabic. While LLMs have transformed many fields, their evaluation in Arabic remains limited. This work addresses that gap by proposing a novel evaluation method for LLMs in both Arabic...
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | PeerJ Inc., 2024-02-01 |
| Series: | PeerJ Computer Science |
| Subjects: | |
| Online Access: | https://peerj.com/articles/cs-1893.pdf |