A bilingual benchmark for evaluating large language models
This work introduces a new benchmark for the bilingual evaluation of large language models (LLMs) in English and Arabic. While LLMs have transformed various fields, their evaluation in Arabic remains limited. This work addresses that gap by proposing a novel evaluation method for LLMs in both Arabic...
| Main Author: | Mohamed Alkaoud |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | PeerJ Inc., 2024-02-01 |
| Series: | PeerJ Computer Science |
| Subjects: | |
| Online Access: | https://peerj.com/articles/cs-1893.pdf |
Similar Items

- A Comprehensive Study of ChatGPT: Advancements, Limitations, and Ethical Considerations in Natural Language Processing and Cybersecurity
  by: Moatsum Alawida, et al.
  Published: (2023-08-01)
- A Survey on Large Language Model (LLM) Security and Privacy: The Good, The Bad, and The Ugly
  by: Yifan Yao, et al.
  Published: (2024-06-01)
- Large language models and political science
  by: Mitchell Linegar, et al.
  Published: (2023-10-01)
- Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
  by: Adi Lahat, et al.
  Published: (2023-06-01)
- A Mathematical Investigation of Hallucination and Creativity in GPT Models
  by: Minhyeok Lee
  Published: (2023-05-01)