A Testing Framework for AI Linguistic Systems (<i>testFAILS</i>)
This paper presents an innovative testing framework, <i>testFAILS</i>, designed for the rigorous evaluation of AI Linguistic Systems (AILS), with particular emphasis on the various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, “How should AI be evaluated?” While the Turing test has traditionally been the benchmark for AI evaluation, it is argued that current, publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing-test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Ongoing research has already validated several versions of ChatGPT, and comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA and PaLM 2 models, is currently being conducted. The <i>testFAILS</i> framework is designed to be adaptable, ready to evaluate new chatbot versions as they are released. Additionally, available chatbot APIs have been tested and applications have been developed, one of them being <i>AIDoctor</i>, presented in this paper, which utilizes the ChatGPT-4 model and Microsoft Azure AI technologies.
Main Authors: | Yulia Kumar, Patricia Morreale, Peter Sorial, Justin Delgado, J. Jenny Li, Patrick Martins |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-07-01 |
Series: | Electronics |
ISSN: | 2079-9292 |
DOI: | 10.3390/electronics12143095 |
Author Affiliation: | Department of Computer Science and Technology, Kean University, Union, NJ 07083, USA |
Subjects: | chatbots; chatbot validation; bots; a testing framework for AI linguistic systems (<i>testFAILS</i>); <i>AIDoctor</i> |
Online Access: | https://www.mdpi.com/2079-9292/12/14/3095 |
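
The abstract credits orthogonal array coverage as the framework's core testing mechanism. As a purely illustrative aid, the Python sketch below shows one common way such coverage is approximated in practice: greedy 2-way (pairwise) test selection over a small set of chatbot-evaluation factors. The factor names and levels are hypothetical placeholders and are not the dimensions used in the paper.

```python
from itertools import combinations, product

# Hypothetical test factors for exercising a chatbot; not taken from the paper.
factors = {
    "task": ["summarization", "translation", "code generation"],
    "language": ["English", "Spanish", "Ukrainian"],
    "turns": ["single-turn", "multi-turn"],
}
names = list(factors)

def pairs_of(case):
    """Return every (factor, level) pair interaction covered by one test case."""
    return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

# All 2-way interactions a pairwise-covering suite must exercise.
uncovered = {
    ((a, va), (b, vb))
    for a, b in combinations(names, 2)
    for va, vb in product(factors[a], factors[b])
}

# Greedily pick, from the full Cartesian product, whichever candidate case
# covers the most still-uncovered pairs, until nothing is left uncovered.
candidates = [dict(zip(names, combo)) for combo in product(*factors.values())]
suite = []
while uncovered:
    best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
    suite.append(best)
    uncovered -= pairs_of(best)

for case in suite:
    print(case)  # typically ~9 cases instead of all 18 combinations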
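```

A true orthogonal array additionally balances how often each pair appears; the greedy covering above is only a minimal stand-in for that idea.

The abstract also notes that publicly available chatbot APIs were exercised and that <i>AIDoctor</i> builds on the ChatGPT-4 model and Microsoft Azure AI services. This record contains no implementation details, so the sketch below is only a hypothetical illustration of driving a chat-completion endpoint with a test prompt; the endpoint URL, model name, environment variable, and system prompt are assumptions, not the authors' code.

```python
import os
import requests

# Assumed endpoint and credential source for illustration only; the paper's
# AIDoctor application and its Azure integration are not reproduced here.
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]

def ask_chatbot(prompt: str, model: str = "gpt-4") -> str:
    """Send a single prompt to a chat-completion API and return the reply text."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": "You are a cautious medical-information assistant."},
                {"role": "user", "content": prompt},
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Example of the kind of probe a validation framework might send.
    print(ask_chatbot("List three questions to ask a patient reporting a persistent cough."))
```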