API2CAN: a dataset & service for canonical utterance generation for REST APIs

Abstract Objectives: Recently, natural language interfaces (e.g., chatbots) have gained enormous attention. Such interfaces execute underlying application programming interfaces (APIs) based on the user's utterances to perform tasks (e.g., reporting weather). Supervised approaches for building su...

Bibliographic Details
Main Authors: Mohammad-Ali Yaghoub-Zadeh-Fard, Boualem Benatallah
Format: Article
Language: English
Published: BMC 2021-09-01
Series: BMC Research Notes
Subjects: Chatbots; Bot development; Natural language interfaces
Online Access: https://doi.org/10.1186/s13104-021-05593-w
author Mohammad-Ali Yaghoub-Zadeh-Fard
Boualem Benatallah
collection DOAJ
description Abstract. Objectives: Recently, natural language interfaces (e.g., chatbots) have gained enormous attention. Such interfaces execute underlying application programming interfaces (APIs) based on the user's utterances to perform tasks (e.g., reporting weather). Supervised approaches for building such interfaces rely upon a large set of user utterances paired with APIs. Collecting such pairs typically starts with obtaining initial utterances for a given API method. Generating initial utterances can be considered a machine translation task in which an API method is translated into an utterance. However, the key challenge is the lack of training samples for training domain-independent translation models. In this paper, we propose a dataset for training supervised models to generate initial utterances for APIs. Data description: The dataset contains 14,370 pairs of API methods and utterances. It is built automatically by converting method descriptions of a large number of APIs to user utterances, and it is cleaned manually to ensure quality. The dataset is also accompanied by a set of microservices (e.g., translating API methods to utterances) which can facilitate the process of collecting training samples for building natural language interfaces.
first_indexed 2024-12-14T18:41:50Z
format Article
id doaj.art-18519cab09a34f8690eb5649436f0357
institution Directory Open Access Journal
issn 1756-0500
language English
last_indexed 2024-12-14T18:41:50Z
publishDate 2021-09-01
publisher BMC
record_format Article
series BMC Research Notes
spelling doaj.art-18519cab09a34f8690eb5649436f0357; 2022-12-21T22:51:28Z; eng; BMC; BMC Research Notes; 1756-0500; 2021-09-01; 14113; 10.1186/s13104-021-05593-w; API2CAN: a dataset & service for canonical utterance generation for REST APIs; Mohammad-Ali Yaghoub-Zadeh-Fard (UNSW Sydney); Boualem Benatallah (UNSW Sydney); https://doi.org/10.1186/s13104-021-05593-w; Chatbots; Bot development; Natural language interfaces
title API2CAN: a dataset & service for canonical utterance generation for REST APIs
topic Chatbots
Bot development
Natural language interfaces
url https://doi.org/10.1186/s13104-021-05593-w