Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces

Data Science is increasingly applied for solving real-life problems, in industry and in academic research, but mastering Data Science requires an interdisciplinary education that is still scarce on the market. Thus, there is a growing need for user-friendly tools that allow domain experts to directl...

Bibliographic Details
Main Authors: Sara Pido, Pietro Pinoli, Pietro Crovari, Francesca Ieva, Franca Garzotto, Stefano Ceri
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
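Subjects: Automated machine learning; data science; human-computer interaction; intelligent systems; natural language understanding; pipeline optimization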
Online Access: https://ieeexplore.ieee.org/document/10114402/
author Sara Pido
Pietro Pinoli
Pietro Crovari
Francesca Ieva
Franca Garzotto
Stefano Ceri
collection DOAJ
description Data Science is increasingly applied for solving real-life problems, in industry and in academic research, but mastering Data Science requires an interdisciplinary education that is still scarce on the market. Thus, there is a growing need for user-friendly tools that allow domain experts to directly apply data analysis methods to their datasets, without involving a Data Science expert. In this scenario, we present DSBot, an assistant that can analyze the user data and produce answers by mastering several Data Science techniques. DSBot understands the research question with the help of conversation interaction, produces a data science pipeline and automatically executes the pipeline in order to generate analysis. The strength of DSBot lies in the design of a rich domain specific language for modeling data analysis pipelines, the use of a suitable neural network for machine translation of research questions, the availability of a vast dictionary of pipelines for matching the translation output, and the use of natural language technology provided by a conversational agent. We empirically evaluated the translation capabilities and the autoML performances of the system. In the translation task, it obtains a BLEU score of 0.8. In prediction tasks, DSBot outperforms TPOT, an autoML tool, in 19 datasets out of 30.
first_indexed 2024-03-13T05:59:45Z
format Article
id doaj.art-0c2f55141a48434284fedf3b03ccb6eb
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T05:59:45Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0c2f55141a48434284fedf3b03ccb6eb
2023-06-12T23:01:00Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2023-01-01, vol. 11, pp. 45972-45988
DOI 10.1109/ACCESS.2023.3272503, document 10114402
Sara Pido, https://orcid.org/0000-0003-1425-1719, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Pietro Pinoli, https://orcid.org/0000-0001-9786-2851, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Pietro Crovari, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Francesca Ieva, Department of Mathematics, Politecnico di Milano, Milan, Italy
Franca Garzotto, https://orcid.org/0000-0003-4905-7166, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Stefano Ceri, https://orcid.org/0000-0003-0671-2415, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
title Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces
topic Automated machine learning
data science
human-computer interaction
intelligent systems
natural language understanding
pipeline optimization
url https://ieeexplore.ieee.org/document/10114402/