Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces
Data Science is increasingly applied for solving real-life problems, in industry and in academic research, but mastering Data Science requires an interdisciplinary education that is still scarce on the market. Thus, there is a growing need for user-friendly tools that allow domain experts to directl...
Main Authors: | Sara Pido, Pietro Pinoli, Pietro Crovari, Francesca Ieva, Franca Garzotto, Stefano Ceri |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
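Subjects: | Automated machine learning; data science; human-computer interaction; intelligent systems; natural language understanding; pipeline optimization |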
Online Access: | https://ieeexplore.ieee.org/document/10114402/ |
_version_ | 1827927902206820352 |
---|---|
author | Sara Pido; Pietro Pinoli; Pietro Crovari; Francesca Ieva; Franca Garzotto; Stefano Ceri |
author_facet | Sara Pido; Pietro Pinoli; Pietro Crovari; Francesca Ieva; Franca Garzotto; Stefano Ceri |
author_sort | Sara Pido |
collection | DOAJ |
description | Data Science is increasingly applied for solving real-life problems, in industry and in academic research, but mastering Data Science requires an interdisciplinary education that is still scarce on the market. Thus, there is a growing need for user-friendly tools that allow domain experts to directly apply data analysis methods to their datasets, without involving a Data Science expert. In this scenario, we present DSBot, an assistant that can analyze the user's data and produce answers by mastering several Data Science techniques. DSBot understands the research question through conversational interaction, produces a data science pipeline, and automatically executes the pipeline to generate the analysis. The strength of DSBot lies in the design of a rich domain-specific language for modeling data analysis pipelines, the use of a suitable neural network for machine translation of research questions, the availability of a vast dictionary of pipelines for matching the translation output, and the use of natural language technology provided by a conversational agent. We empirically evaluated the translation capabilities and the AutoML performance of the system. In the translation task, it obtains a BLEU score of 0.8. In prediction tasks, DSBot outperforms TPOT, an AutoML tool, on 19 of 30 datasets. |
first_indexed | 2024-03-13T05:59:45Z |
format | Article |
id | doaj.art-0c2f55141a48434284fedf3b03ccb6eb |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T05:59:45Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-0c2f55141a48434284fedf3b03ccb6eb; 2023-06-12T23:01:00Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 45972; 45988; 10.1109/ACCESS.2023.3272503; 10114402; Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces; Sara Pido (https://orcid.org/0000-0003-1425-1719; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy); Pietro Pinoli (https://orcid.org/0000-0001-9786-2851; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy); Pietro Crovari (Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy); Francesca Ieva (Department of Mathematics, Politecnico di Milano, Milan, Italy); Franca Garzotto (https://orcid.org/0000-0003-4905-7166; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy); Stefano Ceri (https://orcid.org/0000-0003-0671-2415; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy); Data Science is increasingly applied for solving real-life problems, in industry and in academic research, but mastering Data Science requires an interdisciplinary education that is still scarce on the market. Thus, there is a growing need for user-friendly tools that allow domain experts to directly apply data analysis methods to their datasets, without involving a Data Science expert. In this scenario, we present DSBot, an assistant that can analyze the user's data and produce answers by mastering several Data Science techniques. DSBot understands the research question through conversational interaction, produces a data science pipeline, and automatically executes the pipeline to generate the analysis. The strength of DSBot lies in the design of a rich domain-specific language for modeling data analysis pipelines, the use of a suitable neural network for machine translation of research questions, the availability of a vast dictionary of pipelines for matching the translation output, and the use of natural language technology provided by a conversational agent. We empirically evaluated the translation capabilities and the AutoML performance of the system. In the translation task, it obtains a BLEU score of 0.8. In prediction tasks, DSBot outperforms TPOT, an AutoML tool, on 19 of 30 datasets.; https://ieeexplore.ieee.org/document/10114402/; Automated machine learning; data science; human-computer interaction; intelligent systems; natural language understanding; pipeline optimization |
title | Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces |
title_full | Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces |
title_fullStr | Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces |
title_full_unstemmed | Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces |
title_short | Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces |
title_sort | ask your data supporting data science processes by combining automl and conversational interfaces |
topic | Automated machine learning; data science; human-computer interaction; intelligent systems; natural language understanding; pipeline optimization |
url | https://ieeexplore.ieee.org/document/10114402/ |