One Size Does Not Fit All: Querying Web Polystores
Data retrieval systems are facing a paradigm shift due to the proliferation of specialized data storage engines (SQL, NoSQL, Column Stores, MapReduce, Data Stream, and Graph) supported by varied data models (CSV, JSON, RDB, RDF, and XML). One immediate consequence of this paradigm shift results into...
Main Authors: | Yasar Khan, Antoine Zimmermann, Alokkumar Jha, Vijay Gadepally, Mathieu D'Aquin, Ratnesh Sahay |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Databases, world wide web, query federation, query optimization, query planning, linked data |
Online Access: | https://ieeexplore.ieee.org/document/8615997/ |
_version_ | 1819170036566196224 |
---|---|
author | Yasar Khan; Antoine Zimmermann; Alokkumar Jha; Vijay Gadepally; Mathieu D'Aquin; Ratnesh Sahay |
author_facet | Yasar Khan; Antoine Zimmermann; Alokkumar Jha; Vijay Gadepally; Mathieu D'Aquin; Ratnesh Sahay |
author_sort | Yasar Khan |
collection | DOAJ |
description | Data retrieval systems are facing a paradigm shift due to the proliferation of specialized data storage engines (SQL, NoSQL, Column Stores, MapReduce, Data Stream, and Graph) supported by varied data models (CSV, JSON, RDB, RDF, and XML). One immediate consequence of this paradigm shift is a data bottleneck over the web: web applications are unable to retrieve data at the rate at which they are being generated by different facilities. Especially in the genomics and healthcare verticals, data are growing from petascale to exascale, and biomedical stakeholders expect seamless retrieval of these data over the web. In this paper, we argue that the bottleneck over the web can be reduced by minimizing the costly data conversion process and delegating query performance and processing loads to the specialized data storage engines over their native data models. We propose a web-based query federation mechanism, called PolyWeb, that unifies query answering over multiple native data models (CSV, RDB, and RDF). We emphasize two main challenges of query federation over native data models: 1) source selection, i.e., devising a method to select prospective data sources, with different underlying data models, that can satisfy a given query, and 2) query optimization, join, and execution over different data models. We demonstrate PolyWeb on a cancer genomics use case, where the description of biological and chemical entities (e.g., gene, disease, drug, and pathways) often spans multiple data models and their respective storage engines. In order to assess the benefits and limitations of evaluating queries over native data models, we evaluate PolyWeb against state-of-the-art query federation engines in terms of result completeness, source selection, and overall query execution time. |
first_indexed | 2024-12-22T19:29:01Z |
format | Article |
id | doaj.art-493742f50d55441b8ae2350fbff75eaf |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-22T19:29:01Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-493742f50d55441b8ae2350fbff75eaf; 2022-12-21T18:15:10Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; vol. 7, pp. 9598-9617; 10.1109/ACCESS.2018.2888601; 8615997; One Size Does Not Fit All: Querying Web Polystores; Yasar Khan (https://orcid.org/0000-0003-1049-1977; Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland); Antoine Zimmermann (MINES Saint-Étienne, CNRS, Laboratoire Hubert Curien, UMR 5516, Univ Lyon, Saint-Étienne, France); Alokkumar Jha (https://orcid.org/0000-0002-8024-5854; Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland); Vijay Gadepally (Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA); Mathieu D'Aquin (Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland); Ratnesh Sahay (Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland); https://ieeexplore.ieee.org/document/8615997/; Databases; world wide web; query federation; query optimization; query planning; linked data |
spellingShingle | Yasar Khan; Antoine Zimmermann; Alokkumar Jha; Vijay Gadepally; Mathieu D'Aquin; Ratnesh Sahay; One Size Does Not Fit All: Querying Web Polystores; IEEE Access; Databases; world wide web; query federation; query optimization; query planning; linked data |
title | One Size Does Not Fit All: Querying Web Polystores |
title_sort | one size does not fit all querying web polystores |
topic | Databases; world wide web; query federation; query optimization; query planning; linked data |
url | https://ieeexplore.ieee.org/document/8615997/ |
work_keys_str_mv | AT yasarkhan onesizedoesnotfitallqueryingwebpolystores AT antoinezimmermann onesizedoesnotfitallqueryingwebpolystores AT alokkumarjha onesizedoesnotfitallqueryingwebpolystores AT vijaygadepally onesizedoesnotfitallqueryingwebpolystores AT mathieudaquin onesizedoesnotfitallqueryingwebpolystores AT ratneshsahay onesizedoesnotfitallqueryingwebpolystores |