One Size Does Not Fit All: Querying Web Polystores

Data retrieval systems are facing a paradigm shift due to the proliferation of specialized data storage engines (SQL, NoSQL, Column Stores, MapReduce, Data Stream, and Graph) supported by varied data models (CSV, JSON, RDB, RDF, and XML). One immediate consequence of this paradigm shift results into...


Bibliographic Details
Main Authors: Yasar Khan, Antoine Zimmermann, Alokkumar Jha, Vijay Gadepally, Mathieu D'Aquin, Ratnesh Sahay
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Databases, world wide web, query federation, query optimization, query planning, linked data
Online Access: https://ieeexplore.ieee.org/document/8615997/
collection DOAJ
description Data retrieval systems are facing a paradigm shift due to the proliferation of specialized data storage engines (SQL, NoSQL, column stores, MapReduce, data stream, and graph) supported by varied data models (CSV, JSON, RDB, RDF, and XML). One immediate consequence of this shift is a data bottleneck over the web: web applications are unable to retrieve data at the rate at which data are being generated by different facilities. Especially in the genomics and healthcare verticals, data are growing from petascale to exascale, and biomedical stakeholders expect seamless retrieval of these data over the web. In this paper, we argue that the bottleneck over the web can be reduced by minimizing the costly data conversion process and delegating query performance and processing loads to the specialized data storage engines over their native data models. We propose a web-based query federation mechanism, called PolyWeb, that unifies query answering over multiple native data models (CSV, RDB, and RDF). We emphasize two main challenges of query federation over native data models: 1) devising a method to select prospective data sources, with different underlying data models, that can satisfy a given query and 2) query optimization, join, and execution over different data models. We demonstrate PolyWeb on a cancer genomics use case, where the description of biological and chemical entities (e.g., genes, diseases, drugs, and pathways) often spans multiple data models and their respective storage engines. To assess the benefits and limitations of evaluating queries over native data models, we compare PolyWeb with state-of-the-art query federation engines in terms of result completeness, source selection, and overall query execution time.
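The two challenges named in the description (source selection and joining across data models) can be illustrated with a small sketch. This is not PolyWeb's algorithm, which the record does not detail; it is a minimal illustration under stated assumptions. The class names `CsvSource` and `RdbSource`, the functions `select_sources` and `federated_join`, and all data values are hypothetical placeholders. RDF is left out to keep the sketch within the Python standard library, and SQLite stands in for a relational engine.

```python
import csv
import io
import sqlite3

# Hypothetical placeholder data; the values carry no biological meaning.
CSV_TEXT = "gene,disease\ngeneA,diseaseX\ngeneB,diseaseY\n"


def make_rdb():
    """Build an in-memory relational source with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE drug_target (gene TEXT, drug TEXT)")
    conn.executemany(
        "INSERT INTO drug_target VALUES (?, ?)",
        [("geneB", "drugP"), ("geneC", "drugQ")],
    )
    return conn


class CsvSource:
    """CSV data queried in its native form (no conversion to another model)."""

    def __init__(self, text):
        reader = csv.DictReader(io.StringIO(text))
        self.rows = list(reader)
        self.attributes = set(reader.fieldnames or [])

    def fetch(self, attrs):
        return [{a: row[a] for a in attrs} for row in self.rows]


class RdbSource:
    """Relational data; projection is delegated to the SQL engine."""

    def __init__(self, conn, table):
        self.conn, self.table = conn, table
        info = conn.execute(f"PRAGMA table_info({table})").fetchall()
        self.attributes = {column[1] for column in info}

    def fetch(self, attrs):
        # attrs is always a subset of self.attributes, so the names are known columns.
        cursor = self.conn.execute(f"SELECT {', '.join(attrs)} FROM {self.table}")
        return [dict(zip(attrs, row)) for row in cursor]


def select_sources(sources, wanted):
    """Source selection: keep sources exposing at least one requested attribute."""
    selected = []
    for source in sources:
        overlap = sorted(source.attributes & set(wanted))
        if overlap:
            selected.append((source, overlap))
    return selected


def federated_join(sources, wanted, key):
    """Fetch from each selected source natively, then hash-join on `key`."""
    selected = [(s, a) for s, a in select_sources(sources, wanted) if key in a]
    if not selected:
        return []
    (first, first_attrs), *rest = selected
    result = first.fetch(first_attrs)
    for source, attrs in rest:
        index = {}
        for row in source.fetch(attrs):
            index.setdefault(row[key], []).append(row)
        result = [
            {**left, **right}
            for left in result
            for right in index.get(left[key], [])
        ]
    return result
```

Calling `federated_join([CsvSource(CSV_TEXT), RdbSource(make_rdb(), "drug_target")], ["gene", "disease", "drug"], "gene")` returns the single row shared by both sources on `gene`. Each source answers its part of the query in its own model, and only the join happens in the mediator, which is the general idea the description argues for.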
id doaj.art-493742f50d55441b8ae2350fbff75eaf
institution Directory Open Access Journal
issn 2169-3536
spelling doaj.art-493742f50d55441b8ae2350fbff75eaf 2022-12-21T18:15:10Z eng
One Size Does Not Fit All: Querying Web Polystores
IEEE, IEEE Access, ISSN 2169-3536, 2019-01-01, vol. 7, pp. 9598-9617
DOI 10.1109/ACCESS.2018.2888601, IEEE Xplore document 8615997
Yasar Khan (https://orcid.org/0000-0003-1049-1977), Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
Antoine Zimmermann, MINES Saint-Étienne, CNRS, Laboratoire Hubert Curien, UMR 5516, Univ Lyon, Saint-Étienne, France
Alokkumar Jha (https://orcid.org/0000-0002-8024-5854), Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
Vijay Gadepally, Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA
Mathieu D'Aquin, Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
Ratnesh Sahay, Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
https://ieeexplore.ieee.org/document/8615997/
topic Databases
world wide web
query federation
query optimization
query planning
linked data