A Cardinality Estimator in Complex Database Systems Based on TreeLSTM

Cardinality estimation is critical for database management systems (DBMSs) to execute query optimization tasks, which can guide the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods cannot provide accurate estimates because they cannot accuratel...

Full description

Bibliographic Details
Main Authors: Kaiyang Qi, Jiong Yu, Zhenzhen He
Format: Article
Language:English
Published: MDPI AG 2023-08-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/23/17/7364
_version_ 1797581858492383232
author Kaiyang Qi
Jiong Yu
Zhenzhen He
author_facet Kaiyang Qi
Jiong Yu
Zhenzhen He
author_sort Kaiyang Qi
collection DOAJ
description Cardinality estimation is critical for database management systems (DBMSs) to execute query optimization tasks, which can guide the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods cannot provide accurate estimates because they cannot accurately capture the correlation between multiple tables. Several recent studies have revealed that learning-based cardinality estimation methods can address the shortcomings of traditional methods and provide more accurate estimates. However, the learning-based cardinality estimation methods still have large errors when an SQL query involves multiple tables or is very complex. To address this problem, we propose a sampling-based tree long short-term memory (TreeLSTM) neural network to model queries. The proposed model addresses the weakness of traditional methods when no sampled tuples match the predicates and considers the join relationship between multiple tables and the conjunction and disjunction operations between predicates. We construct subexpressions as trees using operator types between predicates and improve the performance and accuracy of cardinality estimation by capturing the join-crossing correlations between tables and the order dependencies between predicates. In addition, we construct a new loss function to overcome the drawback that <i>Q-error</i> cannot distinguish between large and small cardinalities. Extensive experimental results from real-world datasets show that our proposed model improves the estimation quality and outperforms traditional cardinality estimation methods and the other compared deep learning methods in three evaluation metrics: <i>Q-error</i>, <i>MAE</i>, and <i>SMAPE</i>.
first_indexed 2024-03-10T23:13:33Z
format Article
id doaj.art-959eb9734b0e4589be98a35dd35337d4
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-10T23:13:33Z
publishDate 2023-08-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-959eb9734b0e4589be98a35dd35337d42023-11-19T08:48:53ZengMDPI AGSensors1424-82202023-08-012317736410.3390/s23177364A Cardinality Estimator in Complex Database Systems Based on TreeLSTMKaiyang Qi0Jiong Yu1Zhenzhen He2School of Software, Xinjiang University, Urumqi 830091, ChinaSchool of Software, Xinjiang University, Urumqi 830091, ChinaCollege of Information Science and Engineering, Xinjiang University, Urumqi 830046, ChinaCardinality estimation is critical for database management systems (DBMSs) to execute query optimization tasks, which can guide the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods cannot provide accurate estimates because they cannot accurately capture the correlation between multiple tables. Several recent studies have revealed that learning-based cardinality estimation methods can address the shortcomings of traditional methods and provide more accurate estimates. However, the learning-based cardinality estimation methods still have large errors when an SQL query involves multiple tables or is very complex. To address this problem, we propose a sampling-based tree long short-term memory (TreeLSTM) neural network to model queries. The proposed model addresses the weakness of traditional methods when no sampled tuples match the predicates and considers the join relationship between multiple tables and the conjunction and disjunction operations between predicates. We construct subexpressions as trees using operator types between predicates and improve the performance and accuracy of cardinality estimation by capturing the join-crossing correlations between tables and the order dependencies between predicates. In addition, we construct a new loss function to overcome the drawback that <i>Q-error</i> cannot distinguish between large and small cardinalities. Extensive experimental results from real-world datasets show that our proposed model improves the estimation quality and outperforms traditional cardinality estimation methods and the other compared deep learning methods in three evaluation metrics: <i>Q-error</i>, <i>MAE</i>, and <i>SMAPE</i>.https://www.mdpi.com/1424-8220/23/17/7364cardinality estimationtree long short-term memoryquery optimizationdeep learning
spellingShingle Kaiyang Qi
Jiong Yu
Zhenzhen He
A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
Sensors
cardinality estimation
tree long short-term memory
query optimization
deep learning
title A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
title_full A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
title_fullStr A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
title_full_unstemmed A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
title_short A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
title_sort cardinality estimator in complex database systems based on treelstm
topic cardinality estimation
tree long short-term memory
query optimization
deep learning
url https://www.mdpi.com/1424-8220/23/17/7364
work_keys_str_mv AT kaiyangqi acardinalityestimatorincomplexdatabasesystemsbasedontreelstm
AT jiongyu acardinalityestimatorincomplexdatabasesystemsbasedontreelstm
AT zhenzhenhe acardinalityestimatorincomplexdatabasesystemsbasedontreelstm
AT kaiyangqi cardinalityestimatorincomplexdatabasesystemsbasedontreelstm
AT jiongyu cardinalityestimatorincomplexdatabasesystemsbasedontreelstm
AT zhenzhenhe cardinalityestimatorincomplexdatabasesystemsbasedontreelstm