An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries
Large-scale dynamic graphs typically involve big data. Recently a dynamic graph storage system is required to be capable of recreating any historical state to support historical queries. A typical storage solution supporting historical queries is called ‘snapshot plus log’. A s...
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
IEEE
2020-01-01
|
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9091833/ |
_version_ | 1828735652235051008 |
---|---|
author | Luo Xiangyu Luo Yingxiao Gui Xiaolin Yu Zhenhua |
author_facet | Luo Xiangyu Luo Yingxiao Gui Xiaolin Yu Zhenhua |
author_sort | Luo Xiangyu |
collection | DOAJ |
description | Large-scale dynamic graphs typically involve big data. Recently a dynamic graph storage system is required to be capable of recreating any historical state to support historical queries. A typical storage solution supporting historical queries is called ‘snapshot plus log’. A snapshot records the whole data at a certain moment, while the log file is responsible for saving all the update operations. The historical state is then recreated from the nearest snapshot by redoing or undoing the related update operations saved in the log file. The challenge lies in how to minimize both the number of snapshots and that of redone and undone operations performed in historical state recreation. The traditional system stores snapshots at regular intervals. However, historical states do not share the same frequency of being requested. Therefore, the traditional strategy is very inefficient. This paper proposes a new strategy that determines the timestamps of the snapshots based on the distribution of the historical queries. First, the historical queries are clustered into a given number of groups according to the timestamps of the requested historical states, and the cluster centroids are calculated. Second, the snapshots are created according to the timestamps of the cluster centroids. Since the cluster centroids may change as time goes by, the above process is executed periodically. Experimental results show that with the same storage costs, the snapshot strategy proposed in this paper greatly improves the performance of recreating historical states, leading to at least 70.7% computation reduction in terms of the number of both redone and undone operations. Besides, with the same recreation performance guarantee, it brings nearly 78.9% storage reduction on average in terms of the number of snapshots. |
first_indexed | 2024-04-12T23:10:04Z |
format | Article |
id | doaj.art-feebf7b7c281461c87e7618cd4fb08db |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-12T23:10:04Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-feebf7b7c281461c87e7618cd4fb08db2022-12-22T03:12:49ZengIEEEIEEE Access2169-35362020-01-018908389084610.1109/ACCESS.2020.29942429091833An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical QueriesLuo Xiangyu0https://orcid.org/0000-0003-4033-0285Luo Yingxiao1Gui Xiaolin2https://orcid.org/0000-0003-4384-9891Yu Zhenhua3https://orcid.org/0000-0002-7204-3654College of Computer Science and Technology, Institute of Systems Security and Control, Xi’an University of Science and Technology, Xi’an, ChinaCollege of Computer Science and Technology, Institute of Systems Security and Control, Xi’an University of Science and Technology, Xi’an, ChinaSchool of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, ChinaCollege of Computer Science and Technology, Institute of Systems Security and Control, Xi’an University of Science and Technology, Xi’an, ChinaLarge-scale dynamic graphs typically involve big data. Recently a dynamic graph storage system is required to be capable of recreating any historical state to support historical queries. A typical storage solution supporting historical queries is called ‘snapshot plus log’. A snapshot records the whole data at a certain moment, while the log file is responsible for saving all the update operations. The historical state is then recreated from the nearest snapshot by redoing or undoing the related update operations saved in the log file. The challenge lies in how to minimize both the number of snapshots and that of redone and undone operations performed in historical state recreation. The traditional system stores snapshots at regular intervals. However, historical states do not share the same frequency of being requested. Therefore, the traditional strategy is very inefficient. This paper proposes a new strategy that determines the timestamps of the snapshots based on the distribution of the historical queries. First, the historical queries are clustered into a given number of groups according to the timestamps of the requested historical states, and the cluster centroids are calculated. Second, the snapshots are created according to the timestamps of the cluster centroids. Since the cluster centroids may change as time goes by, the above process is executed periodically. Experimental results show that with the same storage costs, the snapshot strategy proposed in this paper greatly improves the performance of recreating historical states, leading to at least 70.7% computation reduction in terms of the number of both redone and undone operations. Besides, with the same recreation performance guarantee, it brings nearly 78.9% storage reduction on average in terms of the number of snapshots.https://ieeexplore.ieee.org/document/9091833/Dynamic graphsnapshotlogclustering methodhistorical query |
spellingShingle | Luo Xiangyu Luo Yingxiao Gui Xiaolin Yu Zhenhua An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries IEEE Access Dynamic graph snapshot log clustering method historical query |
title | An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries |
title_full | An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries |
title_fullStr | An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries |
title_full_unstemmed | An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries |
title_short | An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries |
title_sort | efficient snapshot strategy for dynamic graph storage systems to support historical queries |
topic | Dynamic graph snapshot log clustering method historical query |
url | https://ieeexplore.ieee.org/document/9091833/ |
work_keys_str_mv | AT luoxiangyu anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries AT luoyingxiao anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries AT guixiaolin anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries AT yuzhenhua anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries AT luoxiangyu efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries AT luoyingxiao efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries AT guixiaolin efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries AT yuzhenhua efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries |