An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries

Large-scale dynamic graphs typically involve big data. Recently a dynamic graph storage system is required to be capable of recreating any historical state to support historical queries. A typical storage solution supporting historical queries is called ‘snapshot plus log’. A s...

Full description

Bibliographic Details
Main Authors: Luo Xiangyu, Luo Yingxiao, Gui Xiaolin, Yu Zhenhua
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9091833/
_version_ 1828735652235051008
author Luo Xiangyu
Luo Yingxiao
Gui Xiaolin
Yu Zhenhua
author_facet Luo Xiangyu
Luo Yingxiao
Gui Xiaolin
Yu Zhenhua
author_sort Luo Xiangyu
collection DOAJ
description Large-scale dynamic graphs typically involve big data. Recently a dynamic graph storage system is required to be capable of recreating any historical state to support historical queries. A typical storage solution supporting historical queries is called ‘snapshot plus log’. A snapshot records the whole data at a certain moment, while the log file is responsible for saving all the update operations. The historical state is then recreated from the nearest snapshot by redoing or undoing the related update operations saved in the log file. The challenge lies in how to minimize both the number of snapshots and that of redone and undone operations performed in historical state recreation. The traditional system stores snapshots at regular intervals. However, historical states do not share the same frequency of being requested. Therefore, the traditional strategy is very inefficient. This paper proposes a new strategy that determines the timestamps of the snapshots based on the distribution of the historical queries. First, the historical queries are clustered into a given number of groups according to the timestamps of the requested historical states, and the cluster centroids are calculated. Second, the snapshots are created according to the timestamps of the cluster centroids. Since the cluster centroids may change as time goes by, the above process is executed periodically. Experimental results show that with the same storage costs, the snapshot strategy proposed in this paper greatly improves the performance of recreating historical states, leading to at least 70.7% computation reduction in terms of the number of both redone and undone operations. Besides, with the same recreation performance guarantee, it brings nearly 78.9% storage reduction on average in terms of the number of snapshots.
first_indexed 2024-04-12T23:10:04Z
format Article
id doaj.art-feebf7b7c281461c87e7618cd4fb08db
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-12T23:10:04Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-feebf7b7c281461c87e7618cd4fb08db2022-12-22T03:12:49ZengIEEEIEEE Access2169-35362020-01-018908389084610.1109/ACCESS.2020.29942429091833An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical QueriesLuo Xiangyu0https://orcid.org/0000-0003-4033-0285Luo Yingxiao1Gui Xiaolin2https://orcid.org/0000-0003-4384-9891Yu Zhenhua3https://orcid.org/0000-0002-7204-3654College of Computer Science and Technology, Institute of Systems Security and Control, Xi’an University of Science and Technology, Xi’an, ChinaCollege of Computer Science and Technology, Institute of Systems Security and Control, Xi’an University of Science and Technology, Xi’an, ChinaSchool of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, ChinaCollege of Computer Science and Technology, Institute of Systems Security and Control, Xi’an University of Science and Technology, Xi’an, ChinaLarge-scale dynamic graphs typically involve big data. Recently a dynamic graph storage system is required to be capable of recreating any historical state to support historical queries. A typical storage solution supporting historical queries is called ‘snapshot plus log’. A snapshot records the whole data at a certain moment, while the log file is responsible for saving all the update operations. The historical state is then recreated from the nearest snapshot by redoing or undoing the related update operations saved in the log file. The challenge lies in how to minimize both the number of snapshots and that of redone and undone operations performed in historical state recreation. The traditional system stores snapshots at regular intervals. However, historical states do not share the same frequency of being requested. Therefore, the traditional strategy is very inefficient. This paper proposes a new strategy that determines the timestamps of the snapshots based on the distribution of the historical queries. First, the historical queries are clustered into a given number of groups according to the timestamps of the requested historical states, and the cluster centroids are calculated. Second, the snapshots are created according to the timestamps of the cluster centroids. Since the cluster centroids may change as time goes by, the above process is executed periodically. Experimental results show that with the same storage costs, the snapshot strategy proposed in this paper greatly improves the performance of recreating historical states, leading to at least 70.7% computation reduction in terms of the number of both redone and undone operations. Besides, with the same recreation performance guarantee, it brings nearly 78.9% storage reduction on average in terms of the number of snapshots.https://ieeexplore.ieee.org/document/9091833/Dynamic graphsnapshotlogclustering methodhistorical query
spellingShingle Luo Xiangyu
Luo Yingxiao
Gui Xiaolin
Yu Zhenhua
An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries
IEEE Access
Dynamic graph
snapshot
log
clustering method
historical query
title An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries
title_full An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries
title_fullStr An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries
title_full_unstemmed An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries
title_short An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries
title_sort efficient snapshot strategy for dynamic graph storage systems to support historical queries
topic Dynamic graph
snapshot
log
clustering method
historical query
url https://ieeexplore.ieee.org/document/9091833/
work_keys_str_mv AT luoxiangyu anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries
AT luoyingxiao anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries
AT guixiaolin anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries
AT yuzhenhua anefficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries
AT luoxiangyu efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries
AT luoyingxiao efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries
AT guixiaolin efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries
AT yuzhenhua efficientsnapshotstrategyfordynamicgraphstoragesystemstosupporthistoricalqueries