Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis

Designing a high-capacity cache is an essential means of improving the accessibility of cloud storage. Compared with traditional data access, cloud storage data access presents new patterns, and traditional caching strategies cannot handle the prefetching and replacement of non-hot data very well. N...


Bibliographic Details
Main Authors:	Fang Xiao, Siyuan Yu, Yuze Li
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Cache strategy; cloud storage; file correlation; hit rate; machine learning; prefetching
Online Access:	https://ieeexplore.ieee.org/document/10273997/

_version_	1797659038556618752
author	Fang Xiao, Siyuan Yu, Yuze Li
author_facet	Fang Xiao, Siyuan Yu, Yuze Li
author_sort	Fang Xiao
collection	DOAJ
description	Designing a high-capacity cache is an essential means of improving the accessibility of cloud storage. Compared with traditional data access, cloud storage data access presents new patterns, and traditional caching strategies handle the prefetching and replacement of non-hot data poorly. Numerous studies have shown that file correlation can optimize cloud storage's caching and prefetching strategies. However, characterizing the correlation between files across multiple dimensions is complex, which makes it correspondingly harder to optimize cloud storage caching using file correlation. To address these shortcomings, this study designs a skip-gram-based file similarity strategy derived from an analysis of user access. By judging the correlation between files, the strategy prefetches files, inserts them into a high-capacity cache dynamically, and decides which files to replace. With the prefetching strategy in place, the cache hit rate in the simulation benchmark improves significantly. In addition, the strategy builds an index table after each training run, so online operation consumes very little time. During training, the time required to build the index is $O(N \log V)$, and the time for an index lookup is $O(1)$.
first_indexed	2024-03-11T18:09:09Z
format	Article
id	doaj.art-4d26c0bf1e5b47f29ff139810aa2afe2
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-11T18:09:09Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-4d26c0bf1e5b47f29ff139810aa2afe2 | 2023-10-16T23:00:39Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | 11 | 111265 | 111273 | 10.1109/ACCESS.2023.3322725 | 10273997 | Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis | Fang Xiao [0] https://orcid.org/0000-0002-7337-6987 | Siyuan Yu [1] https://orcid.org/0009-0008-9926-4968 | Yuze Li [2] https://orcid.org/0009-0009-0195-2963 | Digital Technology Department, Library, Huazhong University of Science and Technology, Wuhan, China | School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China | School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China | https://ieeexplore.ieee.org/document/10273997/ | Cache strategy; cloud storage; file correlation; hit rate; machine learning; prefetching
spellingShingle	Fang Xiao, Siyuan Yu, Yuze Li | Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis | IEEE Access | Cache strategy; cloud storage; file correlation; hit rate; machine learning; prefetching
title	Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis
title_full	Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis
title_fullStr	Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis
title_full_unstemmed	Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis
title_short	Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis
title_sort	efficient large capacity caching in cloud storage using skip gram based file correlation analysis
topic	Cache strategy; cloud storage; file correlation; hit rate; machine learning; prefetching
url	https://ieeexplore.ieee.org/document/10273997/
work_keys_str_mv	AT fangxiao efficientlargecapacitycachingincloudstorageusingskipgrambasedfilecorrelationanalysis AT siyuanyu efficientlargecapacitycachingincloudstorageusingskipgrambasedfilecorrelationanalysis AT yuzeli efficientlargecapacitycachingincloudstorageusingskipgrambasedfilecorrelationanalysis

Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis

Similar Items