Parallelization Strategies for Graph-Code-Based Similarity Search

The volume of multimedia assets in collections is growing exponentially, and the retrieval of information is becoming more complex. The indexing and retrieval of multimedia content is generally implemented by employing feature graphs. Feature graphs contain semantic information on multimedia assets....


Bibliographic Details
Main Authors:	Patrick Steinert, Stefan Wagenpfeil, Paul Mc Kevitt, Ingo Frommholz, Matthias Hemmje
Format:	Article
Language:	English
Published:	MDPI AG 2023-04-01
Series:	Big Data and Cognitive Computing
Subjects:	indexing, retrieval, explainability, semantic, multimedia, feature graph
Online Access:	https://www.mdpi.com/2504-2289/7/2/70

_version_	1797596127664537600
author	Patrick Steinert, Stefan Wagenpfeil, Paul Mc Kevitt, Ingo Frommholz, Matthias Hemmje
author_facet	Patrick Steinert, Stefan Wagenpfeil, Paul Mc Kevitt, Ingo Frommholz, Matthias Hemmje
author_sort	Patrick Steinert
collection	DOAJ
description	The volume of multimedia assets in collections is growing exponentially, and the retrieval of information is becoming more complex. The indexing and retrieval of multimedia content is generally implemented by employing feature graphs. Feature graphs contain semantic information on multimedia assets. Machine learning can produce detailed semantic information on multimedia assets, reflected in a high volume of nodes and edges in the feature graphs. While increasing the effectiveness of the information retrieval results, the high level of detail and also the growing collections increase the processing time. Addressing this problem, Multimedia Feature Graphs (MMFGs) and Graph Codes (GCs) have been proven to be fast and effective structures for information retrieval. However, the huge volume of data requires more processing time. As Graph Code algorithms were designed to be parallelizable, different paths of parallelization can be employed to prove or evaluate the scalability options of Graph Code processing. These include horizontal and vertical scaling with the use of Graphic Processing Units (GPUs), Multicore Central Processing Units (CPUs), and distributed computing. In this paper, we show how different parallelization strategies based on Graph Codes can be combined to provide a significant improvement in efficiency. Our modeling work shows excellent scalability with a theoretical speedup of 16,711 on a top-of-the-line Nvidia H100 GPU with 16,896 cores. Our experiments with a mediocre GPU show that a speedup of 225 can be achieved and give credence to the theoretical speedup. Thus, Graph Codes provide fast and effective multimedia indexing and retrieval, even in billion-scale use cases.
first_indexed	2024-03-11T02:47:10Z
format	Article
id	doaj.art-d03d392e505648aaac1a972f597ebb73
institution	Directory Open Access Journal
issn	2504-2289
language	English
last_indexed	2024-03-11T02:47:10Z
publishDate	2023-04-01
publisher	MDPI AG
record_format	Article
series	Big Data and Cognitive Computing
spelling	doaj.art-d03d392e505648aaac1a972f597ebb73; 2023-11-18T09:18:20Z; eng; MDPI AG; Big Data and Cognitive Computing; 2504-2289; 2023-04-01; 7; 2; 70; 10.3390/bdcc7020070; Parallelization Strategies for Graph-Code-Based Similarity Search; Patrick Steinert (0); Stefan Wagenpfeil (1); Paul Mc Kevitt (2); Ingo Frommholz (3); Matthias Hemmje (4); (0) Faculty of Mathematics and Computer Science, University of Hagen, Universitätsstrasse 1, D-58097 Hagen, Germany; (1) Faculty of Mathematics and Computer Science, University of Hagen, Universitätsstrasse 1, D-58097 Hagen, Germany; (2) Academy for International Science & Research (AISR), Derry BT48 7JL, UK; (3) School of Engineering, Computing and Mathematical Sciences, University of Wolverhampton, Wolverhampton WV1 1LY, UK; (4) Faculty of Mathematics and Computer Science, University of Hagen, Universitätsstrasse 1, D-58097 Hagen, Germany; https://www.mdpi.com/2504-2289/7/2/70; indexing; retrieval; explainability; semantic; multimedia; feature graph
spellingShingle	Patrick Steinert Stefan Wagenpfeil Paul Mc Kevitt Ingo Frommholz Matthias Hemmje Parallelization Strategies for Graph-Code-Based Similarity Search Big Data and Cognitive Computing indexing retrieval explainability semantic multimedia feature graph
title	Parallelization Strategies for Graph-Code-Based Similarity Search
title_full	Parallelization Strategies for Graph-Code-Based Similarity Search
title_fullStr	Parallelization Strategies for Graph-Code-Based Similarity Search
title_full_unstemmed	Parallelization Strategies for Graph-Code-Based Similarity Search
title_short	Parallelization Strategies for Graph-Code-Based Similarity Search
title_sort	parallelization strategies for graph code based similarity search
topic	indexing, retrieval, explainability, semantic, multimedia, feature graph
url	https://www.mdpi.com/2504-2289/7/2/70
work_keys_str_mv	AT patricksteinert parallelizationstrategiesforgraphcodebasedsimilaritysearch AT stefanwagenpfeil parallelizationstrategiesforgraphcodebasedsimilaritysearch AT paulmckevitt parallelizationstrategiesforgraphcodebasedsimilaritysearch AT ingofrommholz parallelizationstrategiesforgraphcodebasedsimilaritysearch AT matthiashemmje parallelizationstrategiesforgraphcodebasedsimilaritysearch

