Subpath Queries on Compressed Graphs: A Survey

Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text <i>T</i>, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in <i>T</i> in time proportional to t...

Full description

Bibliographic Details
Main Author:	Nicola Prezza
Format:	Article
Language:	English
Published:	MDPI AG 2021-01-01
Series:	Algorithms
Subjects:	indexing compressed data structures labeled graphs
Online Access:	https://www.mdpi.com/1999-4893/14/1/14

_version_	1797542255871918080
author	Nicola Prezza
author_facet	Nicola Prezza
author_sort	Nicola Prezza
collection	DOAJ
description	Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text <i>T</i>, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in <i>T</i> in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index be stored in a space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered as a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately allowing to index regular languages and promising to shed new light on problems, such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.
first_indexed	2024-03-10T13:28:05Z
format	Article
id	doaj.art-753c296dfa8542c0a09d35e11e787dfa
institution	Directory Open Access Journal
issn	1999-4893
language	English
last_indexed	2024-03-10T13:28:05Z
publishDate	2021-01-01
publisher	MDPI AG
record_format	Article
series	Algorithms
spelling	doaj.art-753c296dfa8542c0a09d35e11e787dfa2023-11-21T08:35:45ZengMDPI AGAlgorithms1999-48932021-01-011411410.3390/a14010014Subpath Queries on Compressed Graphs: A SurveyNicola Prezza0Department of Environmental Sciences, Informatics and Statistics, Ca’ Foscari University, Dorsoduro, 3246, 30123 Venice, ItalyText indexing is a classical algorithmic problem that has been studied for over four decades: given a text <i>T</i>, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in <i>T</i> in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index be stored in a space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered as a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately allowing to index regular languages and promising to shed new light on problems, such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.https://www.mdpi.com/1999-4893/14/1/14indexingcompressed data structureslabeled graphs
spellingShingle	Nicola Prezza Subpath Queries on Compressed Graphs: A Survey Algorithms indexing compressed data structures labeled graphs
title	Subpath Queries on Compressed Graphs: A Survey
title_full	Subpath Queries on Compressed Graphs: A Survey
title_fullStr	Subpath Queries on Compressed Graphs: A Survey
title_full_unstemmed	Subpath Queries on Compressed Graphs: A Survey
title_short	Subpath Queries on Compressed Graphs: A Survey
title_sort	subpath queries on compressed graphs a survey
topic	indexing compressed data structures labeled graphs
url	https://www.mdpi.com/1999-4893/14/1/14
work_keys_str_mv	AT nicolaprezza subpathqueriesoncompressedgraphsasurvey

Subpath Queries on Compressed Graphs: A Survey

Similar Items