Weak factor automata: the failure of failure factor oracles?

In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automata (DFA) can be transformed to a so-called failur...

Full description

Bibliographic Details
Main Authors: Loek Cleophas, Derrick G. Kourie, Bruce W. Watson
Format: Article
Language:English
Published: South African Institute of Computer Scientists and Information Technologists 2014-08-01
Series:South African Computer Journal
Subjects:
Online Access:http://sacj.cs.uct.ac.za/index.php/sacj/article/view/199
_version_ 1818438815301238784
author Loek Cleophas
Derrick G. Kourie
Bruce W. Watson
author_facet Loek Cleophas
Derrick G. Kourie
Bruce W. Watson
author_sort Loek Cleophas
collection DOAJ
description In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automata (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 − 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.
first_indexed 2024-12-14T17:46:34Z
format Article
id doaj.art-99ad3d8441744274bd6ef6ceca131c1b
institution Directory Open Access Journal
issn 1015-7999
2313-7835
language English
last_indexed 2024-12-14T17:46:34Z
publishDate 2014-08-01
publisher South African Institute of Computer Scientists and Information Technologists
record_format Article
series South African Computer Journal
spelling doaj.art-99ad3d8441744274bd6ef6ceca131c1b2022-12-21T22:52:44ZengSouth African Institute of Computer Scientists and Information TechnologistsSouth African Computer Journal1015-79992313-78352014-08-0105395Weak factor automata: the failure of failure factor oracles?Loek CleophasDerrick G. KourieBruce W. WatsonIn indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automata (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 − 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.http://sacj.cs.uct.ac.za/index.php/sacj/article/view/199AlgorithmicsDictionaryPattern MatchingDNA Sequences
spellingShingle Loek Cleophas
Derrick G. Kourie
Bruce W. Watson
Weak factor automata: the failure of failure factor oracles?
South African Computer Journal
Algorithmics
Dictionary
Pattern Matching
DNA Sequences
title Weak factor automata: the failure of failure factor oracles?
title_full Weak factor automata: the failure of failure factor oracles?
title_fullStr Weak factor automata: the failure of failure factor oracles?
title_full_unstemmed Weak factor automata: the failure of failure factor oracles?
title_short Weak factor automata: the failure of failure factor oracles?
title_sort weak factor automata the failure of failure factor oracles
topic Algorithmics
Dictionary
Pattern Matching
DNA Sequences
url http://sacj.cs.uct.ac.za/index.php/sacj/article/view/199
work_keys_str_mv AT loekcleophas weakfactorautomatathefailureoffailurefactororacles
AT derrickgkourie weakfactorautomatathefailureoffailurefactororacles
AT brucewwatson weakfactorautomatathefailureoffailurefactororacles