by Lorraine A.K. Ayad, Solon P. Pissis, Ahmad Retha
Abstract:
Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time O ( m ⌈ ℓ / w ⌉ n ) \$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w {\backslash}rceil n){\$ and space O ( m ⌈ ℓ / w ⌉ ) \$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w{\backslash}rceil){\$ under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere.
Reference:
libFLASM: a software library for fixed-length approximate string matching (Lorraine A.K. Ayad, Solon P. Pissis, Ahmad Retha), In BMC Bioinformatics, volume 17, 2016.
Bibtex Entry:
@Article{Ayad2016,
author="Ayad, Lorraine A.K. and Pissis, Solon P. and Retha, Ahmad",
title="libFLASM: a software library for fixed-length approximate string matching",
journal="BMC Bioinformatics",
year="2016",
volume="17",
number="1",
pages="454",
abstract="Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time O ( m ⌈ ℓ / w ⌉ n ) {\$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w {\backslash}rceil n){\$} and space O ( m ⌈ ℓ / w ⌉ ) {\$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w{\backslash}rceil){\$} under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere.",
issn="1471-2105",
doi="10.1186/s12859-016-1320-2",
url="http://dx.doi.org/10.1186/s12859-016-1320-2"
}