libFLASM: a software library for fixed-length approximate string matching

Ayad, Lorraine A.K.; Pissis, Solon P.; Retha, Ahmad

doi:10.1186/s12859-016-1320-2

by Lorraine A.K. Ayad, Solon P. Pissis, Ahmad Retha

Abstract:

Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time O ( m ⌈ ℓ / w ⌉ n ) \$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w {\backslash}rceil n){\$ and space O ( m ⌈ ℓ / w ⌉ ) \$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w{\backslash}rceil){\$ under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere.

View PDF

Reference:

libFLASM: a software library for fixed-length approximate string matching (Lorraine A.K. Ayad, Solon P. Pissis, Ahmad Retha), In BMC Bioinformatics, volume 17, 2016.

Bibtex Entry:

@Article{Ayad2016,
  author="Ayad, Lorraine A.K. and Pissis, Solon P. and Retha, Ahmad",
  title="libFLASM: a software library for fixed-length approximate string matching",
  journal="BMC Bioinformatics",
  year="2016",
  volume="17",
  number="1",
  pages="454",
  abstract="Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time                                                                                                                                 O                        (                        m                        ⌈                        ℓ                        /                        w                        ⌉                        n                        )                                                              {\$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w {\backslash}rceil n){\$}                   and space                                                                                                                                 O                        (                        m                        ⌈                        ℓ                        /                        w                        ⌉                        )                                                              {\$}{\backslash}mathcal {\{}O{\}}(m{\backslash}lceil {\backslash}ell /w{\backslash}rceil){\$}                   under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere.",
  issn="1471-2105",
  doi="10.1186/s12859-016-1320-2",
  url="http://dx.doi.org/10.1186/s12859-016-1320-2"
}