MSc Projects 2018/2019

LAKA01: Computing the longest common periodic factor in linear time.

A period of x[0..|x|-1] is a positive integer p such that x[i]=x[i+p] holds for all 0 <= i < |x|-p. The smallest period of x is denoted by per(x). String u is called periodic if and only if per(u) <= |u|/2. Given a set of n strings, we want to find in linear time, the longest periodic factor common to all n strings.

Implementation: C/C++


LAKA02: Computing the longest common square-free factor in linear time.

A string of the form uu is called a square. A square-free string is a string that does not contain a square as a factor. Given a pair of strings x and y, we want to find in linear time, the longest square-free factor that is common to both x and y.

Implementation: C/C++


LAKA03: Computing the longest common palindromic factor in linear time.

The reverse of a string x is denoted by x^R. A string p is said to be a palindrome if and only if p=p^R. Given a pair of strings x and y, we want to find in linear time, the longest palindromic factor that is common to both x and y.

Implementation: C/C++


LAKA04: Linear space representation of palindromes in degenerate strings.

The reverse of a string x is denoted by x^R. A string p is said to be a palindrome if and only if p=p^R. A degenerate string is a sequence of n sets of strings, where the ith set contains strings of the same length, but this length can vary between different sets. We want to represent all palindromes found in a degenerate string in linear space.

Implementation: C/C++


LAKA05: Generalised suffix tree interface with pattern matching.

A suffix tree of a non-empty string x is a compact trie representing all suffixes of x. A generalised suffix tree is a compact trie representing all suffixes of n strings. We would like to have an interactive user interface that shows the step by step construction of a generalised suffix tree. Also given a pattern y as input, using the suffix tree, the implementation should identify whether y can be found in strings 0...n-1 and its location.

Implementation: Java/C/C++


LAKA06: Creating Improvisations on Chord Progressions using Suffix Trees.

A suffix tree of a non-empty string x is a compact trie representing all suffixes of x. A transposition p^t is a musical function which allows a chord to be modified such that the root of the chord can be replaced by another. We want to be able to compute in linear time, an improvisation on string x with respect to a dictionary D, where an improvisation is a sequence x_1,...,x_\ell of strings, such that: (i) every x_i is either a longest factor of x found in D (associated with musical data and allowing the same transposition) or a letter of x (with no musical data - a silent chord); (ii) x=x_1,...,x_\ell; (iii) \ell is minimal.

Implementation: C/C++


LAKA07: Identifying relations between circular sequences given a circular sequence database.

A circular sequence of length m can be viewed as a traditional linear sequence which has the left- and right-most letters wrapped around and positioned next to each other. Under this notion, the same circular sequence can be seen as m different linear sequences, which would all be considered equivalent. Given a database D of n circular sequences and a circular sequence x, we want to compute the relation between x and D_0,...,D_n-1 to identify the circular sequence which is the most similar to x.

Implementation: C/C++


LAKA08: Own Project

Students can suggest their own projects regarding string processing algorithms. These can include algorithms for sequence comparisons, pattern matching, automatons or related areas which can have applications in molecular biology, image recognition or musical analysis..

Implementation: Java/C/C++