-
[CU1] Automata Minimisation
Description:
Deterministic finite automata
have many uses in Computer Science, for example for lexing
program code. In order to improve their run-time, automata need to be minimised, that
is transformed into equivalent automata with the smallest possible number of state
nodes.
There is a little known method for minimising deterministic finite
automata by
Janusz Brzozowski.
This method first reverses the edges of an automaton, which produces
a potentially non-deterministic automaton. The non-deterministic automaton is
then determinised using the usual powerset construction. This is repeated
once more and voila you obtain a minimised version of the automaton
you started with. It is rather surprising that this method works at all:
the powerset construction might produce an automaton with an exponentially
larger number of states, completely contrary to the idea of minimising the
number of states. The task of this project is to implement this method, check that
it actually works with some examples and
compare it with more traditional methods for automata minimisation
(in terms of run-time, code complexity, etc). Examples can be
obtained by translating regular expressions into automata. A natural extension
of the project is therefore to implement a recogniser for regular expressions
following, for example, this paper.
Literature:
A good place to start with this project are the wikipedia articles
here and
here.
The authoritative book
on automata is by John Hopcroft and Jeffrey Ullmann (available in the library).
There is also an online course about automata by Ullman at
Coursera, though IMHO not
done with love. There
is also the book Automata and Computability by
Dexter Kozen including more
advanced material about automata.
Finally, there are millions of other pointers about automata
minimisation on the web. Finally, I will give a lot of the background in
my Automata and Formal Languages course (6CCS3AFL), which starts in September.
Skills:
This is a project for a student with an interest in theory and some
reasonable programming skills. The project can be easily implemented
in languages like
Scala,
ML,
Haskell,
Python, etc.
-
[CU2] Equivalence Checking of Regular Expressions
Description:
Solving the problem of deciding the equivalence of regular expressions can be used
to decide a number of problems in automated reasoning. Recently,
Andreas Asperti
proposed a simple method for deciding regular expression equivalence described
here.
The task is to implement this method and test it on examples.
It would be also interesting to see whether Asperti's method applies to
extended regular expressions, described
here.
Literature:
The central literature is obviously the papers
here and
here.
Asperti has also some slides here.
More references about regular expressions can be found
here. Like in
[CU1], I will give a lot of the background pf this project in
my Automata and Formal Languages course (6CCS3AFL).
Skills:
This is a project for a student with a passion for theory and some
reasonable programming skills. The project can be easily implemented
in languages like Scala
Scala,
ML,
Haskell,
Python, etc.
Being able to read Haskell
code is beneficial for the part involving extended regular expressions.
-
[CU3] Machine Code Generation for a Simple Compiler
Description:
Compilers translate high-level programs that humans can read and write into
efficient machine code that can be run on a CPU or virtual machine.
I recently implemented a very simple compiler for a very simple functional
programming language following this
paper
(also described here).
My code, written in Scala, of this compiler is
here.
The compiler can deal with simple programs involving natural numbers, such
as Fibonacci numbers
or factorial (but it can be easily extended - that is not the point).
While the hard work has been done (understanding the two papers above),
my compiler only produces some idealised machine code. For example I
assume there are infinitely many registers. The goal of this
project is to generate machine code that is more realistic and can
run on a CPU, like x86, or run on a virtual machine, say the JVM.
This gives probably a speedup of thousand times in comparison to
my naive machine code and virtual machine. The project
requires to dig into the literature about real CPUs and generating
real machine code.
Literature:
There is a lot of literature about compilers
(for example this book -
I can lend you my copy for the duration of the project). A very good overview article
about implementing compilers by
Laurie Tratt is
here.
An introduction into x86 machine code is here.
Intel's official manual for the x86 instruction is
here.
A simple assembler for the JVM is described here.
An interesting twist of this project is to not generate code for a CPU, but
for the intermediate language of the LLVM compiler
(also described here and
here). If you want to see
what machine code looks like you can compile your C-program using gcc -S.
Skills:
This is a project for a student with a deep interest in programming languages and
compilers. Since my compiler is implemented in Scala,
it would make sense to continue this project in this language. I can be
of help with questions and books about Scala.
But if Scala is a problem, my code can also be translated quickly into any other functional
language.
-
[CU4] Implementation of Register Spilling Algorithms
Description:
This project is similar to [CU3]. The emphasis here, however, is on the
implementation and comparison of register spilling algorithms, also often called register allocation
algorithms. They are part of any respectable compiler. As said
in [CU3], however, my simple compiler lacks them and assumes an infinite amount of registers instead.
Real CPUs however only provide a fixed amount of registers (for example
x86-64 has 16 general purpose registers). Whenever a program needs
to hold more values than registers, the values need to be “spilled”
into the main memory. Register spilling algorithms try to minimise
this spilling, since fetching values from main memory is a costly
operation.
The classic algorithm for register spilling uses a
graph-colouring method.
However, for some time the LLVM compiler
used a supposedly more efficient method, called the linear scan allocation method
(described
here).
However, it was later decided to abandon this method in favour of
a
greedy register allocation method. It would be nice if this project can find out
what the issues are with these methods and implement at least one of them for the
simple compiler referenced in [CU3].
Literature:
The graph colouring method is described in Andrew Appel's
book on compilers
(I can give you my copy of this book, if it is not available in the library).
There is also a survey
article
about register allocation algorithms with further pointers.
Skills:
Same skills as [CU3].
-
[CU5] A Student Polling System
Description:
One of the more annoying aspects of giving a lecture is to ask a question
to the students and no matter how easy the questions is to not
receive an answer. Recently, the online course system
Udacity made an art out of
asking questions during lectures (see for example the
Web Application Engineering
course CS253).
The lecturer there gives multiple-choice questions as part of the lecture and the students need to
click on the appropriate answer. This works very well in the online world.
For “real-world” lectures, the department has some
clickers
(these are little devices part of an audience response systems). However,
they are a logistic nightmare for the lecturer: they need to be distributed
during the lecture and collected at the end. Nowadays, where students
come with their own laptop or smartphone to lectures, this can
be improved.
The task of this project is to implement an online student
polling system. The lecturer should be able to prepare
questions beforehand (encoded as some web-form) and be able to
show them during the lecture. The students
can give their answers by clicking on the corresponding webpage.
The lecturer can then collect the responses online and evaluate them
immediately. Such a system is sometimes called
HTML voting.
There are a number of commercial
solutions for this problem, but they are not easy to use (in addition
to being ridiculously expensive). A good student can easily improve upon
what they provide.
The problem of student polling is not as hard as
electronic voting,
which essentially is still an unsolved problem in Computer Science. The
students only need to be prevented from answering question more than once thus skewing
any statistics. Unlike electronic voting, no audit trail needs to be kept
for student polling. Restricting the number of answers can probably be solved
by setting appropriate cookies on the students
computers or smart phones.
Literature:
The project requires fluency in a web-programming language (for example
Javascript,
PHP,
Java, Python,
Go,
Scala,
Ruby)
and possibly a cloud application platform (for example
Google App Engine or
Heroku).
For web-programming the
Web Application Engineering
course at Udacity is a good starting point
to be aware of the issues involved. This course uses Python.
To evaluate the answers from the student, Google's
Chart Tools
might be useful, which ar also described in this
youtube video.
Skills:
In order to provide convenience for the lecturer, this project needs very good web-programming skills. A
hacker mentality
(see above) is probably very beneficial: web-programming is an area that only emerged recently and
many tools still lack maturity. You probably have to experiment a lot with several different
languages and tools.
-
[CU6] Implementation of a Distributed Clock-Synchronisation Algorithm developed at NASA
Description:
There are many algorithms for synchronising clocks. This
paper
describes a new algorithm for clocks that communicate by exchanging
messages and thereby reach a state in which (within some bound) all clocks are synchronised.
A slightly longer and more detailed paper about the algorithm is
here.
The point of this project is to implement this algorithm and simulate networks of clocks.
Literature:
There is a wide range of literature on clock syncronisation algorithms.
Some pointers are given in this
paper,
which describes the algorithm to be implemented in this project. Pointers
are given also here.
Skills:
In order to implement a simulation of a network of clocks, you need to tackle
concurrency. You can do this for example in the programming language
Scala with the help of the
Akka library. This library enables you to send messages
between different actors. Here
are some examples that explain how to implement exchanging messages between actors.