Surprisingly important for how your project report looks in the end is what word processor you use to produce it. You may be used to writing essays and reports in Microsoft Word, but from past experience we very strongly advise against this. The problem is that Word works fine on ordinary text, but regularly screws up as soon as equations come in: in the past we've had students who had to correct all of the equations in their final printout by hand because all the symbols had got mixed up. So: if you do insist on using Word, be sure that you've checked that your document will print properly, in good time before the submission deadline. And don't expect too much sympathy if it all goes pear-shaped!
Much better, and highly recommended as the default option, is LATEX. This has many advantages:
We'll discuss the basic features of LATEX here, but there is much help available on the web; see e.g. here and here, or search for ``latex help'' on Google. The library also has books on LATEX.
The basic LATEX files which contain the ``program'' for your documents have the extension .tex. There are many editors available for such LATEX files, both free and commercial; see here. On student computers in the College, you may find WinEdt, which is shareware. You can try it out for free on your own PC for a month; after that a student licence costs $40. I'll discuss Texmaker here, which is free and available for all standard operating systems.
We'll be looking at the example file example.tex. Save this in your filespace and then open it from within Texmaker. You'll notice a range of icons in the command bar; click on ``Quick build'' by one of the blue arrows. This will generate (using a ``compiler'' also called LATEX, or an extended version like PDFLATEX) a file called example.dvi, which is the document produced from example.tex in a ``DeVice-Independent'' format, and then translate this to PDF format. I've set up Texmaker so that the PDF will show up alongside the LATEX file, but you can also have it in a separate window, or view the DVI file directly. The DVI and PDF files will be produced in the same directory where the original LATEX file was, and you can then e.g. print the PDF in the usual way.
Note that having the LATEX file and the PDF output next to each other means that you can click on one and get Texmaker to jump to the corresponding part in the other. This makes it easy to go from the LATEX source to the document and vice versa.
Having successfully produced our first LATEX document, let's look in more detail at the commands in example.tex. The general structure of a LATEX document is:
    \documentclass[options]{type of document}
    (preamble: general settings and definitions of commands etc that you want to use throughout)
    \begin{document}
    (main text, sectioning commands, equations etc)
    \end{document}
You'll see that structure in example.tex; the document class here is report, which is appropriate for e.g. project dissertations because it allows for a title page and separation into chapters. Notice that all LATEX commands begin with a backslash, and that any arguments they take are in curly braces. The preamble of example.tex contains only two instances of \newcommand, a LATEX command that - unsurprisingly - defines a new command. In this case, \xv is defined so that wherever you type this in the rest of the file, it is replaced by {\bf x}. The \bf here stands for ``boldface'', so that \xv will (within the context of a mathematical expression) produce a boldface x, i.e. the symbol for the vector x. This type of command definition is very useful, especially for abbreviating lengthy command sequences, such as the one for \sigmav, which generates a boldface σ (and is a little more complicated because Greek characters are by default non-bold).
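As a rough sketch, the two preamble definitions could look like the following. The definition of \xv follows the description above; the body of \sigmav is an assumption (one common way of getting a bold Greek letter), since the actual definition in example.tex isn't reproduced here.

```latex
\documentclass{report}

% \xv expands to a boldface x, as described in the text
\newcommand{\xv}{{\bf x}}
% Assumed definition: \boldmath switches to bold maths, which has to be
% done outside maths mode, hence the \mbox wrapped around it
\newcommand{\sigmav}{\mbox{\boldmath $\sigma$}}

\begin{document}
The vector $\xv$ and the bold Greek letter $\sigmav$.
\end{document}
```

If you later decide to change how vectors are typeset, you only need to change the definition in the preamble rather than every place where the symbol appears.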
The first commands in the document itself should be fairly self-explanatory and generate the title page. Note that I put a \bf into the argument of the \title command because I wanted the title to come out in bold. I also abused the \date field to give some further information, and forced it to be moved down a bit with the \vspace*{3cm} command. Try removing this command to see what happens. The \\ forces a line break. On the next page, the abstract is enclosed in a \begin{abstract}...\end{abstract} pair.
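To make this concrete, here is a minimal sketch of a title page and abstract along these lines. The title, author and date text are placeholders and not the contents of example.tex; \author and \maketitle aren't discussed above but are the standard LATEX commands for this job.

```latex
\documentclass{report}
\begin{document}

% Placeholder title, author and date text (not taken from example.tex)
\title{\bf A Placeholder Project Title}
\author{A.~N.~Other}
% \vspace*{3cm} pushes the date text down; \\ forces a line break
\date{\vspace*{3cm} First line of further information \\ Second line}
% \maketitle typesets the title page from the fields above
\maketitle

\begin{abstract}
A short summary of the report goes here.
\end{abstract}

\end{document}
```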
In the remainder of the file, you'll see that each chapter, section and subsection is started with an appropriate command. Directly after many of these you'll see a \label command; this associates this label with the number of the chapter or section. You can recall this label later with \ref{...}; you'll see a few instances of this in example.tex. LATEX automatically does the numbering for you, so if you ``cross-reference'' chapters and sections in this way you don't have to renumber anything by hand even if you reorder, insert or delete sections.
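A small sketch of how sectioning and cross-referencing fit together is shown here. The headings and the label names (ch:intro, sec:background) are made up for illustration; you can choose any label names you like.

```latex
\documentclass{report}
\begin{document}

\chapter{Introduction}
\label{ch:intro}        % hypothetical label name

\section{Background}
\label{sec:background}  % hypothetical label name

% \ref is replaced by the current chapter or section number
Section~\ref{sec:background} of chapter~\ref{ch:intro} sets the scene.

\end{document}
```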
The same idea also applies to the numbering of equations. You tell LATEX that you want a numbered equation with a \begin{equation}...\end{equation} pair. If you put a \label command anywhere within this pair, you can then refer to the equation number again by \ref, as illustrated in the file. (You'll notice that I've put ~'s in front of some of the \ref commands; the ~ produces a space but doesn't allow a linebreak, and so prevents an equation or section number from appearing at the beginning of a new line.)
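For instance, a numbered equation with a label and a later reference to it might be sketched like this. The chapter heading and the label name eq:pythagoras are hypothetical.

```latex
\documentclass{report}
\begin{document}

\chapter{Example}

\begin{equation}
a^2 + b^2 = c^2
\label{eq:pythagoras}   % hypothetical label name
\end{equation}

% The ~ keeps the word "equation" and its number on the same line
Pythagoras' theorem is stated in equation~(\ref{eq:pythagoras}).

\end{document}
```

Note that \ref only produces the number itself; the round brackets around it are typed by hand.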
Unnumbered equations are enclosed in \[...\]; if you accidentally put a \label command into such an equation it will produce strange results because the \label has got no equation number to refer to. Generally, it's therefore easiest to use numbered equations throughout. When you want to refer to mathematical symbols in the text, enclose them in $...$; this makes sure they're typeset consistently with the equations.
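As a quick illustration (the equation is just a simple made-up example), an unnumbered displayed equation and some maths within the text could look like this:

```latex
\documentclass{report}
\begin{document}

% Unnumbered displayed equation: no \label inside
\[
(a + b)d = c
\]
% Symbols mentioned in running text go between $...$
Here $a$, $b$ and $d$ are given and $c$ is the result.

\end{document}
```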
A bit more on cross-referencing: you'll notice that the first time you run LATEX on a .tex file, the resulting document will look ``funny'': all the cross-references to equation and section numbers are replaced by question marks. This is because LATEX uses the first ``run'' to figure out which numbers to allocate, and records these in an auxiliary file called in our case example.aux; you should be able to locate this file in the directory where you saved example.tex. When LATEX is run a second time, it reads in this information and inserts the numbers in the right place: you'll see that the document now looks much more sensible. The literature references are still question marks, though; we'll come back to that.
Let's have a look at some maths commands now, e.g. in the equation labelled eq:SVM_min. Simple equations are typed just as you'd expect, e.g. (a + b)d = c produces the equation (a + b)d = c. Superscripts and subscripts are produced with ^ and _, so a^2 + b^2 = c^2 gives Pythagoras' theorem a² + b² = c². Most of the symbols for special functions are LATEX commands; you'll see \min in the file, while \exp(xy) gives the exponential exp(xy). Round and square brackets work as usual; curly braces are got by \{ and \} (because curly braces without backslashes are used for LATEX command arguments). To get brackets to match the size of the symbols they enclose, use e.g. \left(...\right) for round brackets.
There are also LATEX commands for many other mathematical symbols and operators. E.g. the \cdot in example.tex gives a ``centre dot'' which is used for scalar products; \sum produces a summation sign for which subscripts and superscripts can be used to indicate the summation range. (Texmaker has a list of the most important mathematical symbols to click on, which will produce the corresponding LATEX command in the file you're editing.) By the way, the \mbox command encloses some ordinary text to stop it from being typeset in the italics used for maths symbols; \quad produces a double space (\ followed by a space, as shown, produces a single space; bigger spaces can be got by \qquad or by stringing several spacing commands together).
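The maths commands discussed so far can be combined as in the following sketch. The equation is purely illustrative and is not the one labelled eq:SVM_min in example.tex; \wv is a hypothetical command defined by analogy with \xv, and \in and \ldots are standard symbols not discussed above.

```latex
\documentclass{report}
\newcommand{\xv}{{\bf x}}
\newcommand{\wv}{{\bf w}}   % hypothetical, by analogy with \xv

\begin{document}

% \left( ... \right) scale the brackets to fit the sum they enclose;
% \mbox keeps "where" upright, and "\ " adds a single space after it
\begin{equation}
\min_{\wv} \left( \sum_{i=1}^{n} \exp(-\wv \cdot \xv_i) \right)
\quad \mbox{where}\ \xv_i \in \{ \xv_1, \ldots, \xv_n \}
\end{equation}

\end{document}
```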
With these comments you should be able to understand most of example.tex. The only feature not explained yet concerns the citation commands. \cite produces the numbers for one or several references, enclosed in the standard square brackets, not dissimilar from the action of \ref (though the latter only takes one argument at a time, and produces no brackets by itself). The labels which \cite refers to are defined by \bibitem commands. But there aren't any \bibitem commands in example.tex, so how is this meant to work?
The answer is that you can get the \bibitem entries generated automatically from a bibliography ``database'', using the command bibtex; Texmaker's commands for this are in the ``Bibliography'' menu. Bibtex gets its basic information from the \bibliographystyle{unsrt} and \bibliography{refs} commands in example.tex. The former tells bibtex what kind of bibliography format to produce; unsrt stands for ``unsorted'', where entries are numbered and arranged by order of appearance in the main text, and is the conventional choice. The latter tells bibtex the name of the database file; in the example, \bibliography{refs} means that the database is refs.bib in the same directory as the LATEX file (save it there). Note that bibtex databases all have the extension .bib, but that the \bibliography command must not contain this extension part of the filename.
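Put together, the citation-related commands might appear in a document as sketched here. The sentence around the \cite is made up; the label Vapnik95 and the file name refs are the ones used in the example files.

```latex
\documentclass{report}
\begin{document}

Statistical learning theory is covered in~\cite{Vapnik95}.

% Numbered entries, in order of first citation in the text
\bibliographystyle{unsrt}
% Database file refs.bib; the .bib extension is left off
\bibliography{refs}

\end{document}
```

The reference list is printed at the point where the \bibliography command appears, so it normally goes at the end of the document.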
To understand how bibtex databases work, consider a sample entry from refs.bib:
    @book{Vapnik95,
      author = {Vapnik, V},
      title = {The nature of statistical learning theory},
      address = {New York},
      publisher = {Springer},
      year = {1995}
    }
The @book tells bibtex what kind of reference it is looking at. The first argument is the citation label; this is what's referred to in the \cite command (e.g. \cite{Vapnik95}). After that, you've got a list of the attributes describing the item, in the form attribute = {...}; alternatively you can use attribute = "...". Attributes are separated by commas, and finally there's a closing } matching the opening one. The meaning of the various attributes should be self-explanatory; address is the place of publication (if you're not sure, it's the first place listed in the copyright statement in a book; that's also where you get the year of publication from). You can add comments to items using the note attribute, e.g. note = {Available online ...}.
Other common types of item are @article and @inproceedings for journal articles and papers published in conference proceedings or as book chapters. refs.bib has examples of both to illustrate the various attributes required. You can also add attributes for your own information, which bibtex will ignore; e.g. in refs.bib you'll see that several articles have an abstract attribute which is just for my own benefit when I'm trying to find articles in the database. Or you could add, when you enter each article into your .bib file, something like comment = {Overview of SVMs; quote in chapter on background material} to remind yourself what the article was about and where you're thinking of quoting it.
To get bibtex to do its work, just click on the appropriate button in Texmaker (or press F11). This will create a file called example.bbl (bbl for ``bibliography''), containing a list of \bibitem commands with the information from refs.bib extracted and formatted appropriately, and arranged in order of appearance in example.tex. You'll now need to run LATEX twice: the first time it reads in the .bbl file and allocates the reference numbers; the second time these are then actually put into the text wherever \cite commands appear.
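To give an idea of what bibtex generates, here is roughly what the .bbl file could contain if the Vapnik95 entry were the only one cited and the unsrt style were used. Treat this as a sketch: the exact layout depends on the bibliography style and on which entries you cite.

```latex
% Approximate contents of example.bbl; the {1} tells LaTeX how wide
% the widest reference number is
\begin{thebibliography}{1}

\bibitem{Vapnik95}
V~Vapnik.
\newblock {\em The nature of statistical learning theory}.
\newblock Springer, New York, 1995.

\end{thebibliography}
```

You never need to edit this file by hand; it is regenerated every time you run bibtex.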
An alternative way of creating your bibliography is to write the \bibitem commands yourself. However, it's very easy to miss out information this way or get the formatting muddled up; scientific writing has fairly strict conventions on how reference lists have to be set out (see below). Also, the order of the items may need to change as you re-organize the text (unless you go for an alphabetical ordering by name of first author; by the way, the bibtex command for this is \bibliographystyle{alpha}). Bibtex does all this work for you, and it's just as easy to type information into a .bib file as it is to type out the \bibitem commands. Use of bibtex is therefore highly recommended.
There are many reference managers available for LATEX, which all use bibtex in some form. A popular free one is JabRef, which is a Java application and so runs on most computers. This lets you tag, sort and search references, and you can add links to where the full text documents are online (using DOI) or in your own filespace. It works directly on existing .bib files and you can add custom fields to your bibliography items, e.g. for comments. There are also online providers that will manage your bibtex reference file for you, a recent one being Mendeley. Google e.g. ``latex reference manager'' for further information.
Inevitably, the first few times you use LATEX you'll get error messages when you run LATEX on your .tex file, especially when you're dealing with complicated equations. These can be a little cryptic, but you'll quickly learn how to read them. The most common errors are forgetting to close a pair of {...}, or similarly forgetting to close a $...$ for maths within the text; this then causes LATEX to typeset all the text up until the next maths section as maths, and to complain that it's missing a $ before the next maths section. Because of this, the cause of the error can often be earlier in the .tex file than at the point the error message refers to. If you can't figure out where the error is, a simple technique is to put an \end{document} just before the problematic section; LATEX will then ignore the rest of the file. Moving this command lower and repeatedly running LATEX you should be able to pinpoint where the error lies.
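The \end{document} trick can be sketched as follows, using a made-up file with a deliberately missing $ in its second paragraph:

```latex
\documentclass{report}
\begin{document}

This paragraph is fine: $a + b = c$.

% Temporary stopping point for debugging: everything below is ignored.
% Move it further down, re-running LaTeX each time, until the error
% message reappears.
\end{document}

The closing dollar sign is missing here: $a + b = c.
More text with maths, $d = e$, so the error is reported further down
than its actual cause.

\end{document}
```

Once you've found and fixed the problem, remember to delete the temporary \end{document} again.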
You'll most likely want to include some figures in your report. An extended version of LATEX called PDFLATEX, which generates PDF output as the name suggests, can include standard graphics formats such as jpg and PDF files. With standard LATEX, you would need to use figure files in encapsulated postscript (.eps) format. You can also use these in PDFLATEX if you put \usepackage[update,prepend]{epstopdf} in your preamble, which will make PDFLATEX convert the files to PDF automatically; or convert the files to PDF by hand using e.g. epstopdf. Postscript is a widely used printer language, so you sometimes get such output if you choose ``print to file''. The ``encapsulated'' bit just means that the file contains information on its size (the ``bounding box'') but this can be added by hand if necessary - ask your supervisor or your fellow students if in doubt. Here's a LATEX snippet that would include a figure:
    \documentclass[a4paper]{report}
    \usepackage{graphicx}
    ...
    \begin{figure}
    \begin{center}
    \includegraphics[width=12cm]{dooda.eps}
    \end{center}
    \caption{Here goes the figure caption. \label{fig:dooda} }
    \end{figure}
(Note that if your figure was in pdf format and called dooda.pdf, you would just replace the filename appropriately and the snippet above would then work in PDFLATEX.) The \usepackage{graphicx} command in the preamble tells LATEX to load an extra package of commands called graphicx. The figure itself is enclosed in \begin{figure}...\end{figure} as you might have expected. The .eps file is placed with the \includegraphics command, which specifies the name of the file and here also the width to which it is to be scaled; a \begin{center}...\end{center} makes sure it's centred between the left and right margins. The caption command does the obvious thing; enclosed(!) within the caption is a \label command so that you can refer to the figure number elsewhere using \ref{fig:dooda}. LATEX will ``float'' the figure to some appropriate place in the document, near the position in the text where you've put the \begin{figure}...\end{figure} commands. Various options for controlling this process exist; see the LATEX help for details.
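As one example of such an option, here is a sketch of the PDF variant of the snippet as a complete file, with a placement argument added. The [htbp] argument is standard LATEX but is my addition rather than something taken from the example files, and the sentence referring to the figure is made up; the file dooda.pdf needs to be in the same directory for this to run.

```latex
\documentclass[a4paper]{report}
\usepackage{graphicx}
\begin{document}

Figure~\ref{fig:dooda} shows the result.

% [htbp]: try here first, then the top of a page, the bottom of a page,
% or a separate page of floats
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=12cm]{dooda.pdf}
\end{center}
\caption{Here goes the figure caption. \label{fig:dooda} }
\end{figure}

\end{document}
```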
Below are some brief suggestions on how to structure project reports. Please adhere to the word limit for your project, which relates to all words in the main text, captions, headings and footnotes. Depending on number of figures etc, 1,000 words correspond to around 3-7 pages, so accounting for title, abstract and references a 10,000 word report would be expected to be 35-75 pages in length, a 5,000 word report 20-40.
Set the margins (see e.g. here) somewhere between LATEX's default margins at the upper end and 2cm all the way round at the lower end. Use an 11pt font.
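One possible way of getting these settings is sketched here. It assumes the geometry package for the margins, which may not be the method described on the page linked above; the 11pt option is a standard option of the report class.

```latex
% 11pt sets the main font size
\documentclass[11pt,a4paper]{report}
% Assumption: geometry is used here to get 2cm margins all the way round
\usepackage[margin=2cm]{geometry}

\begin{document}
Main text goes here.
\end{document}
```

Leaving out the geometry line altogether gives you LATEX's default margins, i.e. the upper end of the permitted range.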
It is essential that any actual quotes from other people's work are identified as such, i.e. you need to say explicitly that you're quoting, and give the reference.
More relevant in practice is the case where you are including a discussion in your report that follows quite closely a particular reference. In that case you really need to put the source reference away and formulate the ideas in your own words. It is generally not acceptable to copy sentences from your source and just modify them here and there. (You can do this very occasionally, if you say explicitly that you are paraphrasing, or indeed quoting, and indicate from which source.) Unacknowledged and sustained paraphrasing will be regarded as plagiarism, with potentially serious consequences.
Project submissions may be checked by examiners using the Turnitin software if there is a suspicion of plagiarism. You will have the possibility of submitting your draft report to Turnitin yourself, from the project webpage on KEATS, once this facility is activated. You are encouraged to use this resource to help you avoid inadvertent plagiarism.
For more detailed information on referencing and attribution of others' work, you should also consult the guidelines available from Library Services. The formatting instructions there are largely redundant if you use LATEX, but the guidance on when and how to cite is very relevant and useful.
A project outline can be structured along similar lines, although it will of course be very much shorter (typically a couple of pages, no more) and not contain any results. I've made a sample LATEX file outline.tex which you can download. The file also contains the snippet shown above to include a figure; download the figure (.eps) file dooda.eps from here before running LATEX on it, or the pdf version here.
Let's finish with some ideas on what we mean by ``critical evaluation and discussion''. This is a very important part of research, and not something that you'll have necessarily learnt about as an undergraduate.
Let's say you've just run your favourite statistical learning algorithm on the first real data set from whatever complex system you are looking at, have tested it and it seems to make reasonable predictions. You think ``Groovy - project sorted.'' But in fact, just showing that something ``works'' isn't really science. We need to know e.g. why it works, or indeed why it doesn't. So a ``critical evaluation'' of results could include
The key point is that knowing how well your approach works is not the only important issue. Of at least equal importance is that you analyse why it works the way it does, what affects its performance and how it compares to other approaches. Also, if you are quantifying performance etc. in some way, you ought to analyse the significance of your results, usually judged in terms of error bars on them. So even if the approach you're using doesn't work terribly well, you can end up with a perfectly respectable project as long as you perform an intelligent analysis of why it fails, and of what could be done to improve matters.
The above suggestions apply also to more theoretical projects, with appropriate modifications. You could ask for example