Writing up and presenting your research

Scientific word processing: LaTeX

Surprisingly important for how your project report looks in the end is what word processor you use to produce it. You may be used to writing essays and reports in Microsoft Word, but from past experience we very strongly advise against this. The problem is that Word works fine on ordinary text, but regularly screws up as soon as equations come in: in the past we've had students who had to correct all of the equations in their final printout by hand because all the symbols had got mixed up. So: if you do insist on using Word, be sure that you've checked that your document will print properly, in good time before the submission deadline. And don't expect too much sympathy if it all goes pear-shaped!

Much better, and highly recommended as the default option, is LaTeX. This has many advantages:

The downside is that LaTeX is not quite ``what you see is what you get'': all the formatting is done via commands which are entered into an ordinary text file, and this file is then processed with LaTeX to produce the document. This will take a week or so to get used to, but is well worth the effort.

We'll discuss the basic features of LaTeX here, but there is much help available on the web, see e.g. here and here or search for ``latex help'' on Google. The library also has books on LaTeX.

The basic LaTeX files which contain the ``program'' for your documents have the extension .tex. There are many editors available for such LaTeX files, both free and commercial; see here. On student computers in the College, you may find WinEdt, which is shareware. You can try it out for free on your own PC for a month; after that a student licence costs $40. I'll discuss Texmaker here, which is free and available for all standard operating systems.

We'll be looking at the example file example.tex. Save this in your filespace and then open it from within Texmaker. You'll notice a range of icons in the command bar; click on ``Quick build'' by one of the blue arrows. This will generate (using a ``compiler'' also called LaTeX, or an extended version like pdfLaTeX) a file called example.dvi, which is the document produced from example.tex in a ``DeVice-Independent'' format, and then translate this to PDF format. I've set up Texmaker so that the PDF will show up alongside the LaTeX file, but you can also have it in a separate window, or view the DVI file directly. The DVI and PDF files will be produced in the same directory as the original LaTeX file, and you can then e.g. print the PDF in the usual way.

Note that having the LaTeX file and the PDF output next to each other means that you can click on one and get Texmaker to jump to the corresponding part in the other. This makes it easy to go from the LaTeX source to the document and vice versa.

Having successfully produced our first LaTeX document, let's look in more detail at the commands in example.tex. The general structure of a LaTeX document is:

\documentclass[options]{type of document}

(preamble: general settings and definitions of commands etc that you
want to use throughout)

\begin{document}

(main text, sectioning commands, equations etc)

\end{document}

You'll see that structure in example.tex; the documentclass is here report, which is appropriate for e.g. project dissertations because it allows for a titlepage and separation into chapters. Notice that all LaTeX commands begin with a backslash, and that any arguments they take are in curly braces. The preamble of example.tex contains only two instances of \newcommand, a LaTeX command that - unsurprisingly - defines a new command. In this case, \xv is defined so that wherever you type this in the rest of the file, it is replaced by {\bf x}. The \bf here stands for ``boldface'', so that \xv will (within the context of a mathematical expression) produce a boldface x, i.e. the symbol for the vector ${\bf x}$. This type of command definition is very useful, especially for abbreviating lengthy command sequences, such as the one for \sigmav, which generates a boldface $\sigma$ (and is a little more complicated because Greek characters are by default non-bold).
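As a rough sketch of what such a preamble could look like: the two definitions below are reconstructed from the description above rather than copied from example.tex, so the exact wording in the file may differ.

```latex
\documentclass{report}

% Preamble: shorthand commands (reconstructed from the text; example.tex
% itself may define them slightly differently).
\newcommand{\xv}{{\bf x}}                       % boldface x for the vector x
\newcommand{\sigmav}{\mbox{\boldmath$\sigma$}}  % boldface sigma

\begin{document}

The vector $\xv$ and the bold Greek symbol $\sigmav$ can now be used in
any mathematical expression, for example $\xv\cdot\xv$.

\end{document}
```

The point of defining the shorthand once is that a later change of notation only has to be made in the preamble.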

The first commands in the document itself should be fairly self-explanatory and generate the title page. Note that I put a \bf into the argument of the \title command because I wanted the title to come out in bold. I also abused the \date field to give some further information, and forced it to be moved down a bit with the \vspace*{3cm} command. Try removing this command to see what happens. The \\ forces a line break. On the next page, the abstract is enclosed in a \begin{abstract} ...\end{abstract} pair.
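A minimal sketch of a title page and abstract along these lines is shown below. The title, author and date text are placeholders, and the exact placement of \vspace* in example.tex is an assumption.

```latex
\documentclass{report}
\begin{document}

% All names and text below are placeholders, not taken from example.tex.
\title{\bf A placeholder project title}
\author{A. N. Other}
% The date field is "abused" for extra information; \vspace*{3cm}
% pushes it down the page and \\ forces a line break.
\date{\vspace*{3cm} Project report\\ Placeholder Department\\ Month Year}
\maketitle

\begin{abstract}
One paragraph summarising the question, the method and the main result.
\end{abstract}

\chapter{Introduction}
Main text starts here.

\end{document}
```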

Cross-referencing

In the remainder of the file, you'll see that each chapter, section and subsection is started with an appropriate command. Directly after many of these you'll see a \label command; this associates this label with the number of the chapter or section. You can recall this label later with \ref{ ...}; you'll see a few instances of this in example.tex. LaTeX automatically does the numbering for you, so if you ``cross-reference'' chapters and sections in this way you don't have to renumber anything by hand even if you reorder, insert or delete sections.

The same idea also applies to the numbering of equations. You tell LaTeX that you want a numbered equation with a \begin{equation} ...\end{equation} pair. If you put a \label command anywhere within this pair, you can then refer to the equation number again by \ref, as illustrated in the file. (You'll notice that I've put ~'s in front of some of the \ref commands; the ~ produces a space but doesn't allow a linebreak, and so prevents an equation or section number from appearing at the beginning of a new line.)

Unnumbered equations are enclosed in \[ ...\]; if you accidentally put a \label command into such an equation it will produce strange results because the \label has got no equation number to refer to. Generally, it's therefore easiest to use numbered equations throughout. When you want to refer to mathematical symbols in the text, enclose them in $ ...$; this makes sure they're typeset consistently with the equations.
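The cross-referencing commands described in this section can be sketched as follows. The label names (ch:background, sec:notation, eq:pythagoras) are made up for illustration and do not come from example.tex.

```latex
\documentclass{report}
\begin{document}

\chapter{Background}
\label{ch:background}

\section{Notation}
\label{sec:notation}

Inline symbols such as $a$ and $b$ are typeset like the numbered equation
\begin{equation}
a^2 + b^2 = c^2 .
\label{eq:pythagoras}
\end{equation}
% The ~ keeps the number on the same line as the word before it.
Equation~\ref{eq:pythagoras} in Section~\ref{sec:notation} of
Chapter~\ref{ch:background} is numbered automatically.

An unnumbered equation takes no label:
\[
(a + b)d = c
\]

\end{document}
```

Run this through LaTeX twice; after the first run the references still appear as question marks.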

A bit more on cross-referencing: you'll notice that the first time you run LaTeX on a .tex file, the resulting document will look ``funny'': all the cross-references to equation and section numbers are replaced by question marks. This is because LaTeX uses the first ``run'' to figure out which numbers to allocate, and records these in an auxiliary file called in our case example.aux; you should be able to locate this file in the directory where you saved example.tex. When LaTeX is run a second time, it reads in this information and inserts the numbers in the right place: you'll see that the document now looks much more sensible. The literature references are still question marks, though; we'll come back to that.

Maths

Let's have a look at some maths commands now, e.g. in the equation labelled eq:SVM_min. Simple equations are typed just as you'd expect, e.g. (a + b)d = c produces $(a+b)d=c$. Superscripts and subscripts are produced with ^ and _ respectively, so a^2 + b^2 = c^2 gives Pythagoras' theorem $a^2+b^2=c^2$. Most of the symbols for special functions are LaTeX commands; you'll see \min in the file, while \exp(xy) gives the exponential $\exp(xy)$. Round and square brackets work as usual; curly braces are got by \{ and \} (because curly braces without backslashes are used for LaTeX command arguments). To get brackets to match the size of the symbols they enclose, use e.g. \left( ...\right) for round brackets.

There are also LaTeX commands for many other mathematical symbols and operators. E.g. the \cdot in example.tex gives a ``centre dot'' which is used for scalar products; \sum produces a summation sign for which subscripts and superscripts can be used to indicate the summation range. (Texmaker has a list of the most important mathematical symbols to click on, which will produce the corresponding LaTeX command in the file you're editing.) By the way, the \mbox command encloses some ordinary text to stop it from being typeset in the italics used for maths symbols; \quad produces a double space (\ followed by a space as shown produces a single space; bigger spaces can be got by \qquad or by stringing several spacing commands together).
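To see several of these maths commands working together, here is a small made-up equation. It is not the eq:SVM_min equation from example.tex, and the symbols and the label eq:illustration are purely illustrative.

```latex
\documentclass{report}
\begin{document}

\begin{equation}
% \min with a subscript, scaled brackets, a sum with limits, a dot product
\min_{\mathbf{a}} \left( \sum_{i=1}^{n} a_i^2 - \mathbf{a}\cdot\mathbf{b} \right)
% \quad for a wide space, \mbox for upright text, "\ " for a single space
\quad \mbox{subject to}\ a_i \geq 0,\ i \in \{1, \ldots, n\}
\label{eq:illustration}
\end{equation}

\end{document}
```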

Bibliography

With these comments you should be able to understand most of example.tex. The only feature not explained yet concerns the citation commands. \cite produces the numbers for one or several references, enclosed in the standard square brackets, not dissimilar from the action of \ref (though the latter only takes one argument at a time, and produces no brackets by itself). The labels which \cite refers to are defined by \bibitem commands. But there aren't any \bibitem commands in example.tex, so how is this meant to work?

The answer is that you can get the \bibitem entries generated automatically from a bibliography ``database'', using the command bibtex; Texmaker's commands for this are in the ``Bibliography'' menu. Bibtex gets its basic information from the \bibliographystyle{unsrt} and \bibliography{refs} commands in example.tex. The former tells bibtex what kind of bibliography format to produce; unsrt stands for ``unsorted'', where entries are numbered and arranged by order of appearance in the main text, and is the conventional choice. The latter tells bibtex the name of the database file; in the example, \bibliography{refs} means that the database is refs.bib in the same directory as the LaTeX file (save it there). Note that bibtex databases all have the extension .bib, but that the \bibliography command must not contain this extension part of the filename.

To understand how bibtex databases work, consider a sample entry from refs.bib:

@book{Vapnik95,
author    = {Vapnik, V},
title     = {The nature of statistical learning theory},
address   = {New York},
publisher = {Springer},
year      = {1995}
}
The @book tells bibtex what kind of reference it is looking at. The first argument is the citation label; this is what's referred to in the \cite command (e.g. \cite{Vapnik95}). After that, you've got a list of the attributes describing the item, in the form attribute = {...}; alternatively you can use attribute = "...". Attributes are separated by commas, and finally there's a closing } matching the opening one. The meaning of the various attributes should be self-explanatory; address is the place of publication (if you're not sure, it's the first place listed in the copyright statement in a book; that's also where you get the year of publication from). You can add comments to items using the note attribute, e.g. note = {Available online...}.

Other common types of item are @article and @inproceedings for journal articles and papers published in conference proceedings or as book chapters. refs.bib has examples of both to illustrate the various attributes required. You can also add attributes for your own information, which bibtex will ignore; e.g. in refs.bib you'll see that several articles have an abstract attribute which is just for my own benefit when I'm trying to find articles in the database. Or you could add, when you enter each article into your .bib file, something like comment = {Overview of SVMs; quote in chapter on background material} to remind yourself what the article was about and where you're thinking of quoting it.

To get bibtex to do its work, just click on the appropriate button in Texmaker (or press F11). This will create a file called example.bbl (bbl for ``bibliography''), containing a list of \bibitem commands with the information from refs.bib extracted and formatted appropriately, and arranged in order of appearance in example.tex. You'll now need to run LATEX twice: the first time it reads in the .bbl file and allocates the reference numbers; the second time these are then actually put into the text wherever \cite commands appear.
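Putting the bibtex pieces together, a stripped-down document could look like the sketch below. It assumes that refs.bib sits in the same directory and contains the Vapnik95 entry shown earlier; the file name mini.tex and the sentence in the body are invented for illustration.

```latex
% mini.tex (hypothetical file name)
% Run order, via Texmaker's buttons or on the command line:
%   latex mini     (records the \cite keys in mini.aux)
%   bibtex mini    (writes mini.bbl from refs.bib)
%   latex mini     (reads mini.bbl, allocates reference numbers)
%   latex mini     (puts the numbers into the text)
\documentclass{report}
\begin{document}

Support vector machines are described in~\cite{Vapnik95}.

\bibliographystyle{unsrt}  % numbered, in order of first citation
\bibliography{refs}        % database refs.bib, written without the .bib

\end{document}
```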

An alternative way of creating your bibliography is to write the \bibitem commands yourself. However, it's very easy to miss out information this way or get the formatting muddled up; scientific writing has fairly strict conventions on how reference lists have to be set out (see below). Also, the order of the items may need to change as you re-organize the text (unless you go for an alphabetical ordering by name of first author; by the way, the bibtex command for this is \bibliographystyle{alpha}, which also replaces the reference numbers by short labels built from author names and year, while \bibliographystyle{plain} sorts alphabetically but keeps the numbers). Bibtex does all this work for you, and it's just as easy to type information into a .bib file as it is to type out the \bibitem commands. Use of bibtex is therefore highly recommended.
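For comparison, this is roughly what the hand-written alternative looks like, using the Vapnik95 entry from refs.bib. The formatting of the entry is typed by hand here and only approximates what bibtex would generate.

```latex
\documentclass{report}
\begin{document}

Support vector machines are described in~\cite{Vapnik95}.

% The argument {9} tells LaTeX how wide the widest label is (one digit).
\begin{thebibliography}{9}

\bibitem{Vapnik95}
V. Vapnik.
\emph{The nature of statistical learning theory}.
Springer, New York, 1995.

\end{thebibliography}

\end{document}
```

Every entry has to be formatted and ordered by hand in this approach, which is exactly the work bibtex takes off your hands.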

Reference managers

There are many reference managers available for LaTeX, which all use bibtex in some form. A popular free one is JabRef, which is a Java application and so runs on most computers. This lets you tag, sort and search references, and you can add links to where the full text documents are online (using DOI) or in your own filespace. It works directly on existing .bib files and you can add custom fields to your bibliography items, e.g. for comments. There are also online providers that will manage your bibtex reference file for you, a recent one being Mendeley. Google e.g. ``latex reference manager'' for further information.

Error messages

Inevitably, the first few times you use LaTeX you'll get error messages when you run it on your .tex file, especially when you're dealing with complicated equations. These can be a little cryptic, but you'll quickly learn how to read them. The most common errors are forgetting to close a pair of {...}, or similarly forgetting to close a $...$ for maths within the text; this then causes LaTeX to typeset all the text up until the next maths section as maths, and to complain that it's missing a $ before the next maths section. Because of this, the cause of the error can often be earlier in the .tex file than the point the error message refers to. If you can't figure out where the error is, a simple technique is to put an \end{document} just before the problematic section; LaTeX will then ignore the rest of the file. By moving this command further down and re-running LaTeX each time, you should be able to pinpoint where the error lies.
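The \end{document} trick can be sketched like this; the text and the deliberate mistake are invented for illustration.

```latex
\documentclass{report}
\begin{document}

This paragraph is fine: the energy is $E = mc^2$.

\end{document}  % temporary: everything below is ignored while debugging

Here the closing dollar sign is missing: $x + y = z, so LaTeX keeps
typesetting text as maths until it meets the next $a$ in the file.

\end{document}
```

With the temporary \end{document} in place the file compiles cleanly; once it is moved below the second paragraph the error reappears, which narrows the mistake down to that paragraph.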

Figures

You'll most likely want to include some figures in your report. An extended version of LaTeX called pdfLaTeX, which generates PDF output as the name suggests, can include standard graphics formats such as jpg and PDF files. With standard LaTeX, you would need to use figure files in encapsulated postscript (.eps) format. You can also use these in pdfLaTeX if you put \usepackage[update,prepend]{epstopdf} in your preamble, which will make pdfLaTeX convert the files to PDF automatically; alternatively, convert the files to PDF by hand using e.g. epstopdf. Postscript is a widely used printer language, so you sometimes get such output if you choose ``print to file''. The ``encapsulated'' bit just means that the file contains information on its size (the ``bounding box''), but this can be added by hand if necessary - ask your supervisor or your fellow students if in doubt. Here's a LaTeX snippet that would include a figure:

\documentclass[a4paper]{report}
\usepackage{graphicx}
...
\begin{figure}
\begin{center}
\includegraphics[width=12cm]{dooda.eps}
\end{center}
\caption{Here goes the figure caption.
\label{fig:dooda}
}
\end{figure}
(Note that if your figure was in pdf format and called dooda.pdf, you would just replace the filename appropriately and the snippet above would then work in pdfLaTeX.) The \usepackage{graphicx} command in the preamble tells LaTeX to load an extra package of commands called graphicx. The figure itself is enclosed in \begin{figure}...\end{figure} as you might have expected. The .eps file is placed with the \includegraphics command, which specifies the name of the file and here also the width to which it is to be scaled; a \begin{center} ...\end{center} makes sure it's centred between the left and right margins. The caption command does the obvious thing; enclosed(!) within the caption is a \label command so that you can refer to the figure number elsewhere using \ref{fig:dooda}. LaTeX will ``float'' the figure to some appropriate place in the document, near the position in the text where you've put the \begin{figure}...\end{figure} commands. Various options for controlling this process exist; see the LaTeX help for details.
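As a sketch of the PDF variant, the document below includes dooda.pdf, refers to the figure from the text and adds one common placement option. The [htbp] argument is a standard LaTeX float specifier chosen here for illustration, not something taken from the course files.

```latex
\documentclass[a4paper]{report}
\usepackage{graphicx}
\begin{document}

Figure~\ref{fig:dooda} shows the result.

% [htbp] permits placement here, at the top or bottom of a page,
% or on a separate page of floats.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=12cm]{dooda.pdf}
\end{center}
\caption{Here goes the figure caption.
\label{fig:dooda}
}
\end{figure}

\end{document}
```

This version needs pdfLaTeX (and dooda.pdf in the same directory), and two runs to resolve the figure number.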

Guidelines for project reports and outlines

Below are some brief suggestions on how to structure project reports. Please adhere to the word limit for your project, which relates to all words in the main text, captions, headings and footnotes. Depending on the number of figures etc., 1,000 words correspond to around 3-7 pages, so allowing for title, abstract and references a 10,000 word report would be expected to be 35-75 pages in length, a 5,000 word report 20-40.

Set the margins (see e.g. here) so that they are no wider than LaTeX's default margins and no narrower than 2cm all the way round. Use an 11pt font.
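One way to get these settings is sketched below. It uses the geometry package, which is a common choice for setting margins but is an assumption here and may not be the method the linked page describes.

```latex
% 11pt font via the class option; 2cm margins on all four sides,
% i.e. the lower end of the allowed range.
\documentclass[11pt,a4paper]{report}
\usepackage[margin=2cm]{geometry}

\begin{document}

Main text of the report.

\end{document}
```

Leaving out the \usepackage line gives LaTeX's default margins, the upper end of the allowed range.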

A project outline can be structured along similar lines, although it will of course be very much shorter (typically a couple of pages, no more) and not contain any results. I've made a sample LaTeX file outline.tex which you can download. The file also contains the snippet shown above to include a figure; download the figure (.eps) file dooda.eps from here before running LaTeX on this, or the pdf version here.

Some thoughts on ``critical evaluation''

Let's finish with some ideas on what we mean by ``critical evaluation and discussion''. This is a very important part of research, and not something that you'll have necessarily learnt about as an undergraduate.

Let's say you've just run your favourite statistical learning algorithm on the first real data set from whatever complex system you are looking at, have tested it and it seems to make reasonable predictions. You think ``Groovy - project sorted.'' But in fact, just showing that something ``works'' isn't really science. We need to know e.g. why it works, or indeed why it doesn't. So a ``critical evaluation'' of results could include

The key point is that knowing how well your approach works is not the only important issue. Of at least equal importance is that you analyse why it works the way it does, what affects its performance and how it compares to other approaches. Also, if you are quantifying performance etc. in some way, you ought to analyse the significance of your results, usually judged in terms of error bars on them. So even if the approach you're using doesn't work terribly well, you can end up with a perfectly respectable project as long as you perform an intelligent analysis of why it fails, and of what could be done to improve matters.

The above suggestions apply also to more theoretical projects, with appropriate modifications. You could ask for example


Sollich 2017-11-23