Writing up and presenting your research

Scientific word processing: LaTeX

Surprisingly important for how your project report looks in the end is what word processor you use to produce it. You may be used to writing essays and reports in Microsoft Word, but from past experience we very strongly advise against this. The problem is that Word works fine on ordinary text, but regularly screws up as soon as equations come in: in the past we've had students who had to correct all of the equations in their final printout by hand because all the symbols had got mixed up. So: if you do insist on using Word, be sure that you've checked that your document will print properly, in good time before the submission deadline. And don't expect too much sympathy if it all goes pear-shaped!

Much better, and highly recommended as the default option, is LaTeX. This has many advantages:

It produces decent typesetting for mathematics.
It will number and cross-reference equations, figures, sections automatically.
It will automatically produce a reference list in consistent formatting and number references automatically in order of occurrence, both in the reference list and in the text.

The downside is that LaTeX is not quite ``what you see is what you get'': all the formatting is done via commands which are entered into an ordinary text file, and this file is then LaTeXed to produce the document. This will take a week or so to get used to, but is well worth the effort.

We'll discuss the basic features of LaTeX here, but there is much help available on the web, see e.g. here and here or search for ``latex help'' on Google. The library also has books on LaTeX.

The basic LaTeX files which contain the ``program'' for your documents have the extension .tex. There are many editors available for such LaTeX files, both free and commercial; see here. On student computers in the College, you may find WinEdt, which is shareware. You can try it out for free on your own PC for a month; after that a student licence costs $40. I'll discuss Texmaker here, which is free and available for all standard operating systems.

We'll be looking at the example file example.tex. Save this in your filespace and then open it from within Texmaker. You'll notice a range of icons in the command bar; click on ``Quick build'' by one of the blue arrows. This will generate (using a ``compiler'' also called LaTeX, or an extended version like pdfLaTeX) a file called example.dvi, which is the document produced from example.tex in a ``DeVice-Independent'' format, and then translate this to PDF format. I've set up Texmaker so that the PDF will show up alongside the LaTeX file, but you can also have it in a separate window, or view the DVI file directly. The DVI and PDF files will be produced in the same directory where the original LaTeX file was, and you can then e.g. print the PDF in the usual way.

Note that having the LaTeX file and the PDF output next to each other means that you can click on one and get Texmaker to jump to the corresponding part in the other. This makes it easy to go from the LaTeX source to the document and vice versa.

Having successfully produced our first LaTeX document, let's look in more detail at the commands in example.tex. The general structure of a LaTeX document is:

\documentclass[options]{type of document}

(preamble: general settings and definitions of commands etc that you
want to use throughout)

\begin{document}

(main text, sectioning commands, equations etc)

\end{document}

You'll see that structure in example.tex; the documentclass is here report, which is appropriate for e.g. project dissertations because it allows for a titlepage and separation into chapters. Notice that all LaTeX commands begin with a backslash, and that any arguments they take are in curly braces. The preamble of example.tex contains only two instances of \newcommand, a LaTeX command that - unsurprisingly - defines a new command. In this case, \xv is defined so that wherever you type this in the rest of the file, it is replaced by {\bf x}. The \bf here stands for ``boldface'', so that \xv will (within the context of a mathematical expression) produce a boldface x, i.e. the symbol for the vector ${\bf x}$. This type of command definition is very useful, especially for abbreviating lengthy command sequences, such as the one for \sigmav, which generates a boldface $\mbox{\boldmath$\sigma$}$ (and is a little more complicated because Greek characters are by default non-bold).
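As a minimal sketch of this structure, the following file should compile on its own. The two \newcommand lines are reconstructed from the description above and may differ in detail from those in example.tex; the sentence in the body is purely illustrative.

```latex
\documentclass[a4paper]{report}

% Preamble: shorthand commands that can be used throughout the document.
\newcommand{\xv}{{\bf x}}                        % boldface x for a vector
\newcommand{\sigmav}{\mbox{\boldmath$\sigma$}}   % boldface Greek sigma

\begin{document}

The vectors $\xv$ and $\sigmav$ now come out in bold wherever they are used.

\end{document}
```

If you later decide that vectors should look different, you only need to change the definition in the preamble rather than every occurrence in the text.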

The first commands in the document itself should be fairly self-explanatory and generate the title page. Note that I put a \bf into the argument of the \title command because I wanted the title to come out in bold. I also abused the \date field to give some further information, and forced it to be moved down a bit with the \vspace*{3cm} command. Try removing this command to see what happens. The \\ forces a line break. On the next page, the abstract is enclosed in a \begin{abstract} ...\end{abstract} pair.
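The following sketch shows what such a title page and abstract might look like. The title, author name and date text are placeholders, and the exact position of \vspace* within \date in example.tex may well be different.

```latex
\documentclass[a4paper]{report}
\begin{document}

\title{\bf A Placeholder Project Title}   % \bf makes the title bold
\author{A. N. Other}
% The \date field carries some extra information here;
% \vspace* moves it down the page by 3cm.
\date{\vspace*{3cm}
  Project report\\        % \\ forces a line break
  September 2017}
\maketitle

\begin{abstract}
One or two paragraphs stating the problem and the main results.
\end{abstract}

\end{document}
```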

Cross-referencing

In the remainder of the file, you'll see that each chapter, section and subsection is started with an appropriate command. Directly after many of these you'll see a \label command; this associates this label with the number of the chapter or section. You can recall this label later with \ref{ ...}; you'll see a few instances of this in example.tex. LaTeX automatically does the numbering for you, so if you ``cross-reference'' chapters and sections in this way you don't have to renumber anything by hand even if you reorder, insert or delete sections.

The same idea also applies to the numbering of equations. You tell LaTeX that you want a numbered equation with a \begin{equation} ...\end{equation} pair. If you put a \label command anywhere within this pair, you can then refer to the equation number again by \ref, as illustrated in the file. (You'll notice that I've put ~'s in front of some of the \ref commands; the ~ produces a space but doesn't allow a linebreak, and so prevents an equation or section number from appearing at the beginning of a new line.)

Unnumbered equations are enclosed in \[ ...\]; if you accidentally put a \label command into such an equation it will produce strange results because the \label has got no equation number to refer to. Generally, it's therefore easiest to use numbered equations throughout. When you want to refer to mathematical symbols in the text, enclose them in $ ...$; this makes sure they're typeset consistently with the equations.
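To see labels, references and the different kinds of equation together, here is a small self-contained sketch. The label names (ch:background, sec:model, eq:pythagoras) are made up for this illustration and are not taken from example.tex. Remember to run LaTeX twice so that the numbers are filled in.

```latex
\documentclass[a4paper]{report}
\begin{document}

\chapter{Background}
\label{ch:background}

\section{A simple model}
\label{sec:model}

% A numbered equation, labelled so that it can be referred to later.
\begin{equation}
a^2 + b^2 = c^2
\label{eq:pythagoras}
\end{equation}

% An unnumbered equation: no label in here.
\[
(a + b)d = c
\]

% The ~ keeps each number on the same line as the word before it.
Equation~\ref{eq:pythagoras} in Section~\ref{sec:model} of
Chapter~\ref{ch:background} relates the side lengths $a$, $b$ and $c$.

\end{document}
```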

A bit more on cross-referencing: you'll notice that the first time you run LaTeX on a .tex file, the resulting document will look ``funny'': all the cross-references to equation and section numbers are replaced by question marks. This is because LaTeX uses the first ``run'' to figure out which numbers to allocate, and records these in an auxiliary file called in our case example.aux; you should be able to locate this file in the directory where you saved example.tex. When LaTeX is run a second time, it reads in this information and inserts the numbers in the right place: you'll see that the document now looks much more sensible. The literature references are still question marks, though; we'll come back to that.

Maths

Let's have a look at some maths commands now, e.g. in the equation labelled eq:SVM_min. Simple equations are typed just as you'd expect, e.g. (a + b)d = c produces $(a + b)d = c$. Superscripts and subscripts are produced with ^ and _, so a^2 + b^2 = c^2 gives Pythagoras' theorem $a^2 + b^2 = c^2$. Most of the symbols for special functions are LaTeX commands; you'll see \min in the file, while \exp(xy) gives the exponential $\exp(xy)$. Round and square brackets work as usual; curly braces are got by \{ and \} (because curly braces without backslashes are used for LaTeX command arguments). To get brackets to match the size of the symbols they enclose, use e.g. \left( ...\right) for round brackets.

There are also LaTeX commands for many other mathematical symbols and operators. E.g. the \cdot in example.tex gives a ``centre dot'' which is used for scalar products; \sum produces a summation sign for which subscripts and superscripts can be used to indicate the summation range. (Texmaker has a list of the most important mathematical symbols to click on, which will produce the corresponding LaTeX command in the file you're editing.) By the way, the \mbox command encloses some ordinary text to stop it from being typeset in the italics used for maths symbols; \quad produces a double space (\ followed by a space as shown produces a single space; bigger spaces can be got by \qquad or by stringing several spacing commands together).
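The sketch below combines the maths commands discussed so far in a single equation. It is an illustration only and not the equation labelled eq:SVM_min in example.tex; the symbols w, C and \xi_i and the label eq:demo_min are assumptions made for the example, and \frac (for fractions) is a standard command not mentioned above.

```latex
\documentclass[a4paper]{report}
\begin{document}

\begin{equation}
\min_{{\bf w}} \left( \frac{1}{2}\, {\bf w} \cdot {\bf w}
  + C \sum_{i=1}^{n} \xi_i \right)
\quad \mbox{subject to} \quad \xi_i \geq 0
\label{eq:demo_min}
\end{equation}

% Escaped curly braces and a special function within the text.
The set $\{ x_1, \ldots, x_n \}$ uses escaped curly braces, and
$\exp(xy)$ uses the command for the exponential function.

\end{document}
```

Try replacing \left( and \right) by plain round brackets to see how the bracket size changes.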

Bibliography

With these comments you should be able to understand most of example.tex. The only feature not explained yet concerns the citation commands. \cite produces the numbers for one or several references, enclosed in the standard square brackets, not dissimilar from the action of \ref (though the latter only takes one argument at a time, and produces no brackets by itself). The labels which \cite refers to are defined by \bibitem commands. But there aren't any \bibitem commands in example.tex, so how is this meant to work?

The answer is that you can get the \bibitem entries generated automatically from a bibliography ``database'', using the command bibtex; Texmaker's commands for this are in the ``Bibliography'' menu. Bibtex gets its basic information from the \bibliographystyle{unsrt} and \bibliography{refs} commands in example.tex. The former tells bibtex what kind of bibliography format to produce; unsrt stands for ``unsorted'', where entries are numbered and arranged by order of appearance in the main text, and is the conventional choice. The latter tells bibtex the name of the database file; in the example, \bibliography{refs} means that the database is refs.bib in the same directory as the LaTeX file (save it there). Note that bibtex databases all have the extension .bib, but that the \bibliography command must not contain this extension part of the filename.

To understand how bibtex databases work, consider a sample entry from refs.bib:

@book{Vapnik95,
author    = {Vapnik, V},
title     = {The nature of statistical learning theory},
address   = {New York},
publisher = {Springer},
year      = {1995}
}

The @book tells bibtex what kind of reference it is looking at. The first argument is the citation label; this is what's referred to in the \cite command (e.g. \cite{Vapnik95}). After that, you've got a list of the attributes describing the item, in the form attribute = {...}; alternatively you can use attribute = "...". Attributes are separated by commas, and finally there's a closing } matching the opening one. The meaning of the various attributes should be self-explanatory; address is the place of publication (if you're not sure, it's the first place listed in the copyright statement in a book; that's also where you get the year of publication from). You can add comments to items using the note attribute, e.g. note = {Available online...}.

Other common types of item are @article and @inproceedings for journal articles and papers published in conference proceedings or as book chapters. refs.bib has examples of both to illustrate the various attributes required. You can also add attributes for your own information, which bibtex will ignore; e.g. in refs.bib you'll see that several articles have an abstract attribute which is just for my own benefit when I'm trying to find articles in the database. Or you could add, when you enter each article into your .bib file, something like comment = {Overview of SVMs; quote in chapter on background material} to remind yourself what the article was about and where you're thinking of quoting it.

To get bibtex to do its work, just click on the appropriate button in Texmaker (or press F11). This will create a file called example.bbl (bbl for ``bibliography''), containing a list of \bibitem commands with the information from refs.bib extracted and formatted appropriately, and arranged in order of appearance in example.tex. You'll now need to run LaTeX twice: the first time it reads in the .bbl file and allocates the reference numbers; the second time these are then actually put into the text wherever \cite commands appear.
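If you'd like to try the whole cycle on something smaller than example.tex, the following self-contained sketch may help. It assumes you save it as demo.tex (a hypothetical name); the filecontents environment writes a one-entry database demo.bib next to it, using the made-up ``Gnus versus gnats'' reference from the guidelines further down rather than a real publication. The citation label Other02 is likewise just an illustration.

```latex
% Writes demo.bib into the current directory if it does not exist yet.
\begin{filecontents}{demo.bib}
@article{Other02,
  author  = {Other, A. N. and Someone, B.},
  title   = {Gnus versus gnats},
  journal = {Journal of Gnu-ology},
  volume  = {23},
  pages   = {110--113},
  year    = {2002}
}
\end{filecontents}

\documentclass[a4paper]{report}
\begin{document}

Gnus and gnats have been compared before~\cite{Other02}.

\bibliographystyle{unsrt}   % number entries in order of first citation
\bibliography{demo}         % database file demo.bib, without the extension

\end{document}

% Everything after \end{document} is ignored by LaTeX.
% Build sequence when working from the command line:
%   latex  demo    (records the \cite labels in demo.aux)
%   bibtex demo    (writes demo.bbl from demo.bib)
%   latex  demo    (reads demo.bbl and allocates reference numbers)
%   latex  demo    (puts the numbers in wherever \cite appears)
```

In Texmaker the same steps correspond to running LaTeX, then bibtex, then LaTeX twice more.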

An alternative way of creating your bibliography is to write the \bibitem commands yourself. However, it's very easy to miss out information this way or get the formatting muddled up; scientific writing has fairly strict conventions on how reference lists have to be set out (see below). Also, the order of the items may need to change as you re-organize the text (unless you go for an alphabetical ordering by name of first author; by the way, the bibtex command for this is \bibliographystyle{alpha}). Bibtex does all this work for you, and it's just as easy to type information into a .bib file as it is to type out the \bibitem commands. Use of bibtex is therefore highly recommended.

Reference managers

There are many reference managers available for LaTeX, which all use bibtex in some form. A popular free one is JabRef, which is a Java application and so runs on most computers. This lets you tag, sort and search references, and you can add links to where the full text documents are online (using DOI) or in your own filespace. It works directly on existing .bib files and you can add custom fields to your bibliography items, e.g. for comments. There are also online providers that will manage your bibtex reference file for you, a recent one being Mendeley. Google e.g. ``latex reference manager'' for further information.

Error messages

Inevitably, the first few times you use LaTeX you'll get error messages when you run LaTeX on your .tex file, especially when you're dealing with complicated equations. These can be a little cryptic, but you'll quickly learn how to read them. The most common errors are forgetting to close a pair of {...}, or similarly forgetting to close a $...$ for maths within the text; this then causes LaTeX to typeset all the text up until the next maths section as maths, and to complain that it's missing a $ before the next maths section. Because of this, the cause of the error can often be earlier in the .tex file than at the point the error message refers to. If you can't figure out where the error is, a simple technique is to put an \end{document} just before the problematic section; LaTeX will then ignore the rest of the file. Moving this command lower and repeatedly running LaTeX you should be able to pinpoint where the error lies.
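Here is a sketch of the \end{document} technique on a made-up file with a deliberately missing closing $. As written it compiles cleanly, because the faulty paragraph comes after the temporary \end{document}; moving that line below the faulty paragraph should bring the error back.

```latex
\documentclass[a4paper]{report}
\begin{document}

This paragraph is fine: $a^2 + b^2 = c^2$.

\end{document}   % temporary: everything below this line is ignored

This paragraph is broken: $(a + b)d = c is missing its closing dollar sign.

A later paragraph with more maths, $\exp(xy)$, where the error may be reported.

\end{document}
```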

Figures

You'll most likely want to include some figures in your report. An extended version of LaTeX called pdfLaTeX, which generates PDF output as the name suggests, can include standard graphics formats such as jpg and PDF files. With standard LaTeX, you would need to use figure files in encapsulated postscript (.eps) format. You can also use these in pdfLaTeX if you put \usepackage[update,prepend]{epstopdf} in your preamble, which will make pdfLaTeX convert the files to PDF automatically; or convert the files to PDF by hand using e.g. epstopdf. Postscript is a widely used printer language, so you sometimes get such output if you choose ``print to file''. The ``encapsulated'' bit just means that the file contains information on its size (the ``bounding box'') but this can be added by hand if necessary - ask your supervisor or your fellow students if in doubt. Here's a LaTeX snippet that would include a figure:

\documentclass[a4paper]{report}
\usepackage{graphicx}
...
\begin{figure}
\begin{center}
\includegraphics[width=12cm]{dooda.eps}
\end{center}
\caption{Here goes the figure caption.
\label{fig:dooda}
}
\end{figure}

(Note that if your figure was in pdf format and called dooda.pdf, you would just replace the filename appropriately and the snippet above would then work in pdfLaTeX.) The \usepackage{graphicx} command in the preamble tells LaTeX to load an extra package of commands called graphicx. The figure itself is enclosed in \begin{figure}...\end{figure} as you might have expected. The .eps file is placed with the \includegraphics command, which specifies the name of the file and here also the width to which it is to be scaled; a \begin{center} ...\end{center} makes sure it's centred between the left and right margins. The caption command does the obvious thing; enclosed(!) within the caption is a \label command so that you can refer to the figure number elsewhere using \ref{fig:dooda}. LaTeX will ``float'' the figure to some appropriate place in the document, near the position in the text where you've put the \begin{figure}...\end{figure} commands. Various options for controlling this process exist; see the LaTeX help for details.
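For completeness, here is a sketch of a full file for pdfLaTeX that keeps the .eps figure and lets the epstopdf package do the conversion. It assumes dooda.eps is in the same directory and that your TeX installation allows the conversion program to be called during the run (many do by default, but this is not guaranteed). The [htbp] argument is one of the standard placement options: it asks LaTeX to try ``here'', then top of page, bottom of page, or a separate page of floats.

```latex
\documentclass[a4paper]{report}
\usepackage{graphicx}
\usepackage[update,prepend]{epstopdf}   % convert .eps to .pdf during the run

\begin{document}

The result is shown in Figure~\ref{fig:dooda}.

\begin{figure}[htbp]   % placement preferences: here, top, bottom, float page
\begin{center}
\includegraphics[width=12cm]{dooda.eps}
\end{center}
\caption{Here goes the figure caption.
\label{fig:dooda}
}
\end{figure}

\end{document}
```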

Guidelines for project reports and outlines

Below are some brief suggestions on how to structure project reports. Please adhere to the word limit for your project, which relates to all words in the main text, captions, headings and footnotes. Depending on number of figures etc, 1,000 words correspond to around 3-7 pages, so accounting for title, abstract and references a 10,000 word report would be expected to be 35-75 pages in length, a 5,000 word report 20-40.

Set the margins (see e.g. here) to between LaTeX's default margins at the upper end, and 2cm all the way round at the lower end. Use 11pt font.
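One possible way of setting this up is sketched below. The use of the geometry package is my assumption here (it is a widely used standard package, but other methods exist); leaving the \usepackage line out gives you LaTeX's default margins.

```latex
\documentclass[a4paper,11pt]{report}   % 11pt main font
\usepackage[margin=2cm]{geometry}      % 2cm margins all the way round

\begin{document}
Main text goes here.
\end{document}
```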

Reports need to contain: title page, abstract, main text, list of references. A table of contents before the main text is also helpful. LaTeX can produce this automatically (see example.tex above).
Level of detail: the report should contain enough detail for a reader familiar with the material taught in your programme to understand what you have done. If necessary, material that you feel is too technical for the main text can be included in the form of appendices.
The abstract should be one or two paragraphs long; it should state the problem which the project addresses and the main conclusions and results.
A possible skeleton structure for the main text is (your supervisor will advise on the relative balance of the various sections):
- Introduction: This should explain the context of the project and situate it in the broader field of your programme. It should state clearly which problem the project was designed to tackle and with what methodology, and it should give an overview of the structure of the remainder of the report.
- Review: This should discuss related work in the area of the project, explaining the differences between the various approaches and the one chosen for the project. If the project uses methods or theoretical techniques not covered in the course these should also be reviewed.
- Methodology and results: This will generally consist of several sections. Where appropriate, results should be summarized in figures or tables.
- Discussion and conclusions: This should be a summary and critical evaluation of what the project has achieved and how the results relate to ones obtained by other people. You should also discuss what could have been done differently and how the approach could be improved or developed further in future work.
Figures: need to be large enough to be decipherable; pay attention to sizes of symbols, error bars, visibility of different linestyles etc. Labelling (axis labels, tick labels along axes, legends etc) needs to be large enough to read, i.e. of the same size as the main text. Give units of data where appropriate.
All figures and tables need to be numbered and have a self-contained caption, i.e. one that explains what is shown without requiring the reader to go back to the text, including - if the data shown are not your own - the source. All figures and tables need to be referred to and discussed at the appropriate place in the main text.
The list of references should be formatted consistently and contain all the information which a reader would need to retrieve the items referred to. E.g. for an article from a journal you would normally list, in order: Authors, title of article, title of journal, volume number, page numbers, year of publication. E.g. ``A. N. Other and B. Someone, Gnus versus gnats, Journal of Gnu-ology, 23:110-113, 2002.''; using bibtex (see above) takes most of the hassle out of the formatting. For books also list the publisher; for edited volumes such as conference proceedings also the editors. (Again, bibtex will helpfully remind you if you've not given this information in the .bib file.) References to web sites should be used sparingly, since URLs tend to have a rather short half-life; they should mention at least the author(s), title, and give the full URL. A typical reference list would contain no fewer than 5-10 items, and no more than 30 unless the project is predominantly of review character. All references need to be referred to in the main text.
Attribution of others' work: whenever you discuss other work, the appropriate reference should be given unless it is clear from the context. This applies even in, for example, an introductory section. There are of course circumstances where this would be redundant; if a section is devoted to a review of a particular paper or set of papers, then it is sufficient to state this at the beginning of the section.
It is essential that any actual quotes from other people's work are identified as such, i.e. you need to say explicitly that you're quoting, and give the reference.
More relevant in practice is the case where you are including a discussion in your report that follows quite closely a particular reference. In that case you really need to put the source reference away and formulate the ideas in your own words. It is generally not acceptable to copy sentences from your source and just modify them here and there. (You can do this very occasionally, if you say explicitly that you are paraphrasing, or indeed quoting, and indicate from which source.) Unacknowledged and sustained paraphrasing will be regarded as plagiarism, with potentially serious consequences.
Project submissions may be checked by examiners using the Turnitin software if there is a suspicion of plagiarism. You will have the possibility of submitting your draft report to Turnitin yourself, from the project webpage on KEATS, once this facility is activated. You are encouraged to use this resource to help you avoid inadvertent plagiarism.
For more detailed information on referencing and attribution of others' work, you should also consult the guidelines available from Library Services. The formatting instructions there are largely redundant if you use LaTeX, but the guidance on when and how to cite is very relevant and useful.

A project outline can be structured along similar lines, although it will of course be very much shorter (typically a couple of pages, no more) and not contain any results. I've made a sample LaTeX file outline.tex which you can download. The file also contains the snippet shown above to include a figure; download the figure (.eps) file dooda.eps from here before LaTeXing this, or the pdf version here.

Some thoughts on ``critical evaluation''

Let's finish with some ideas on what we mean by ``critical evaluation and discussion''. This is a very important part of research, and not something that you'll have necessarily learnt about as an undergraduate.

Let's say you've just run your favourite statistical learning algorithm on the first real data set from whatever complex system you are looking at, have tested it and it seems to make reasonable predictions. You think ``Groovy - project sorted.'' But in fact, just showing that something ``works'' isn't really science. We need to know e.g. why it works, or indeed why it doesn't. So a ``critical evaluation'' of results could include

A comparison with what other people have done. When does your method work better, when worse, and can you explain why? How do the methods relate in terms of how complicated they are (e.g. in terms of programming effort or computer time)?
A comparison with a ``baseline''. Try to apply a very basic approach to your problem (e.g. a simple and well-known learning algorithm, to stick with the statistical learning example). What result do you get? Is your (presumably more complicated) method significantly better, i.e. is it worth the effort?
If your work involved any kind of ``measurement'', e.g. via numerical experiments: a determination of the nature of the possible errors in your results (systematic or statistical), and their magnitude (error bars).
An analysis of how various features of your problem affect the results. For example, if you are using a learning algorithm it will no doubt have parameters that you need to set. How did you decide which values to use? Have you tried to vary them; what happens and do you understand why? Of major importance is the amount of data: what happens if you vary the size of the training set, for example?
How specific are the results to what you've tried? E.g. would your approach apply to other situations? Would it be practical for, say, much larger data sets or data containing more noise?

The key point is that knowing how well your approach works is not the only important issue. Of at least equal importance is that you analyse why it works the way it does, what affects its performance and how it compares to other approaches. Also, if you are quantifying performance etc. in some way, you ought to analyse the significance of your results, usually judged in terms of error bars on them. So even if the approach you're using doesn't work terribly well, you can end up with a perfectly respectable project as long as you perform an intelligent analysis of why it fails, and of what could be done to improve matters.
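As a reminder of what an error bar might look like in the simplest case, suppose you have repeated a numerical experiment $n$ times independently and obtained results $x_1, \ldots, x_n$. A common choice, sketched here under the assumption that only statistical errors matter, is to quote the mean together with its standard error:

```latex
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i , \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 , \qquad
\mbox{error bar} \approx \frac{s}{\sqrt{n}}
\]
```

Systematic errors are not captured by this and need to be discussed separately.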

The above suggestions apply also to more theoretical projects, with appropriate modifications. You could ask for example

How do your results compare with those of others? Are they consistent, and if not, why not (were different approximations used, or are numerical errors significant)?
How do your methods compare with others? Are they more generally applicable, or more restricted? Are they more intuitively understandable? If you've had to make approximations, when do you expect them to be valid (maybe in some appropriate limit)?
Which features are important for your method to work? Could it be generalized to other situations, and what modifications would it need?

Sollich 2017-11-23