Booked [JK01] Git commit message analysis

Project overview

Writing good git commit messages is challenging because it requires discipline. Various authors have made suggestions as to what constitutes best practice when writing commit messages. However, git itself does nothing to enforce these suggestions. Code inspections are a good opportunity to discuss commit messages, but they are likely to focus on other issues deemed more important. This project aims to develop a tool that is able to provide feedback to users about their commit messages, with a view to encouraging them to improve their practice.

In this project, you will need to review the best practice recommendations on git commit messages. You will also need to review established natural language processing techniques that are relevant to your project. For example, a range of techniques and tools exist to analyse the grammatical structure of natural language strings, which you can use to assess messages.

You will then design a tool to assess messages, identifying those that meet the recommendations and those that violate them. Ideally, the tool can give constructive feedback to the user. The tool ought to be developed in Ruby, ideally incorporating a significant part of it in a Ruby gem. A good solution might be incorporated into Team Feedback to provide feedback to students. Your tool will need to be able to interface with GitHub to retrieve commit messages. GitHub provides an extensive API that enables developers to access and analyse git repositories stored on a GitHub server. Good Ruby gems are available to obtain access to the API.
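To give a feel for what assessing a message against recommendations might look like, here is a minimal rule-based sketch. The module name, the individual rules and the 50 and 72 character limits are assumptions based on commonly cited conventions, not requirements of this project; choosing and justifying the rules is part of your literature review, and a real solution would go beyond simple string checks.

```ruby
# Hypothetical sketch: the module name, rules and limits are assumptions
# taken from commonly cited conventions, not from the project brief.
module CommitMessageCheck
  SUBJECT_LIMIT = 50    # assumed maximum subject length
  BODY_LINE_LIMIT = 72  # assumed maximum body line length

  # Returns an array of feedback strings; an empty array means that
  # none of the rules below was violated.
  def self.feedback(message)
    lines = message.to_s.lines.map(&:chomp)
    subject = lines.first.to_s
    issues = []

    issues << "Subject line is empty." if subject.strip.empty?
    if subject.length > SUBJECT_LIMIT
      issues << "Subject line is longer than #{SUBJECT_LIMIT} characters."
    end
    issues << "Subject line should start with a capital letter." if subject =~ /\A[a-z]/
    issues << "Subject line should not end with a full stop." if subject.end_with?(".")
    if lines.length > 1 && !lines[1].strip.empty?
      issues << "Separate the subject from the body with a blank line."
    end
    if lines.drop(2).any? { |line| line.length > BODY_LINE_LIMIT }
      issues << "Wrap body lines at #{BODY_LINE_LIMIT} characters."
    end

    issues
  end
end
```

Calling CommitMessageCheck.feedback("fixed stuff.") would report the lower-case start and the trailing full stop. Rules that depend on grammar, such as checking whether the subject is written in the imperative mood, are where the NLP techniques come in and are deliberately left out of this sketch.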

Once developed, you ought to evaluate your tool. One approach would be to collate a data set of commit messages from public git repositories, assess the quality of these messages manually and compare your results with those produced by the tool. If you wish to take this approach, it would be a good idea to collect at least a partial data set early on in the project.
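As a rough illustration of the comparison step, the sketch below takes two lists of boolean labels (true meaning "judged to be a good message"), one from manual assessment and one from the tool, and reports accuracy, precision and recall. The function name and the choice of these three measures are assumptions; you may well decide that other measures, for example an agreement statistic, suit your evaluation better.

```ruby
# Hypothetical sketch: compares manual labels with tool labels.
# Both arguments are arrays of booleans, where true means "good message".
def compare_labels(manual, tool)
  raise ArgumentError, "label lists differ in length" unless manual.length == tool.length

  pairs = manual.zip(tool)
  tp = pairs.count { |m, t| m && t }    # both say good
  fp = pairs.count { |m, t| !m && t }   # tool says good, manual says bad
  fn = pairs.count { |m, t| m && !t }   # tool says bad, manual says good
  tn = pairs.count { |m, t| !m && !t }  # both say bad

  {
    # nil marks a measure that is undefined for the given data
    accuracy: pairs.empty? ? nil : (tp + tn).fdiv(pairs.length),
    precision: (tp + fp).zero? ? nil : tp.fdiv(tp + fp),
    recall: (tp + fn).zero? ? nil : tp.fdiv(tp + fn)
  }
end
```

With a labelled data set in hand, compare_labels(manual_labels, tool_labels) gives a quick summary of how closely the tool follows your own judgement.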

Initial background reading suggestions

If you wish to read a bit more about the kinds of natural language processing (NLP) techniques that this project involves, you might want to start with context-free grammars. Robert Heckendorn's 2015 "A practical tutorial on context free grammars" is a good start. There are some NLP tools that you can use in this project to analyse the structure of text and to identify the part of speech of each word. Stanford CoreNLP is a good example of this. GitHub comes with a REST API that is well documented. There are some good tutorials out there that explain how you can write a Ruby gem.
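If you want to try the REST API before choosing a gem, the following sketch uses only the Ruby standard library to request one page of commits from a public repository and pull out the message text. The function names and the User-Agent value are made up for illustration, and the request is unauthenticated, so it is subject to GitHub's rate limits; a real tool would more likely use one of the available gems and authenticate via OAuth.

```ruby
require "json"
require "net/http"
require "uri"

# Pulls the message text out of the JSON returned by GitHub's
# "list commits" endpoint (an array of objects with a "commit" key).
def extract_messages(json_body)
  JSON.parse(json_body).map { |entry| entry.dig("commit", "message") }.compact
end

# Hypothetical helper: fetches one page of commits from a public
# repository without authentication and returns the commit messages.
def fetch_commit_messages(owner, repo)
  uri = URI("https://api.github.com/repos/#{owner}/#{repo}/commits")
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/vnd.github+json"
  request["User-Agent"] = "commit-message-analysis-sketch" # illustrative value
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "GitHub API returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  extract_messages(response.body)
end
```

Keeping the parsing in extract_messages separate from the network call means you can test it against saved responses, which is also a convenient way to build up an evaluation data set early on.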

Who is this project for?

This project is suitable for both BSc and MSci students with an interest in Artificial Intelligence and Software Engineering. This project is not a research project as such, but it will require the student to learn about natural language processing. The software development work will involve a number of components: a Ruby (on Rails) gem that you design, access to other services (GitHub) via OAuth and a REST API, and integration of other components. Developing a basic solution will be moderately challenging, because if you are unable to get any one of these components working and integrated, you do not have a working solution. The project does provide clear scope to produce a good background literature review and an extensive in-depth evaluation of the work. No ethics permission should be required if you extract commit histories from public git repositories.

Questions about this project

Q: Can I discuss this project with you in person?

A: Yes, I have arranged a project Q&A session on Thursday 27 September, 18:00-20:00 in Bush House (SE) 2.12. All students in an undergraduate Computer Science programme doing an individual project this year are welcome. I'm afraid it is not feasible to meet all students with questions on a one-to-one basis this week. I am supervising 11 undergraduate projects this year and usually receive 5-6 meeting requests per project on average (because most students explore a range of different projects). As it is simply not feasible to schedule that many meetings in a single week, a group Q&A session seems to me to be the fairest way to meet everyone.