MD Trajectories in PyMOL: No Memory Limits

The length and size of modern MD trajectories can present problems when it comes to storing, analysing and especially viewing your simulations. For viewing, one common solution is to create a second copy of the trajectory with some frames skipped or with the solvent removed. This is time-consuming, memory-consuming and now... totally unnecessary.

As PyMOL Fellows, we have been working on embedding the MDAnalysis Python package into PyMOL to allow it to handle the parsing of molecular dynamics trajectories. Here, we present our very first prototype, which is able to instantly load and render molecular dynamics trajectories of any length whilst having a RAM footprint approximately the size of a single frame.

The video below shows a system with approximately 90,000 coarse-grained beads being loaded into PyMOL, and then a trajectory of 20,000 frames loaded instantly.

It’s really as simple as that: load your system and the trajectory using the regular PyMOL commands, prefixed with mda_.

mda_load system.pdb
mda_load_traj trajectory.xtc

We have modified PyMOL such that MDAnalysis will automatically be called to load the trajectory and send the atomic coordinates back to PyMOL for rendering. A trajectory slider will show up at the bottom of the window, allowing you to browse through your trajectory without ever worrying about the memory usage. Not only does this reduce memory consumption, but it removes the need to wait for the trajectory to be loaded.
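To give a rough sense of scale, here is a back-of-the-envelope estimate for the system in the video. It assumes each bead needs three single-precision (4-byte) coordinates and ignores topology, rendering buffers and any other overhead, so treat the numbers as an order-of-magnitude guide rather than a measurement.

```python
# Back-of-the-envelope estimate; assumes 3 float32 coordinates per bead
# and ignores topology, rendering buffers and other overhead.
BYTES_PER_COORD = 4      # assumption: single-precision floats
N_BEADS = 90_000         # approximate system size quoted above
N_FRAMES = 20_000        # trajectory length quoted above


def frame_bytes(n_atoms, bytes_per_coord=BYTES_PER_COORD):
    """Bytes needed for the x, y, z coordinates of one frame."""
    return n_atoms * 3 * bytes_per_coord


one_frame = frame_bytes(N_BEADS)
all_frames = one_frame * N_FRAMES

print(f"one frame:  {one_frame / 1e6:.2f} MB")   # 1.08 MB
print(f"all frames: {all_frames / 1e9:.1f} GB")  # 21.6 GB
```

Under these assumptions, holding every frame in memory would need about 21.6 GB for coordinates alone, while a single frame needs about 1 MB.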

Try It Yourself

Warning: This is very much a work in progress

If you are interested in the code or in running it yourself you can access the project on GitHub. The compilation and installation process is described on the PyMOL Wiki. With this version of PyMOL, it is currently possible to load any trajectory that MDAnalysis can handle and then perform some analysis or take snapshots of the trajectory. However, as we are still in the process of embedding MDAnalysis into PyMOL, some of the PyMOL functionalities will not work when the trajectory is loaded with mda_load_traj.

Our Motivation

PyMOL is a molecular visualisation tool that is widely used in the fields of biochemistry and molecular dynamics. However, it was not originally designed to work with the trajectory lengths and data volumes that are now commonplace. This issue is not unique to PyMOL but is also true for most (if not all) similar tools.

By default, PyMOL stores all trajectory information in RAM and pre-renders images for all frames. This pre-rendering can be turned off with the PyMOL command “set defer_builds_mode, 3”. In small tests, switching defer_builds_mode from its default of 0 to 3 reduced memory usage from 5280 MB to 1472 MB for a simulation with an xtc trajectory of around 225 MB. This is a significant improvement, but it is still about 1.7 times the memory used by MDAnalysis and VMD (both around 860 MB).

In both modes (0 and 3), however, memory usage kept growing with each displayed frame while the trajectory played on a loop. This bug has since been fixed by the PyMOL developers and should not be an issue in future releases (this applies both to the open source version downloadable from GitHub and the pre-compiled binary version downloadable directly from Schrödinger). However, even after this bug fix, the amount of memory used by PyMOL still increases during the first loop over a trajectory. In our test case, this resulted in 9264 MB of RAM being used.

As the size and length of molecular dynamics trajectories are rapidly increasing, the ability to visualise and interactively analyse these trajectories without worrying about long loading times or impossibly large RAM requirements will become increasingly important.

How it Works – The Basics

We have begun to embed MDAnalysis directly into PyMOL; however, this has not been as straightforward as we had first expected.

Although it has a Python interface, PyMOL’s backend is written in C++. This C++ backend is responsible for everything from loading trajectories to rendering molecules. To pass atomic coordinates from MDAnalysis to PyMOL, we have used PyMOL’s callback system. This callback mechanism requires us to register a function to be called when a given frame is requested. In other words, for a simulation with five thousand frames, we currently need five thousand callback requests. We hope to simplify this in the future with a general frame_updated(callback) function.
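The difference between the two designs can be illustrated with a toy registry. This is only a sketch of our reading of the paragraph above: ToyViewer, register_frame, register_general and load_coords are hypothetical names invented for illustration and are not part of PyMOL's API.

```python
# Toy stand-in for a viewer's callback registry. This is NOT PyMOL's API;
# every name here is hypothetical and only illustrates the two designs.
class ToyViewer:
    def __init__(self):
        self.per_frame = {}        # frame number -> callback
        self.on_any_frame = None   # single general callback
        self.coords = None

    def register_frame(self, frame, callback):
        self.per_frame[frame] = callback

    def register_general(self, callback):
        self.on_any_frame = callback

    def show(self, frame):
        # Prefer a frame-specific callback, fall back to the general one.
        callback = self.per_frame.get(frame, self.on_any_frame)
        self.coords = callback(frame)
        return self.coords


def load_coords(frame):
    """Placeholder for 'ask the trajectory reader for this frame'."""
    return [(float(frame), 0.0, 0.0)]


# Per-frame design: one registration for every frame in the trajectory.
per_frame_viewer = ToyViewer()
for i in range(5000):
    per_frame_viewer.register_frame(i, load_coords)

# General design: a single registration covers every frame.
general_viewer = ToyViewer()
general_viewer.register_general(load_coords)
```

Both viewers return the same coordinates for any frame; the general design simply avoids the bookkeeping that grows with trajectory length.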

In practice, as the slider is moved, MDAnalysis is called to load the atomic positions at the frame corresponding to the slider position. These positions are then sent to the PyMOL C++ backend, which updates the coordinates and renders the molecules.
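On the MDAnalysis side, fetching the positions for one slider position might look roughly like the sketch below. Universe, trajectory indexing and atoms.positions are standard MDAnalysis usage; the helper names positions_at and demo, the default file names and the frame number are assumptions for illustration, and the hand-off to PyMOL's C++ backend is not shown.

```python
def positions_at(universe, frame):
    """Return a copy of the atomic positions at the given frame index.

    Indexing the trajectory makes the reader seek to that frame and
    update the coordinates seen through universe.atoms.
    """
    universe.trajectory[frame]
    return universe.atoms.positions.copy()


def demo(topology="system.pdb", trajectory="trajectory.xtc", frame=100):
    """Example usage; file names and frame number are placeholders."""
    # Imported lazily so this module loads even without MDAnalysis installed.
    import MDAnalysis as mda

    u = mda.Universe(topology, trajectory)
    print(len(u.trajectory), "frames available")
    print(positions_at(u, frame)[:3])  # coordinates of the first three atoms
```

Only the requested frame is read from disk, which is why the memory footprint stays close to that of a single frame.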

Whilst we say that mda_load_traj loads all frames instantly, this is not strictly true. What we provide is instant access to all frames, but only one frame is actually loaded into memory at a time. We are able to do this thanks to the capability of MDAnalysis to access any frame of a trajectory file without preloading the simulation, which it does through indexing of frames.
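The general idea behind frame indexing can be shown with a toy binary format. This is a minimal sketch and not MDAnalysis's actual implementation: the file layout, N_ATOMS and all function names are made up, and the frames here are fixed-size, so the offsets follow directly from the file size. A format with variable-size frames would need a one-off scan to build the same table.

```python
import os
import struct
import tempfile

N_ATOMS = 4
FRAME_FMT = f"<{N_ATOMS * 3}f"   # x, y, z per atom as little-endian float32
FRAME_SIZE = struct.calcsize(FRAME_FMT)


def write_toy_trajectory(path, n_frames):
    """Write frames in which every coordinate equals the frame number."""
    with open(path, "wb") as fh:
        for i in range(n_frames):
            fh.write(struct.pack(FRAME_FMT, *([float(i)] * (N_ATOMS * 3))))


def build_index(path):
    """Return the byte offset at which each frame starts."""
    return list(range(0, os.path.getsize(path), FRAME_SIZE))


def read_frame(path, offsets, frame):
    """Seek straight to one frame and read only that frame."""
    with open(path, "rb") as fh:
        fh.seek(offsets[frame])
        return struct.unpack(FRAME_FMT, fh.read(FRAME_SIZE))


with tempfile.TemporaryDirectory() as tmp:
    traj = os.path.join(tmp, "toy.bin")
    write_toy_trajectory(traj, n_frames=1000)
    offsets = build_index(traj)            # small table, one entry per frame
    frame_750 = read_frame(traj, offsets, 750)
```

The offset table holds one integer per frame, so it stays small even for very long trajectories, and reading frame 750 never touches the other 999 frames.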

Future Features

In our next post we’ll be describing the interactive plotting functionality that we have recently added to PyMOL.

Let us know in the comments below if there are any features regarding trajectory reading or analysis that you would like to see in PyMOL.

Acknowledgement

A big thank you to Thomas Holder (PyMOL’s lead developer) for his instrumental help in implementing this feature.
