Automate your Topspin NMR workflow

Here is a tip for scientists who need to batch process NMR data quickly and uniformly for analysis.  This approach can be a big time-saver when you have a large series of 1D reference spectra collected by sample automation, for example, or in NMR screening applications where dozens of STD-NMR experiments are collected during an overnight run.

Hidden away in the Topspin “Processing” menu is a feature called “Serial Processing”:

Select this menu option and you will see the following dialogue:

Since this is the first time you are doing this operation, you need to select “find datasets” to locate the data to process.  In the future, you will have a “list” created for you by the program that you can reuse to reference datasets in combinations that you specify.

When you click “find datasets” you will see this dialogue:

Select the data directory to search from the “data directories” box at the bottom of the window.   (If your NMR data directory is not here, it is because you haven’t added it to the Topspin file browser in the main Topspin window.  Go do that first, and come back and try this operation again.)

Under the “name” field, enter the name of the specific dataset directory you wish to search, or leave it blank to search across many directories.  You can also match on experiment number (EXPNO) or process number (PROCNO).  The check boxes enforce exact matching.  You can select 1D or higher dimensional datasets for processing.  You can also match by date.
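If you ever want to run a similar search outside of Topspin, here is a minimal Python sketch.  It assumes the standard Bruker layout of name/EXPNO/pdata/PROCNO, and that a raw 1D experiment contains a file called “fid” (multidimensional data use “ser” instead).  The function name find_1d_datasets is my own invention for illustration and is not part of Topspin.

```python
import os
import tempfile


def find_1d_datasets(data_root, name=None):
    """Return (name, expno, procno) tuples for 1D datasets under data_root.

    Assumes the Bruker layout <data_root>/<name>/<expno>/pdata/<procno>,
    with a file called 'fid' marking a raw 1D experiment.
    """
    hits = []
    for dataset in sorted(os.listdir(data_root)):
        if name is not None and dataset != name:
            continue  # exact match on the dataset name
        dataset_dir = os.path.join(data_root, dataset)
        if not os.path.isdir(dataset_dir):
            continue
        # EXPNO directories are named with integers
        expnos = sorted((e for e in os.listdir(dataset_dir) if e.isdigit()), key=int)
        for expno in expnos:
            expno_dir = os.path.join(dataset_dir, expno)
            if not os.path.isfile(os.path.join(expno_dir, "fid")):
                continue  # no 'fid' file, so not a raw 1D experiment
            pdata_dir = os.path.join(expno_dir, "pdata")
            if not os.path.isdir(pdata_dir):
                continue
            procnos = sorted((p for p in os.listdir(pdata_dir) if p.isdigit()), key=int)
            for procno in procnos:
                hits.append((dataset, int(expno), int(procno)))
    return hits


# Demo on a throwaway directory tree that mimics two 1D experiments
with tempfile.TemporaryDirectory() as root:
    for expno in (1, 2):
        os.makedirs(os.path.join(root, "Oct16-2014-p97", str(expno), "pdata", "1"))
        open(os.path.join(root, "Oct16-2014-p97", str(expno), "fid"), "wb").close()
    print(find_1d_datasets(root, name="Oct16-2014-p97"))
```

Leaving name as None searches every dataset directory under the root, much like leaving the “name” field blank in the dialogue.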

When you’ve made your selections, it will look like this:

In this search, I am selecting for all 1D data contained in the “Oct16-2014-p97” subdirectory of my NMR data repository at “/Users/sandro/UCSF/p97_hit2lead/nmr”.

Click “OK” and wait for the results.  Mine look like the following:

The program has found 24 datasets that match my criteria.  At this point, you want to select only those you wish to batch process.  I will select all files like this:

Now click “OK” and you are returned to this prompt:

Notice that the program has now created a list of datasets for batch processing for you, stored in the ‘/var/folders/’ temporary directory.  The list is a text file of the datasets that matched your selection criteria.  You can edit it by hand or proceed to the next step.  To proceed, click “next.”  You will now see this dialogue:

This is where the useful, time-saving stuff happens.   This dialogue takes the list you defined and applies whatever custom command sequence you would like to your data.  You define this sequence in the text box at the bottom.  As you can see, I have chosen to perform “lb 1; em; ft; pk”.   This sets the line broadening to 1 Hz, then applies exponential multiplication, Fourier transform, and phase correction.  You can also specify a path to a Python script for the Topspin API.
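For the curious, here is a rough Python sketch of what the “lb 1; em” part of that sequence does numerically.  It uses the textbook exponential window, where complex point k of the FID is scaled by exp(-pi * LB * k / SW), which adds LB Hz to the linewidth after Fourier transformation.  The function names and the toy FID are my own assumptions for illustration, and the sketch is not meant to reproduce Topspin’s implementation detail for detail.

```python
import math


def em_weights(n_points, sw_hz, lb_hz=1.0):
    """Exponential window: w[k] = exp(-pi * lb * k / sw) for complex point k."""
    dwell = 1.0 / sw_hz  # time between successive complex points, in seconds
    return [math.exp(-math.pi * lb_hz * k * dwell) for k in range(n_points)]


def apply_em(fid, sw_hz, lb_hz=1.0):
    """Multiply each complex FID point by its exponential weight."""
    weights = em_weights(len(fid), sw_hz, lb_hz)
    return [point * w for point, w in zip(fid, weights)]


# Toy FID: 8 identical complex points, so the output shows the window itself
fid = [complex(1.0, 0.0)] * 8
apodized = apply_em(fid, sw_hz=8.0, lb_hz=1.0)
print([round(p.real, 3) for p in apodized])
```

The “ft” step would then be a Fourier transform of the apodized FID, which is left out here to keep the example free of external libraries.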

Once you have your desired processing commands, click “Execute” and go grab a coffee!  You just saved yourself many minutes of routine processing of NMR spectra.    Hope you find this tip useful and that it can save you some time in your day.


Five cool features in PyMOL (that you may have missed)

Here are some lesser-known PyMOL tricks that let you do some pretty cool (and useful) things:

1.  Display B-factors

If you want to see the “b-factor putty” view where the backbone is displayed as a tube with a diameter correlated to the b-factor of the structure, simply click the Action button of the object (the “A” button); mouse to “preset”; and select “b-factor putty.”  The structure will automatically convert to a colored putty view suitable for slide figures or publications.

Create B-factor putty in PyMOL.

2.  Poisson-Boltzmann electrostatics

Want to get a sense for the patches of positive and negative potential on the surface of your protein?   No need for esoteric PB electrostatics solvers (at least at first!); PyMOL has got you covered.   Click on the “action” button as before, but this time select “generate” and then “vacuum electrostatics.”    Finally select “protein contact potential (local).”

Be sure to heed (or, more typically, ignore) the warning about how the results are qualitative and not quantitative.  And then proceed to enjoy the lovely patches of negative and positive potential rendered on the surface of your protein of interest.


3.  Quickly get an estimate of the solvent-accessible surface area (SASA) of a PDB structure

This takes advantage of the “get_area” command in PyMOL.  You’ll have to dive into the command line for this trick.  For example, if you have a PDB object 1UBQ (ubiquitin), you would type the following at the command prompt (the bar below the structure viewing area):

set dot_solvent, 1   ##  set dot_solvent to calculate the SASA

set dot_density, 4  ## highest dot density for the most accurate calculation

get_area 1UBQ  ## calculate the SASA (no comma between the command and the object name)

Keep in mind that as in point #2, the SASA calculation in PyMOL is an estimate.  For very accurate calculation, use a dedicated SASA solver.
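To get a feel for why the dot density matters, here is a small Python sketch of a dot-counting SASA estimate in the style of the Shrake-Rupley method.  I am using it as a stand-in to illustrate the idea and am not claiming it is what PyMOL does internally.  The function names, the 1.4 angstrom probe radius, and the toy atoms are all assumptions for illustration.

```python
import math


def sphere_points(n):
    """Roughly even points on a unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * i), r * math.sin(golden_angle * i), z))
    return points


def sasa(atoms, probe=1.4, n_dots=500):
    """Estimate total SASA in square angstroms.

    atoms: list of (x, y, z, vdw_radius) tuples, all in angstroms.
    """
    dots = sphere_points(n_dots)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded = ri + probe  # radius of the probe-center sphere for this atom
        exposed = 0
        for dx, dy, dz in dots:
            px, py, pz = xi + expanded * dx, yi + expanded * dy, zi + expanded * dz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                # A dot is buried if it falls inside a neighbor's expanded sphere
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * expanded ** 2 * exposed / n_dots
    return total


# Two overlapping carbon-sized atoms expose less area than two isolated ones
isolated = sasa([(0.0, 0.0, 0.0, 1.7)])
pair = sasa([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)])
print(round(isolated, 1), round(pair, 1))
```

More dots per atom give a smoother estimate at the cost of more distance checks, which is the same tradeoff the dot_density setting controls.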

4.  Render a scene as a pen-and-ink sketch

This tip is a lot of fun, because it lets you see your molecules as if they had been drawn in a pen-and-ink manner, like a real cartoon.  This is sometimes useful for presentations or other less-formal venues where you want to clearly illustrate something about your molecular structure.

Once again, we will use the command prompt.  Start off by typing:

set ray_trace_mode, 3


Then type “ray”.  PyMOL will render your scene in the sketch style and display the result.

5.  “Rock” and roll

Finally, the last tip is extremely simple.   If you and your colleagues are sitting around looking at a molecular structure, sometimes it helps to view it from different angles.  Instead of twiddling the mouse back and forth, you can simply click the “rock” button at the top right of the PyMOL window to start the scene panning gently from side to side to aid visualization.

Please feel free to post any other useful PyMOL tricks and tips in the comments below.

Note:  I am running PyMOL 1.3 Incentive on a Mac OS X 10.6.8 system.

FTMap: fast and free* druggable hotspot prediction

*free to academics

FTMap is a useful and fast online tool that attempts to mimic experimental fragment-screening methodologies (SAR-by-NMR and X-ray crystallography) using in silico methods.   The algorithm is based on the premise that ligand binding sites in proteins often show “hotspots” that contribute most of the free energy to binding.

Often, fragment screening will identify these hotspots when clusters of different types of fragments all bind to the same subsite of a larger binding site.   In fact, X-ray crystallography studies of protein structures solved in a variety of organic solvents demonstrate that small organic fragments often form clusters in active sites.

In the FTMap approach, small organic probes are used for an initial rigid-body docking against the entire protein surface.  The “FT” of FTMap stands for the use of fast Fourier transform (FFT) methods to quickly sample billions of probe positions while calculating accurate energies based on a robust energy expression.

Following docking of each probe, thousands of poses are energy-minimized and clustered based on proximity.  The clusters are then ranked for lowest energy.   Consensus sites (“hot spots”) are determined by looking for overlapping clusters of different types of probes within several angstroms of each other.   If several consensus sites appear near each other on the protein surface, that is a strong indication of a potentially druggable binding site.
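To make the consensus-site idea concrete, here is a toy Python sketch.  It is emphatically not FTMap’s code: it simply links probe cluster centers that lie within a cutoff distance of each other and keeps the groups that contain more than one probe type.  The 4.0 angstrom cutoff, the function name, the probe labels, and the coordinates are all assumptions I made up for illustration.

```python
import math
from collections import defaultdict


def consensus_sites(clusters, cutoff=4.0, min_probe_types=2):
    """Group nearby probe clusters into candidate consensus sites.

    clusters: list of (probe_name, (x, y, z)) cluster centers, in angstroms.
    Returns a list of sets of probe names, largest site first.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Follow parent links to the group representative (with path halving)
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of cluster centers closer than the cutoff
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if math.dist(clusters[i][1], clusters[j][1]) <= cutoff:
                parent[find(i)] = find(j)

    groups = defaultdict(set)
    for i, (probe, _center) in enumerate(clusters):
        groups[find(i)].add(probe)

    sites = [probes for probes in groups.values() if len(probes) >= min_probe_types]
    return sorted(sites, key=len, reverse=True)


# Made-up cluster centers: three probes share a pocket, one sits far away
toy_clusters = [
    ("ethanol", (0.0, 0.0, 0.0)),
    ("acetone", (1.0, 0.0, 0.0)),
    ("urea", (2.5, 0.0, 0.0)),
    ("benzene", (20.0, 0.0, 0.0)),
]
print(consensus_sites(toy_clusters))
```

In this toy version a site’s rank is just the number of distinct probe types it attracts, which captures the intuition that a pocket binding many different probes is more likely to be a hot spot.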

Importance of list comprehensions in Python

A beginner to Python programming is usually taught to use for loops to do a lot of things. The temptation is to bust out a for loop whenever you need to modify a list or string object, but this quickly leads to complex “loops within loops” code architectures that are hard for humans to read, even though they may work OK.

A simple example:

>>> test_list = [2, 4, 6, 8]
>>> new_list = []  # the list must exist before we can append to it
>>> for x in test_list:
...     new_list.append(x + 1)



A better approach is to take advantage of Python’s built-in list comprehensions, which have the form ‘[x for x in y]’.


>>> new_list = [x + 1 for x in test_list]



This can be expanded to include conditionals, for example:

>>> stripped_list = [line.strip() for line in line_list if line != ""]

You can also loop over multiple sequences like this:



>>> [(x, y) for x in seq1 for y in seq2]
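If the ordering of that last form is not obvious, here is a short runnable comparison with the equivalent nested for loops.  The contents of seq1 and seq2 are made up for illustration.

```python
seq1 = [1, 2]
seq2 = ["a", "b"]

# Explicit nested loops
pairs_loop = []
for x in seq1:
    for y in seq2:
        pairs_loop.append((x, y))

# Equivalent comprehension: the leftmost 'for' is the outer loop
pairs_comp = [(x, y) for x in seq1 for y in seq2]

print(pairs_comp)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
assert pairs_loop == pairs_comp
```

The comprehension reads left to right in the same order you would nest the loops, so the first “for” varies slowest.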


Important to use deuterated buffers in small molecule NMR

One way to make your life massively easier if you are doing NMR of small molecules, especially at low concentrations (sub-1mM), is to simply work out what buffer you’d like to use and then order all of the components in deuterated form ahead of time.

For example, if you would like to study your molecule in a buffer like HEPES with 5% DMSO, you can order fully-deuterated HEPES and DMSO from companies like CIL and Sigma-ISOTEC.  Although expensive, the time it can save you at the spectrometer and the enhanced quality of the data are likely worthwhile tradeoffs.

You can also go a step further and prepare your buffers in 100% D2O, making water suppression vastly easier and improving the quality of your spectra. These steps work together in a synergistic manner to dramatically improve your data quality when acquiring on small molecules at low concentrations.