How not to use IPython.parallel on a laptop

In this post I want to focus on an aspect of using IPython.parallel that may be confusing to new users.

In the IPython.parallel documentation, one of the first things you do to show that the parallel Python engines have started is to call Python’s built-in “map” function with a lambda that raises x to the 10th power over a range of x values.

In serial (non-parallel) form that is as follows:

serial_result = map(lambda x:x**10, range(100))

Then, you do the same in parallel with the python engines you’ve started:

parallel_result = lview.map(lambda x:x**10, range(100))

Then, you assert that the results are the same:

assert serial_result == parallel_result

 

This works fine, but there is a problem. You would probably never actually use an IPython.parallel client for work like this. Given that the documentation is aimed at introducing new users, it is a bit confusing to present this simple example without the caveat that this is not a typical use case.

Here is why you’d never actually code this calculation in parallel:

In [8]: %timeit map(lambda x:x**10, range(3000))
100 loops, best of 3: 9.91 ms per loop

In [9]: %timeit lview.map(lambda x:x**10, range(3000))
1 loops, best of 3: 22.8 s per loop

 

Notice that the parallel version of this calculation, over a range of just 3000, took 22.8 seconds to complete! That is roughly 2,300 times slower than just using one core and the built-in map function.

This surprising result comes from the huge amount of overhead associated with distributing 3000 small, very fast jobs the way I’ve written statement [9] above. Every time a job is distributed to an engine, the function and its data have to be serialized and deserialized (“pickled”), if my understanding is correct.
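To get a feel for the serialization part of that overhead, here is a small sketch. Plain pickle stands in for whatever serializer IPython.parallel actually uses, and it only counts bytes, not network round-trips, so this understates the real gap:

```python
import pickle

data = list(range(3000))

# One serialized message per item, as in the naive per-element map call:
per_item_bytes = sum(len(pickle.dumps(x)) for x in data)

# One serialized message for the whole batch:
one_shot_bytes = len(pickle.dumps(data))
```

The per-item total is several times larger than the one-shot total, and each per-item message in the real system also carries the pickled function plus a round trip to an engine.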

In response to my StackOverflow question on this issue, Univerio helpfully suggested the following more clever use of parallel resources (he is using 6 cores in this example):

In [7]: %timeit map(lambda x:x**10, range(3000))
100 loops, best of 3: 3.17 ms per loop

In [8]: %timeit lview.map(lambda i:[x**10 for x in range(i * 500)], range(6))  # range(6) owing to 6 cores available for work
100 loops, best of 3: 11.4 ms per loop

In [9]: %timeit lview.map(lambda i:[x**10 for x in range(i * 1500)], range(2))
100 loops, best of 3: 5.76 ms per loop

Note that what Univerio is doing in statement [8] is distributing the work across the 6 cores as a handful of larger chunks rather than 3000 tiny jobs (strictly speaking the chunks are not equal, since range(i * 500) grows with i, but the principle holds: fewer, larger tasks amortize the distribution overhead). Now the time to complete the task is within the same order of magnitude as the single-threaded version. With just two tasks in example [9], the time is roughly cut in half again, owing to even less overhead.

The take-home message is that if you’re going to expend the overhead necessary to set up and start multiple IPython.parallel engines and distribute jobs to them, each job needs to be more resource-consuming than a few milliseconds. Make as few calls to the engines as possible, and have each call do as much work as possible.
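Here is a sketch of that batching principle with equal-sized chunks. The helper names are my own, and the built-in map stands in for lview.map so the snippet runs without any engines; with IPython.parallel you would pass chunk_power and the chunk list to lview.map instead:

```python
def chunk_power(args):
    # One "job": raise every x in [start, stop) to the 10th power.
    start, stop = args
    return [x ** 10 for x in range(start, stop)]

def make_chunks(n, n_engines):
    # Split range(n) into n_engines contiguous, near-equal pieces.
    step = -(-n // n_engines)  # ceiling division
    return [(i, min(i + step, n)) for i in range(0, n, step)]

chunks = make_chunks(3000, 6)  # 6 chunks of 500, one per core

# Serial stand-in for lview.map(chunk_power, chunks):
partial_results = map(chunk_power, chunks)
parallel_result = [y for part in partial_results for y in part]
```

Six pickled messages instead of 3000, and each engine does a meaningful amount of work per call.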

Practical Fragments blog has reviewed our paper!

Our latest fragment-based drug discovery paper against the p97 ATPase has been noticed and reviewed favorably by the widely-read Practical Fragments blog.

Here is an excerpt from that review:

“The protein p97 is important in regulating protein homeostasis, and thus a potential anti-cancer target. But this is no low-hanging fruit: the protein has three domains and assembles into a hexamer. Two domains, D1 and D2, are ATPases. The third (N) domain binds to other proteins in the cell. All the domains are dynamic and interdependent. Oh, and crystallography is tough. Previous efforts have identified inhibitors of the D2 domain, but not the others. Not to be put off by difficult challenges, a group of researchers at the University of California San Francisco (UCSF) led by Michelle Arkin and Mark Kelly have performed fragment screening against the D1 and N domains, and report their adventures in J. Biomol. Screen.”

High performance computing versus high throughput

[Image: Xserve G5 supercomputer. Image credit: Christopher Bowns, Flickr]

 

Two approaches to scientific computing

The terms “high performance computing” (HPC) and “high throughput computing” (HTC) might sound interchangeable to those not familiar with scientific computing, but they denote two very different approaches to computing.  I’m going to describe the difference below (with the caveat that I have only a layman’s understanding of this field).

High throughput computing is for many smaller tasks

HTC is a computing approach that aims to make available a large number of computers to quickly accomplish tasks that are easily broken up into smaller, independent components.  For example, if you have to process 100 video clips, and each one takes ~1 hr, then you would need ~100 hrs of computing time on your laptop.

However, if you had 100 laptops, you could theoretically do the task in 1 hr assuming that you could instantly command each one to begin the processing task (in reality, of course, you’d have to run around setting up the task on each computer which could take longer than the compute time).  The point is this: each video processing task is independent of the others.

It is these types of tasks that HTC aims to address. By providing many hundreds or thousands of networked CPUs in a cluster, along with a software application that can automatically track and distribute hundreds of tasks (called a DRM, or distributed-resource manager), an HTC system lets a user submit a job like the video-processing example above and have it automatically farmed out to 100 compute nodes (in the HTC world this is called a “pleasantly parallel” problem). Once each node completes, the data are copied back into the user’s home folder, and it appears to the user that they have just used one extremely fast computer, when in fact they have used 100 computers working simultaneously.
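The pleasantly parallel pattern can be sketched in a few lines. The clip-processing function below is a hypothetical stand-in for an hour-long encode, and a local worker pool stands in for the DRM farming tasks out to cluster nodes; the point is the same in miniature: the tasks share no state, so they need no coordination.

```python
from concurrent.futures import ThreadPoolExecutor

def process_clip(clip_id):
    # Hypothetical stand-in for one independent video-processing job;
    # on an HTC cluster this would run on its own node.
    return "clip %d processed" % clip_id

# The pool plays the role of the DRM: it hands each independent task
# to whichever worker is free and collects the results.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_clip, range(10)))
```

Because no task ever waits on another, adding workers (or cluster nodes) scales the throughput almost linearly, which is exactly the HTC value proposition.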

High performance computing is for difficult computational problems

Now, however, consider the case of a computational task where each subunit is not independent of the others. One that I am intimately familiar with is Molecular Dynamics (MD) simulation of protein structure and dynamics. In MD simulations, an algorithm simulates the atomic motions of a protein molecule immersed in a box of water molecules on a very short timescale (somewhere on the order of a microsecond). Even with the short timescale, this is a very compute-intensive task. But because each atom in the protein interacts with many other atoms in the system, the task can’t be neatly broken down into independent components the way video processing can. You can’t simply give each atom to a separate compute node. In effect, an MD simulation is a single, extremely resource-intensive computation.

Enter high performance computing.  In HPC (also called supercomputing), the aim is to build hardware and software that are focused on peak computing capability (i.e., speed) and extremely fast interconnectedness, rather than on the number of simultaneous tasks that can be accomplished.   The “high performance” part of HPC comes about from the technological focus on networking the computational nodes together with extremely fast connections so that communicating data and messages back and forth does not become a significant bottleneck to completing a large-scale computation.

On the software side of HPC, code libraries like MPI have been developed that allow simulations to be “parallelized” (i.e., broken down into smaller pieces; for MD simulation this splitting is called “domain decomposition”). These smaller pieces are then farmed out to the compute nodes of an HPC supercomputer, and they exchange data in real time so that each part of the simulation “knows” about the results from every other part. In this way, the velocities and positions of certain atoms of a protein can be influenced by all of the other atoms’ velocities and positions, even if they are being simulated on different CPU nodes.
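Here is a toy, plain-Python illustration of that idea (no real MPI; each list stands in for the slice of atoms owned by one rank, and a simple smoothing step stands in for a force update). It shows that a decomposed update with boundary (“halo”) exchange reproduces the single-node computation exactly:

```python
def step_global(vals):
    # One smoothing step over the whole chain (endpoints held fixed);
    # a stand-in for one timestep computed on a single node.
    return [vals[0]] + [
        (vals[i - 1] + vals[i] + vals[i + 1]) / 3.0
        for i in range(1, len(vals) - 1)
    ] + [vals[-1]]

def step_decomposed(vals, n_nodes):
    # Domain decomposition: split the chain into contiguous slices,
    # one per "rank".
    size = len(vals) // n_nodes
    domains = [vals[r * size:(r + 1) * size] for r in range(n_nodes)]
    new_domains = []
    for r, dom in enumerate(domains):
        # Halo exchange: each rank fetches one edge value from each
        # neighbouring rank before it can update its own atoms.
        left = domains[r - 1][-1] if r > 0 else None
        right = domains[r + 1][0] if r < n_nodes - 1 else None
        new = []
        for j, v in enumerate(dom):
            lo = dom[j - 1] if j > 0 else left
            hi = dom[j + 1] if j < len(dom) - 1 else right
            # The global chain endpoints have no neighbour on one side:
            new.append(v if lo is None or hi is None else (lo + v + hi) / 3.0)
        new_domains.append(new)
    return [x for d in new_domains for x in d]

positions = [float(i * i) for i in range(12)]
```

In a real MD code the halo exchange is an MPI message between nodes every timestep, which is why HPC interconnect speed matters so much: the ranks cannot proceed until their neighbours’ boundary data arrive.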

 

Cancer immunotherapy and the role of tryptophan

 

[Image: DL-1-Methyltryptophan (1-MT), a cancer immunotherapy drug candidate]

 

Background

In the body, L-tryptophan is catabolized by an enzyme called Indoleamine 2,3-dioxygenase (IDO) to form a class of molecules known as kynurenines. These compounds have been shown to be immunosuppressive, preventing inflammation and T-cell mobilization. Additionally, depletion of cellular stores of L-tryptophan also appears to induce down-regulation of the immune response.

What does this have to do with cancer immunotherapy?  Interestingly, cancer actively hijacks the IDO pathway to promote immune system suppression and tolerance to tumor cell antigens by overexpressing IDO in the tumor, at host cells in the immediate area of the tumor, and at tumor-draining lymph nodes where T-cells could normally become activated against tumor antigens.

Think of it like a beekeeper using smoke to keep the bees calm as the keeper removes honey from the hive. By upregulating the expression and activity of the IDO pathway, tumors effectively “hide” from the immune system while they grow out of control in the host tissue. But this exploitation of the body’s own immune regulation by cancer also presents a weakness that can be leveraged in the fight against tumor progression.

Inhibiting IDO to enable tumor recognition

Enter 1-methyl-DL-tryptophan (1MT), pictured above.  1MT is known to be an inhibitor of IDO that works presumably by mimicking the natural substrate (although I believe this has not been shown explicitly).   IDO inhibition by 1MT has been shown to work in combination with chemotherapy approaches to limit tumor progression in mouse models.

Adding 1MT to chemotherapy treatments allows the host immune system to mediate a response to the tumor cells, especially in the presence of dying tumor cells undergoing apoptosis and releasing antigen.  By taking away tumor-induced immune tolerance, 1MT inhibition of IDO allows the T-cell system to recognize, attack and destroy cancer cells in synergy with chemotherapy.

Early clinical trials involving 1MT appear to be ongoing, with work being done by NewLink Genetics in Ames, IA.

————–

References

https://en.wikipedia.org/wiki/Indoleamine_2,3-dioxygenase

 

The power of pandas: an example

I wanted to demonstrate further how powerful and straightforward the pandas library is for data analysis.  A good example comes from the book “Bioinformatics Programming using Python,” by Mitchell Model.   While this is an excellent reference book on Python programming, it was written before pandas was in widespread use as a library.

In the “Extended Examples” on p. 158 of Chapter 4, the author demonstrates some code to read in a text file containing the names of enzymes, their restriction sites, and the patterns that they match. The code takes the text file, cleans it up, and builds a dictionary that is searchable by key. This is done using core Python tools only, and it looks like this (note: I am using Python 2.7, hence the need to import “print_function” from “__future__”):

[Screenshot: the core-Python implementation]

The last few lines of output from calling test() are as follows:

[Screenshot: the last few lines of test() output]

Hold onto your seats because you can do all of that and more with just 5 lines of code using pandas (if you don’t count the imports):

[Screenshot: the pandas implementation]

The read_table function can take a regex separator (in this case “any number of whitespace characters”) when using the “python” engine option. We skip the first 8 rows because they carry no information. The header is set as the second row after the skipped rows.

I then use a boolean mask to find the rows where isnull() is true in the “pattern” column. Some rows lack a “site” entry, so pandas found only two data fields when splitting on whitespace and left the third column empty, not knowing that the missing value was in the middle. Wherever the pattern column is null, I copy the missing values into the pattern column from the site column, and then replace those site column values with NaN.
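For readers without the book, here is a minimal sketch of the same approach. The sample rows and column names are my own guesses at the file layout, not the actual rebase data, and I read from an in-memory buffer instead of the file:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical excerpt mimicking the enzyme file layout: enzyme name,
# site, and pattern, where some rows lack the middle entry.
sample = io.StringIO(
    "AanI PsiI TTA^TAA\n"
    "AarI CACCTGCNNNN^\n"          # no site entry on this row
    "AasI DrdI GACNNNN^NNGTC\n"
)

# Regex separator ("any number of whitespace characters") requires
# the python parsing engine.
rebase = pd.read_table(sample, sep=r"\s+", engine="python",
                       names=["enzyme", "site", "pattern"])

# Rows with only two fields had their pattern parsed into 'site',
# leaving 'pattern' null; move the value over, then blank out 'site'.
mask = rebase["pattern"].isnull()
rebase.loc[mask, "pattern"] = rebase.loc[mask, "site"]
rebase.loc[mask, "site"] = np.nan
```

The boolean mask plus .loc assignment is the idiomatic pandas way to patch a column conditionally, without writing any explicit loop over rows.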

The first few lines of the ‘rebase’ dataframe object look like this:

[Screenshot: the first few rows of the “rebase” dataframe]

Technically, what I just did in pandas is not quite the same thing as the core Python version above. It is in many ways far better. First, all of the blanks in the second column are now NaN instead of empty strings, which makes data analysis easier. Second, the object “rebase” is a dataframe, with access to all of the dataframe methods. It is also indexed by row and has named columns for easier interpretation. And the dataframe automatically “pretty prints” for easy reading, whereas the table created using core Python has to be formatted with additional function definitions to print readably to stdout or to a file.