Breakthrough advances in 2018 so far: flu, germs, and cancer

2018 medicine breakthrough review!

So far this year has seen some pretty important research breakthrough advances in several key areas of health and medicine.  I want to briefly describe some of what we’ve seen in just the first few months of 2018.


A pharmaceutical company in Japan has released phase 3 trial results showing that its drug, Xofluza, can effectively kill the virus in just 24 hours in infected humans.  And it can do this with just one single dose, compared to a 10-dose, three day regimen of Tamiflu. The drug works by inhibiting an endonuclease needed for replication of the virus.


It is common knowledge that antibiotics are over-prescribed and over-used.  This fact has led to the rise of MRSA and other resistant bacteria which threaten human health.  Although it is thought that bacteria could be a source of novel antibiotics since they are in constant chemical warfare with each other, most bacteria aren’t culture-friendly in the lab and so researchers haven’t been looking at them for leads.  Until now.

Malacidin drugs kill multi-drug resistant S. Aureus in tests on rats.

By adopting whole genome sequencing approaches to soil bacterial diversity, researchers were able to screen for gene clusters associated with calcium-binding motifs known for antibiotic activity.   The result was the discovery of a novel class of lipo-peptides, called malacidins A and B.  They showed potent activity against MRSA in skin infection models in rats.

The researchers estimate that 99% of bacterial natural-product antibiotic compounds remain unexplored at present.


2017 and 2018 have seen some major advances with cancer treatment.   It seems that the field is moving away from the focus on small-molecule drugs towards harnessing the patient’s own immune system to attack cancer.  The CAR-T therapies for pediatric leukemia appear extremely promising.  These kinds of therapies are now in trials for a wide range of blood and solid tumors.

A great summary of the advances being made is available here from the Fred Hutchinson Cancer Research Center.   Here is how Dr. Gilliland, President of Fred Hutch, begins his review of the advances:

I’ve gone on record to say that by 2025, cancer researchers will have developed curative therapeutic approaches for most if not all cancers.

I took some flak for putting that stake in the ground. But we in the cancer research field are making incredible strides toward better and safer, potentially curative treatments for cancer, and I’m excited for what’s next. I believe that we must set a high bar, execute and implement — that there should be no excuses for not advancing the field at that pace.

This is a stunning statement on its own;  but made even more so because it is usually the scientists in the day-to-day trenches of research who are themselves the most pessimistic about the possibility of rapid advances.

Additionally, an important paper came out recently proposing a novel paradigm for understanding and modeling cancer incidence with age.  For a long time the dominant model has been the “two-hit” hypothesis which predicts that clinically-observable cancers arise when a cell acquires sufficient mutations in tumor-suppressor genes to become a tumor.

This paper challenges that notion and shows that a model of thymic function decline (the thymus produces T-cells) over time better describes the incidence of cancers with age.   This model better fits the data and leads to the conclusion that cancers are continually arising in our bodies, but it is our properly functioning immune system that roots them out and prevents clinical disease from emerging.  This model also helps explain why novel cancer immunotherapies are so potent and why focus has shifted to supporting and activating T-cells.

Declining T cell production leads to increasing disease incidence with age.


Genomic landscape of metastatic cancer

Integrative genomics sheds new light on metastatic cancer

A new study from the University of Michigan Comprehensive Cancer Center has just been released that represents an in-depth look at the genomics of metastatic cancer, as opposed to primary tumors.   This work involved DNA- and RNA-Seq of solid metastatic tumors of 500 adult patients, as well as matched normal tissue sequencing for detection of somatic vs. germline variants.


A good overview of the study at the level of scientific layperson can be found in this press release.  It summarizes the key findings (many of which are striking and novel):

  • A significant increase in mutational burden of metastatic tumors vs. primary tumors.
  • A long-tailed distribution of mutational frequencies (i.e., few genes were mutated at a high rate, yet many genes were mutated).
  • About twelve percent of patients harbored germline variants that are suspected to predispose to cancer and metastasis, and 75% of those variants were in DNA repair pathways.
  • Across the cohort, 37% of patient tumors harbored gene fusions that either drove metastasis or suppressed the cells anti-tumor functions.
  • RNA-Seq showed that metastatic tumors are significantly de-differentiated, and fall into two classes:  proliferative and EMT-like (endothelial-to-mesenchymal transition).

 A brief look at the data

This study provides a high-level view onto the mutational burden of metastatic cancer vis-a-vis primary tumors.  Figure 1C from the paper shows the comparison of mutation rates in different tumor types in the TCGA (The Cancer Genome Atlas) primary tumors and the MET500 (metastatic cohort).

Mutational burden in metastatic cancer compared to primary tumors.


Here we can see that in most cases (colored bars), metastatic cancers had statistically significant increases in mutational rates.   The figure shows that tumors with low mutational rates “sped up” a lot as compared with those primary tumor types that already had high rates.

Supplemental Figure 1d (below) shows how often key tumor suppressor and oncogenes are altered in metastatic cancer vs. primary tumors.  TP53 is found to be altered more frequently in metastatic thyroid, colon, lung, prostate, breast, and bladder cancers.   PTEN is mutated more in prostate tumors.  GNAS and PIK3CA are mutated more in thymoma, although this finding doesn’t reach significance in this case.  KRAS is altered more in colon and esophagus cancers, but again, these findings don’t reach significance after multiple correction.

Comparison of genetic alteration frequencies in metastatic and primary tumors.


One other figure I’d like to highlight briefly is Figure 3C from the paper, shown below:

Molecular structure of novel, potentially activating gene fusions in the metastatic tumors.

I wanted to mention this figure to illustrate the terrifying complexity of cancer.   Knowing which oncogenes are mutated, in which positions, and the effects of those mutations on gene expression networks is not enough to understand tumor evolution and metastasis.  There are also new genes being created that do totally new things, and these are unique on a per tumor basis.   None of the above structures have ever been observed before, and yet they were all seen from a survey of just 500 cancers.   In fact, ~40% of the tumors in the study cohort harbored at least one fusion suspected to be pathogenic.

There is much more to this work, but I will leave it to interested readers to go read the entire study.   I think this work is obviously tremendously important and novel, and represents the future of personalized medicine.  That is, a patient undergoing treatment for cancer will have their tumor or tumors biopsied and sequenced cumulatively over time to understand how the disease has evolved and is evolving, and to ascertain what weaknesses can be exploited for successful treatment.

Book Review: Python for Data Analysis

Python for data analysis


The book “Python for Data Analysis” (O’Reilly Media 2013) by author Wes McKinney is a guide to using the NumPy, matplotlib, and pandas Python libraries for data analysis. The author sets out to provide a template for Python programmers to gain working knowledge of the rapidly maturing Python technologies for data analysis and visualization tasks.   The tone of the book is conversational and focused, with no fluff or filler. The book accomplishes its purpose admirably by providing a concise, meaty, and highly readable tutorial through the essential features of doing data analysis in Python.

McKinney does a skillful job of bringing the Python novice through the requisite background and quickly up to speed doing useful work with pandas without becoming bogged down in introductory Python minutia. In fact, the opening chapter is titled “Introductory Examples” and includes several relatively complex data analysis examples that serve to demonstrate the capabilities of pandas. I found this approach provided me with the motivation to read on into the more detailed and technical chapters.

Why you should listen to Wes McKinney

The author is uniquely suited to write this book, having been the creator and first developer of pandas in the course of his own work as a quantitative analyst at a hedge fund back in 2008. I could tell that the author has a mastery of the subject; he provides many useful insights that could only be gained through real-world experience. The book focuses mainly on the pandas library and its core technologies, the Series and the Dataframe. Both are important because they build on the speed and precision of numpy arrays, while allowing richer, more intuitive and powerful manipulation of data tables.

pandas: it just works the way it should

Another aspect of this book that is so enjoyable is that pandas itself just works the way I would expect it to work. The tools, in my opinion, are constructed to be as convenient and intuitive as possible. I find that pandas behaves very predictably, despite being extremely powerful. Oftentimes, I was able to invent an expression in pandas that behaved exactly as I intended without knowing a priori whether it was possible to do so. There is something very satisfying about a tool that just works and doesn’t require a lot of boilerplate code.

The publisher also provides downloadable iPython notebooks containing the code examples for each chapter. Using these notebooks it was very easy to follow along, running code while reading the chapter. The illustrations in the book also consist almost entirely of matplotlib plots prepared using the code examples. I was able to work up many of the figures, giving me a sense of having gained practical, working knowledge in each chapter.

Python for data analysis? Yes!

I really have nothing negative to say about “Python for Data Analysis”. If forced to find something to change, it would be that the author could have left out the highly-condensed chapter on introductory Python programming found at the end of the book, using the extra space instead to include even more examples of pandas in practical, real-world applications.

For instance, an example on building a data analysis model with interactive graphics for the web would have been welcome. Similarly, a demonstration of approaches for making matplotlib, with its rather utilitarian graphics, more closely resemble the stylistically attractive plots of ggplot2 (the well-known R plotting library) would also have been useful.

After reading this book, however, I have been convinced to transition my data analysis workflow entirely into Python and largely abandon R, which now seems somewhat esoteric and unnecessarily complex by comparison. Overall, I would highly recommend this book to anyone seeking to learn how to use Python for data analysis. It is a valuable reference for scientists, engineers, data analysts, and others who want to leverage the power of Python (and specifically numpy and pandas) for dealing with their data.