Breakthrough advances in 2018 so far: flu, germs, and cancer

2018 medicine breakthrough review!

So far, this year has seen some important research advances in several key areas of health and medicine.  I want to briefly describe some of what we’ve seen in just the first few months of 2018.


A pharmaceutical company in Japan has released phase 3 trial results showing that its drug, Xofluza, can effectively kill the flu virus in just 24 hours in infected humans.  It can do this with a single dose, compared to a 10-dose, five-day regimen of Tamiflu. The drug works by inhibiting an endonuclease needed for replication of the virus.


It is common knowledge that antibiotics are over-prescribed and over-used.  This fact has led to the rise of MRSA and other resistant bacteria which threaten human health.  Although it is thought that bacteria could be a source of novel antibiotics since they are in constant chemical warfare with each other, most bacteria aren’t culture-friendly in the lab and so researchers haven’t been looking at them for leads.  Until now.

Malacidin drugs kill multi-drug resistant S. aureus in tests on rats.

By applying whole-genome sequencing approaches to soil bacterial diversity, researchers were able to screen for gene clusters associated with calcium-binding motifs known for antibiotic activity.  The result was the discovery of a novel class of lipopeptides, called malacidins A and B.  They showed potent activity against MRSA in skin infection models in rats.

The researchers estimate that 99% of bacterial natural-product antibiotic compounds remain unexplored at present.


2017 and 2018 have seen some major advances in cancer treatment.  It seems that the field is moving away from a focus on small-molecule drugs and towards harnessing the patient’s own immune system to attack cancer.  The CAR-T therapies for pediatric leukemia appear extremely promising.  These kinds of therapies are now in trials for a wide range of blood and solid tumors.

A great summary of the advances being made is available here from the Fred Hutchinson Cancer Research Center.   Here is how Dr. Gilliland, President of Fred Hutch, begins his review of the advances:

I’ve gone on record to say that by 2025, cancer researchers will have developed curative therapeutic approaches for most if not all cancers.

I took some flak for putting that stake in the ground. But we in the cancer research field are making incredible strides toward better and safer, potentially curative treatments for cancer, and I’m excited for what’s next. I believe that we must set a high bar, execute and implement — that there should be no excuses for not advancing the field at that pace.

This is a stunning statement on its own, and it is made even more so because the scientists in the day-to-day trenches of research are usually the most pessimistic about the possibility of rapid advances.

Additionally, an important paper came out recently proposing a novel paradigm for understanding and modeling cancer incidence with age.  For a long time the dominant model has been the “two-hit” hypothesis which predicts that clinically-observable cancers arise when a cell acquires sufficient mutations in tumor-suppressor genes to become a tumor.

This paper challenges that notion and shows that a model of thymic function decline (the thymus produces T-cells) over time better describes the incidence of cancers with age.   This model better fits the data and leads to the conclusion that cancers are continually arising in our bodies, but it is our properly functioning immune system that roots them out and prevents clinical disease from emerging.  This model also helps explain why novel cancer immunotherapies are so potent and why focus has shifted to supporting and activating T-cells.

Declining T cell production leads to increasing disease incidence with age.


Genomic landscape of metastatic cancer

Integrative genomics sheds new light on metastatic cancer

A new study from the University of Michigan Comprehensive Cancer Center has just been released that represents an in-depth look at the genomics of metastatic cancer, as opposed to primary tumors.   This work involved DNA- and RNA-Seq of solid metastatic tumors of 500 adult patients, as well as matched normal tissue sequencing for detection of somatic vs. germline variants.


A good overview of the study at the level of scientific layperson can be found in this press release.  It summarizes the key findings (many of which are striking and novel):

  • A significant increase in mutational burden of metastatic tumors vs. primary tumors.
  • A long-tailed distribution of mutational frequencies (i.e., few genes were mutated at a high rate, yet many genes were mutated).
  • About 12% of patients harbored germline variants that are suspected to predispose to cancer and metastasis, and 75% of those variants were in DNA repair pathways.
  • Across the cohort, 37% of patient tumors harbored gene fusions that either drove metastasis or suppressed the cell’s anti-tumor functions.
  • RNA-Seq showed that metastatic tumors are significantly de-differentiated, and fall into two classes:  proliferative and EMT-like (epithelial-to-mesenchymal transition).

 A brief look at the data

This study provides a high-level view of the mutational burden of metastatic cancer relative to primary tumors.  Figure 1C from the paper compares mutation rates across tumor types between the TCGA (The Cancer Genome Atlas) primary tumors and the MET500 metastatic cohort.

Mutational burden in metastatic cancer compared to primary tumors.


Here we can see that in most cases (colored bars), metastatic cancers had statistically significant increases in mutational rates.   The figure shows that tumors with low mutational rates “sped up” a lot as compared with those primary tumor types that already had high rates.

Supplemental Figure 1d (below) shows how often key tumor suppressors and oncogenes are altered in metastatic cancer vs. primary tumors.  TP53 is found to be altered more frequently in metastatic thyroid, colon, lung, prostate, breast, and bladder cancers.  PTEN is mutated more in prostate tumors.  GNAS and PIK3CA are mutated more in thymoma, although this finding doesn’t reach significance.  KRAS is altered more in colon and esophagus cancers, but again, these findings don’t reach significance after multiple-testing correction.
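To illustrate how a finding can look significant on its own yet fail after multiple-testing correction, here is a minimal sketch of the Benjamini-Hochberg adjustment. This is an assumption for illustration only: the paper's actual correction method is not stated here, and the p-values below are hypothetical, not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four gene/tumor-type comparisons.
raw = [0.01, 0.04, 0.03, 0.5]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}, adjusted p = {q:.3f}")
```

With a 0.05 threshold, a raw p-value just under 0.05 can end up above it once the number of comparisons is taken into account.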

Comparison of genetic alteration frequencies in metastatic and primary tumors.


One other figure I’d like to highlight briefly is Figure 3C from the paper, shown below:

Molecular structure of novel, potentially activating gene fusions in the metastatic tumors.

I wanted to mention this figure to illustrate the terrifying complexity of cancer.   Knowing which oncogenes are mutated, in which positions, and the effects of those mutations on gene expression networks is not enough to understand tumor evolution and metastasis.  There are also new genes being created that do totally new things, and these are unique on a per tumor basis.   None of the above structures have ever been observed before, and yet they were all seen from a survey of just 500 cancers.   In fact, ~40% of the tumors in the study cohort harbored at least one fusion suspected to be pathogenic.

There is much more to this work, but I will leave it to interested readers to go read the entire study.   I think this work is obviously tremendously important and novel, and represents the future of personalized medicine.  That is, a patient undergoing treatment for cancer will have their tumor or tumors biopsied and sequenced cumulatively over time to understand how the disease has evolved and is evolving, and to ascertain what weaknesses can be exploited for successful treatment.

Search speed comparison: naive exact vs. Boyer-Moore vs. k-mer index

Recently, I’ve been working my way through Ben Langmead’s excellent introduction to “Algorithms for DNA sequencing.”  The class is a fascinating and well-taught intro to concepts about DNA short read alignment and assembly methods.

As part of the course, we have to implement or modify Python code relating to several simple matching algorithms, including the “naive exact” (NEM) matching method, the “Boyer-Moore” (BM) method, and a k-mer index approach.
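For readers who haven't seen these methods, here is a minimal sketch of two of them: naive exact matching and a simple k-mer index. This is my own illustration rather than the course code; the function names, the example strings, and the choice of k = 4 are all assumptions, and the index lookup assumes the pattern is at least k characters long.

```python
from collections import defaultdict


def naive_exact(p, t):
    """Return every offset in t where p occurs, checking each alignment."""
    occurrences = []
    for i in range(len(t) - len(p) + 1):
        if t[i:i + len(p)] == p:
            occurrences.append(i)
    return occurrences


def build_kmer_index(t, k):
    """Map each k-mer of t to the list of offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(t) - k + 1):
        index[t[i:i + k]].append(i)
    return index


def query_kmer_index(p, t, index, k):
    """Look up the first k-mer of p, then verify all of p at each hit."""
    occurrences = []
    for i in index.get(p[:k], []):
        if t[i:i + len(p)] == p:
            occurrences.append(i)
    return occurrences


# Hypothetical toy genome (T) and read (P).
t = "GATTACAGATTACCAGATTACA"
p = "GATTACA"
index = build_kmer_index(t, k=4)
print(naive_exact(p, t))
print(query_kmer_index(p, t, index, k=4))
```

The index is built once per genome, so its cost is paid up front; each query then only verifies the handful of offsets where the first k-mer occurs instead of scanning the whole text.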

I was curious about speed, so I made a figure showing the computational time that each approach takes.  P and T refer to the length of the short read to be aligned and the genome to align to, respectively.

Figure 1. Comparing the speed of the NEM, BM, and K-mer search methods on long and short patterns (P) and texts (T). The y-axis is on a log-scale.

Note that the y-axis is a log scale in units of microseconds.  Right away, it is obvious that k-mer index methods are orders of magnitude faster than ‘online’ methods like NEM and BM.

Also of interest is the fact that as the pattern gets shorter, the advantage of BM preprocessing of the pattern gets smaller.  You can see that going from a pattern length of 30 to 11 negates any advantage of BM searching.
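One way to get a feel for the effect of pattern length is to count how many alignments a skipping search actually tests. The sketch below uses Boyer-Moore-Horspool, a simplified variant that keeps only a bad-character style shift table, so it is not the full Boyer-Moore from the course. The random text, the seed, and the function name are assumptions made for illustration.

```python
import random


def horspool_alignments(p, t):
    """Boyer-Moore-Horspool search; returns (match offsets, alignments tried)."""
    m = len(p)
    if m == 0:
        raise ValueError("pattern must not be empty")
    # Shift for each character: distance from its last occurrence in p[:-1]
    # to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(p[:-1])}
    occurrences, tried, i = [], 0, 0
    while i + m <= len(t):
        tried += 1
        if t[i:i + m] == p:
            occurrences.append(i)
        # Shift by the text character under the pattern's last position;
        # characters absent from p[:-1] allow a full shift of m.
        i += shift.get(t[i + m - 1], m)
    return occurrences, tried


# Hypothetical random "genome"; patterns are copied from it so they must occur.
random.seed(0)
t = "".join(random.choice("ACGT") for _ in range(10000))
for m in (30, 11):
    p = t[500:500 + m]
    hits, tried = horspool_alignments(p, t)
    print(f"|P| = {m}: tried {tried} alignments; naive would try {len(t) - m + 1}")
```

The largest possible skip is the pattern length itself, so a shorter pattern caps how much work can be skipped at each step.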


Variant annotation and transcript choice


Transcript choice between methods

Variant annotation methods do not all behave the same way when choosing transcripts to annotate against.  This leads to differing outcomes in annotations which may arise from different logic structures in the algorithms or different user criteria for annotation.
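As a toy illustration of how transcript choice alone can change an annotation, consider one variant and two overlapping transcripts of the same gene. Everything here is hypothetical: the coordinates, the transcript names, the three-level severity ranking, and the two selection policies are assumptions for the sketch and do not reproduce the actual logic of ANNOVAR, VEP, or SnpEff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    start: int
    end: int
    exons: tuple  # (start, end) pairs, inclusive coordinates


# Hypothetical severity ranking: higher means more severe.
SEVERITY = {"exonic": 2, "intronic": 1, "intergenic": 0}


def consequence(pos, tx):
    """Classify a position relative to a single transcript."""
    if not tx.start <= pos <= tx.end:
        return "intergenic"
    if any(s <= pos <= e for s, e in tx.exons):
        return "exonic"
    return "intronic"


def annotate_longest(pos, transcripts):
    """Policy A: always annotate against the longest transcript."""
    tx = max(transcripts, key=lambda t: t.end - t.start)
    return tx.name, consequence(pos, tx)


def annotate_most_severe(pos, transcripts):
    """Policy B: report the transcript giving the most severe consequence."""
    tx = max(transcripts, key=lambda t: SEVERITY[consequence(pos, t)])
    return tx.name, consequence(pos, tx)


transcripts = [
    Transcript("tx_long", 100, 900, ((100, 200), (700, 900))),
    Transcript("tx_short", 300, 600, ((300, 450), (550, 600))),
]
variant_pos = 400
print(annotate_longest(variant_pos, transcripts))
print(annotate_most_severe(variant_pos, transcripts))
```

The same variant is intronic under one policy and exonic under the other, even though both policies agree on the underlying gene model.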

Unfortunately, incorrect annotations or disagreement in annotation outcomes can lead investigators to waste resources tracking down variants of little interest or to miss severe variants of potential clinical significance.

In this first post in a series, I’ll talk briefly about differing outcomes owing to transcript choices when three popular methods (ANNOVAR, VEP, and SnpEff) are applied to a dataset of 81 million variants from the 1000 Genomes Project.

In this figure you can see that the lack of concordance owing to transcript choice affects a surprisingly large number of variants.

variant annotation

This disagreement is largely owing to the way that intergenic variants are handled, assigning them to nearest genes or arbitrary categories like “unknown.”

To learn more about this problem and other issues with annotators and concordance between methods, check out our recent paper at bioRxiv.  In part two, I’ll talk more about concordance between methods when annotators agree on transcript choice.