Decode SAM files with these handy references

I recently had to inspect some genomic alignments as part of a project. Usually I work with BAM files, and if inspection is needed, I visualize the pileups to see what is going on.

In this case, I just wanted a quick answer to how the reads were aligning to the reference, and I didn’t want to go through the process of subsetting and copying the BAM files to my local machine.

A SAM (Sequence Alignment/Map) file is the uncompressed, plain-text record of the read alignments produced by an aligner (STAR, TopHat, BWA, etc.). These files can get very large, so they are usually compressed into BAM format (faster for machines to parse, but not human-readable) and the SAM file is discarded.
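To make the format concrete: per the SAM specification, each alignment line has 11 mandatory, tab-separated columns followed by optional tags. Below is a minimal Python sketch of reading one line into named fields. `SamRecord` and `parse_sam_line` are hypothetical names for illustration, and the sketch assumes a well-formed text SAM line; for real work, a library such as pysam or `samtools view` is the safer route.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SamRecord:
    """The 11 mandatory SAM columns, in file order, plus any optional tags."""
    qname: str   # query (read) name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name, e.g. a chromosome
    pos: int     # 1-based leftmost mapping position (0 if unavailable)
    mapq: int    # mapping quality (255 if unavailable)
    cigar: str   # CIGAR string ('*' if unavailable)
    rnext: str   # reference name of the mate/next read ('=' if same as rname)
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities, Phred+33 encoded
    tags: list[str] = field(default_factory=list)  # optional TAG:TYPE:VALUE fields

def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one tab-delimited SAM alignment line; return None for '@' header lines."""
    if line.startswith("@"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], rnext=cols[6], pnext=int(cols[7]),
        tlen=int(cols[8]), seq=cols[9], qual=cols[10], tags=cols[11:],
    )

# Usage with a made-up paired-end read
example = "\t".join(["read1", "99", "chr1", "100", "60", "8M2I10M", "=", "300",
                     "220", "ACGT" * 5, "I" * 20, "NM:i:2"])
rec = parse_sam_line(example)
print(rec.rname, rec.pos, rec.cigar)  # chr1 100 8M2I10M
```

Header lines (starting with `@`) are skipped rather than parsed, since they describe the file rather than individual alignments.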

In my case, I still had the SAM files around to inspect.  If you find yourself needing to read a SAM file, here are three helpful reference tools to make the process less painful:

1) This page has an enormous amount of detail about SAM files, including this helpful chart that enumerates all of the fields you can expect to find in each alignment:

SAM file structure explained in this handy chart.


2) This post from the blog “zenfractal.com” contains a great exposition on CIGAR strings and how to decode them:

3) And finally, if you’re trying to decode the SAM bitwise flags, you can look them up with this tool from the Broad Institute:

Decode SAM flags with this handy online tool from the Broad Institute.
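For reading CIGAR strings offline, here is a minimal Python sketch. It assumes the nine operation codes from the SAM specification (M, I, D, N, S, H, P, =, X) and the specification's table of which operations consume read (query) bases and which consume reference bases. The function names are hypothetical helpers, not part of any library.

```python
import re

# A valid CIGAR string is one or more <length><operation> pairs
CIGAR_FULL_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Which operations consume bases of the read (query) and of the reference,
# per the SAM specification
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")

def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '5S10M3D7M' into (length, op) pairs."""
    if cigar == "*":
        return []  # '*' means no CIGAR is available
    if not CIGAR_FULL_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP_RE.findall(cigar)]

def query_length(cigar: str) -> int:
    """Number of read bases described by the CIGAR (should equal len(SEQ))."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)

def reference_span(cigar: str) -> int:
    """Number of reference bases the alignment covers, starting at POS."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)

print(parse_cigar("5S10M3D7M"))     # [(5, 'S'), (10, 'M'), (3, 'D'), (7, 'M')]
print(query_length("5S10M3D7M"))    # 22
print(reference_span("5S10M3D7M"))  # 20
```

Soft clips (S) count toward the read length but not the reference span, while deletions (D) do the opposite, which is why the two totals differ.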
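Because the SAM FLAG is a sum of powers of two, decoding it amounts to testing each bit with a bitwise AND. The following is a minimal Python sketch of an offline flag lookup, assuming the bit meanings in the SAM specification; the short labels mirror the ones samtools uses, and `decode_sam_flag` is a hypothetical helper.

```python
# Bit values from the SAM specification; labels follow samtools' short names
SAM_FLAG_BITS = [
    (0x1, "PAIRED"),           # template has multiple segments (paired-end)
    (0x2, "PROPER_PAIR"),      # each segment properly aligned
    (0x4, "UNMAP"),            # this segment is unmapped
    (0x8, "MUNMAP"),           # next segment (mate) is unmapped
    (0x10, "REVERSE"),         # sequence is reverse-complemented
    (0x20, "MREVERSE"),        # mate is reverse-complemented
    (0x40, "READ1"),           # first segment in the template
    (0x80, "READ2"),           # last segment in the template
    (0x100, "SECONDARY"),      # secondary alignment
    (0x200, "QCFAIL"),         # fails platform/vendor quality checks
    (0x400, "DUP"),            # PCR or optical duplicate
    (0x800, "SUPPLEMENTARY"),  # supplementary (e.g. chimeric) alignment
]

def decode_sam_flag(flag: int) -> list[str]:
    """Return the names of every bit that is set in a SAM FLAG value."""
    if not 0 <= flag <= 0xFFFF:  # FLAG is a 16-bit unsigned field
        raise ValueError(f"FLAG out of range: {flag}")
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]

print(decode_sam_flag(99))   # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
print(decode_sam_flag(147))  # ['PAIRED', 'PROPER_PAIR', 'REVERSE', 'READ2']
```

Flags 99 and 147 are the typical pair for a properly paired read and its mate.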


Get hands-on with t-SNE plots

With the growing popularity of single-cell RNA-Seq analysis, the t-SNE projection of multi-dimensional data is appearing more often in publications and online.  If you’ve ever wanted to develop a better intuitive feel for what exactly t-SNE does and where it can go wrong, this interactive tutorial (by Martin Wattenberg and Fernanda Viegas) is extremely compelling and useful.

A screen capture of the interactive t-SNE interface.

In addition to providing a wonderful interactive plotting tool, the authors provide an informative tutorial that explains the pitfalls and challenges of optimizing and tuning the hyperparameters of t-SNE projections, and how to get the most from the plots. Here is an example:

An example of how hyperparameter tuning affects the final plot.

In the example above, t-SNE correctly reconstructs the structure of the data when the “perplexity” is between 30 and 50, but fails when the parameter falls outside that range (i.e., when it is too small or too large). Perplexity can be thought of as a rough guess at the number of close neighbors each point has.
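To see what the perplexity knob actually controls: in the standard t-SNE formulation, each point i gets its own Gaussian over its neighbors, with width σ_i, and perplexity is defined as Perp(P_i) = 2^H(P_i), where H is the Shannon entropy in bits. t-SNE searches for the σ_i that makes every point's perplexity equal the value you set. Below is a minimal pure-Python sketch of that calibration step for a single point. It is an illustration under these assumptions, not the code behind the interactive tool, and the function names are hypothetical.

```python
import math

def conditional_probs(dists_sq: list[float], sigma: float) -> list[float]:
    """p(j|i) for one point i, given squared distances to every other point j."""
    d_min = min(dists_sq)
    # Gaussian kernel of width sigma; subtracting d_min cancels out in the
    # normalisation but keeps exp() from underflowing for every neighbour
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in dists_sq]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs: list[float]) -> float:
    """Perp(P_i) = 2 ** H(P_i), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

def sigma_for_perplexity(dists_sq: list[float], target: float,
                         tol: float = 1e-4, max_iter: int = 200) -> float:
    """Bisect (on a log scale) for the sigma that gives the target perplexity."""
    if not 1.0 < target < len(dists_sq):
        raise ValueError("target must lie between 1 and the number of neighbours")
    lo, hi = 1e-6, 1e6
    sigma = 1.0
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)
        perp = perplexity(conditional_probs(dists_sq, sigma))
        if abs(perp - target) < tol:
            break
        if perp < target:
            lo = sigma  # too few effective neighbours: widen the kernel
        else:
            hi = sigma  # too many effective neighbours: narrow it
    return sigma

# Squared distances from one point to nine others (made-up values)
d2 = [1.0, 1.5, 2.0, 4.0, 6.0, 9.0, 12.0, 16.0, 25.0]
for target in (2, 5, 8):
    s = sigma_for_perplexity(d2, target)
    print(target, round(s, 3), round(perplexity(conditional_probs(d2, s)), 3))
```

A uniform distribution over k neighbors has perplexity exactly k, which is why the parameter reads as an "effective number of neighbors": a higher perplexity forces a wider kernel, so each point attends to more of the data.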

Go check out this distill.pub site.  It’s worth your time.

Breakthrough advances in 2018 so far: flu, germs, and cancer

2018 medicine breakthrough review!

So far, this year has seen some important research breakthroughs in several key areas of health and medicine. I want to briefly describe some of what we’ve seen in just the first few months of 2018.

Flu

A pharmaceutical company in Japan has released phase 3 trial results showing that its drug, Xofluza, can effectively kill the virus within 24 hours in infected people. And it can do this with a single dose, compared with the 10-dose, five-day regimen of Tamiflu. The drug works by inhibiting an endonuclease that the virus needs in order to replicate.

Germs

It is common knowledge that antibiotics are over-prescribed and overused. This has contributed to the rise of MRSA and other resistant bacteria that threaten human health. Bacteria are thought to be a promising source of novel antibiotics, since they are in constant chemical warfare with each other, but most bacteria can’t be cultured in the lab, so researchers haven’t been able to mine them for leads. Until now.

Malacidins kill multidrug-resistant S. aureus in tests on rats.

By applying metagenomic sequencing to the DNA of soil bacteria, rather than culturing them, researchers were able to screen for gene clusters associated with calcium-binding motifs known for antibiotic activity. The result was the discovery of a novel class of lipopeptides, called malacidins A and B, which showed potent activity against MRSA in skin infection models in rats.

The researchers estimate that 99% of bacterial natural-product antibiotic compounds remain unexplored at present.

Cancer

2017 and 2018 have seen some major advances in cancer treatment. The field seems to be shifting its focus from small-molecule drugs toward harnessing the patient’s own immune system to attack cancer. CAR-T therapies for pediatric leukemia appear extremely promising, and therapies of this kind are now in trials for a wide range of blood cancers and solid tumors.

A great summary of the advances being made is available here from the Fred Hutchinson Cancer Research Center.   Here is how Dr. Gilliland, President of Fred Hutch, begins his review of the advances:

I’ve gone on record to say that by 2025, cancer researchers will have developed curative therapeutic approaches for most if not all cancers.

I took some flak for putting that stake in the ground. But we in the cancer research field are making incredible strides toward better and safer, potentially curative treatments for cancer, and I’m excited for what’s next. I believe that we must set a high bar, execute and implement — that there should be no excuses for not advancing the field at that pace.

This is a stunning statement on its own, but it is made even more so because it is usually the scientists in the day-to-day trenches of research who are the most pessimistic about the possibility of rapid advances.

Additionally, an important paper came out recently proposing a new paradigm for understanding and modeling how cancer incidence changes with age. For a long time, the dominant model has been the “two-hit” hypothesis, which predicts that clinically observable cancers arise when a cell acquires enough mutations in tumor-suppressor genes to become a tumor.

This paper challenges that notion and shows that a model based on the decline of thymic function over time (the thymus produces T cells) better describes the incidence of cancers with age. This better fit to the data leads to the conclusion that cancers are continually arising in our bodies, but that a properly functioning immune system roots them out and prevents clinical disease from emerging. The model also helps explain why novel cancer immunotherapies are so potent and why the focus has shifted to supporting and activating T cells.

Declining T cell production leads to increasing disease incidence with age.


Is CB-5083 a promising new weapon against multiple myeloma?

Why care about p97?

In my postdoc work, I took part in a large team effort to design a small-molecule inhibitor of the p97 AAA-ATPase.

A crystal structure of the p97 ATPase.  The D2 domain is shown in dark blue.

The funding for this project came from the National Cancer Institute (NCI) and was premised on the idea that inhibiting p97 in cancer cells that depend heavily on the endoplasmic-reticulum-associated degradation (ERAD) pathway would trigger the unfolded-protein response and apoptosis in these rapidly growing tumor cell populations. This is because p97 is a critical regulator and component of ERAD; when it is inhibited, the cell's protein homeostasis falls out of balance and the cell experiences unfolded-protein stress.

Drug design is an extremely challenging problem, and even with a large group of researchers it took us several years to find a compound that showed promising inhibition of p97. Our results were published in ACS Medicinal Chemistry Letters in 2016. The compound we discovered, indole amide 3, has high solubility, permeability, and stability, and it binds an allosteric site on the D2 domain with sub-micromolar affinity. Unfortunately, that affinity was not high enough for the compound to be active in vivo.

A different approach yields new promise

At around the same time we were developing our allosteric inhibitor series, another group was developing an ATP-competitive inhibitor of the p97 D2 domain, called CB-5083. In contrast to our compound, CB-5083 binds directly to the ATP-binding site of D2 with nanomolar affinity.

CB-5083.

The compound also showed potent and specific inhibition of p97 in mouse xenograft tumor models.

An advance in myeloma cancer therapy

A more recent paper (Nov 2017) shows activity for CB-5083 against multiple myeloma (MM) cell lines and in vivo MM models.  From the abstract:

CB-5083 decreases viability in multiple myeloma cell lines and patient-derived multiple myeloma cells, including those with background proteasome inhibitor (PI) resistance. CB-5083 has a unique mechanism of action that combines well with PIs, which is likely owing to the p97-dependent retro-translocation of the transcription factor, Nrf1, which transcribes proteasome subunit genes following exposure to a PI. In vivo studies using clinically relevant multiple myeloma models demonstrate that single-agent CB-5083 inhibits tumor growth and combines well with multiple myeloma standard-of-care agents.

Standard-of-care agents such as bortezomib are proteasome inhibitors (PIs). A PI broadly inhibits the proteasome across many cell types, not just tumor cells, which makes side effects likely. p97 acts upstream of the proteasome, and targeting it is narrower in scope because MM cells rely so heavily on the protein-homeostasis activities of the ERAD pathway.

Hope for Phase 1 success

CB-5083 was also found to enhance the activity of bortezomib both in vitro and in vivo, and it was active in bortezomib-resistant models of MM. This paves the way for a potential combination therapy, or for another line of therapy if resistance develops after earlier treatment with PIs. Phase 1 clinical trials are now ongoing for patients who have exhausted other treatment options. If the trials show CB-5083 to be safe and efficacious, hopefully it will reach the market soon, so that oncologists and patients have another weapon in the fight against MM.