Conference report: ISMB 2018 Chicago

In early July, I attended ISMB 2018, a computational biology meeting held by the International Society for Computational Biology (ISCB).  The meeting took place in the beautiful Hyatt Regency hotel in downtown Chicago, just across the street from the river and blocks from Navy Pier and Lake Shore Drive.

ISMB 2018 was a huge meeting, with at least 1500 attendees and up to ten parallel meeting tracks (called “COSIs” in ISCB parlance) at any one time.  The meeting was so big I always felt as if I was missing something good, no matter which talk I went to (except for the keynotes, where nothing else was happening).

Here is Steven Salzberg’s excellent keynote on a historical overview of finding genes in the human genome:

Obviously, at a meeting this large, one cannot take in more than a tiny fraction of the talks and posters (but you can now watch many of the ISMB 2018 talks on YouTube if you’re interested).

I want to briefly summarize three talks that I particularly enjoyed and found very interesting:

1) Michael Seiler, H3 Biomedicine.  “Selective small molecule modulation of splicing in cancer”

Up to 60% of hematologic neoplasms (CLL/AML/MDS) contain heterozygous hotspot mutations in key splicing factor genes involved in 3′ splice site recognition.  One such gene is SF3B1, an RNA splicing factor, which is recurrently mutated around the HEAT repeats.  The cryo-EM structure of SF3B1 revealed that the mutations cluster in the pre-mRNA-interacting region.  All of the observed mutations cause alternate 3′ splice sites to be chosen during splicing, so-called “cryptic” 3′ splice sites (AG).  In particular, the K700E mutant slips the spliceosome upstream to an internal AG almost 18 nucleotides from the proper splice site.
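To make the idea of a cryptic 3′ splice site concrete, here is a minimal sketch of scanning for AG dinucleotides upstream of a canonical acceptor.  The function name, the 30-nucleotide search window, and the toy sequence are all my own assumptions for illustration; this is not the analysis from the talk, and the sequence is not a real SF3B1 target.

```python
from typing import List, Tuple


def upstream_ag_sites(seq: str, canonical_ag: int, max_dist: int = 30) -> List[Tuple[int, int]]:
    """Find AG dinucleotides that start upstream of the canonical acceptor AG.

    canonical_ag is the 0-based index of the 'A' in the canonical AG.
    Returns (position, distance_to_canonical_AG) pairs within max_dist nt.
    """
    seq = seq.upper()
    start = max(0, canonical_ag - max_dist)
    hits = []
    for i in range(start, canonical_ag):
        if seq[i:i + 2] == "AG":
            hits.append((i, canonical_ag - i))
    return hits


# Made-up intron end: an internal AG placed 18 nt upstream of the canonical AG.
toy_seq = "T" * 10 + "AG" + "T" * 16 + "AG" + "GTC"
canonical = 28  # index of the 'A' in the final (canonical) AG
candidates = upstream_ag_sites(toy_seq, canonical)
print(candidates)
```

A real analysis would of course start from RNA-seq junction reads rather than from sequence alone; the scan only lists where a shifted spliceosome could land.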

RNA-seq of 200 CLL patients helped to discover further in-frame deletions in SF3B1, SRSF2, and U2AF1.  All of these mutations are heterozygous, and the cells rely on the wild-type (WT) copy for survival.

Seiler asked whether one could exploit this weakness to attack cancer cells.  The idea is to modulate the activity of SF3B1 with small molecules to disturb the WT function.  At H3 they took a natural product compound library and optimized for chemistry and binding to find candidates.  One molecule, H3B-8800, showed promise.  The compound was optimized toward selective induction of apoptosis in SF3B1-mutant cells in vitro.  When resistance mutations occur, they are all at points of contact with the drug.  At a dose of only 13 nM, around 1% of splicing events were affected, which was enough to be lethal to the cancer cell.  RNA-seq demonstrated that it was mainly the spliceosome genes themselves that were affected by the alternate splicing in the presence of the inhibitor.  This points to a delicate feedback loop between correct splicing and spliceosome gene expression.

2) Curtis Huttenhower, Harvard University.  “Methods for multi-omics in microbial community population studies”

Huttenhower’s group at Harvard is well known for its contributions to methods for microbiome analysis, collectively known as the “biobakery.”  In this talk, Huttenhower addressed the fact that the microbiome is increasingly of interest when looking at population health, and that the incidence of chronic immune disease has been rising around the world over the last few decades.

Huttenhower also spent a good amount of time describing the IBDMDB, the Inflammatory Bowel Disease Multi’omics Database.  The database contains ~200 samples that are complete with six orthogonal datatypes, including RNA-seq, metagenome, and metabolome.  He talked about how they can associate bacteria enriched in IBD with covariation in the prevalence of metabolites.  For example, he showed how sphingolipids, carboximidic acids, cholesteryl esters, and more are enriched while lactones, beta-diketones, and others are depleted in the guts of Crohn’s disease sufferers.

3) Olga Troyanskaya, Princeton University.  “ML approaches for the data-driven study of human disease”

Troyanskaya’s group at Princeton aims to develop accurate models of the complex processes underlying cellular function.  In this talk, she described efforts in her lab to use machine learning methods to understand how single nucleotide changes (SNPs) in non-coding regions can affect gene regulation and expression.  She is also interested in how pathways and networks change in different tissues and whether tissue-specific maps can be used to identify disease genes.

In particular, Dr. Troyanskaya described the “DeepSEA” software, which uses a convolutional neural network (CNN) to attempt to predict the chromatin remodeling consequences of a SNP in its genomic context.  The model is a three-layer CNN that takes 1000 base pairs of sequence as input.  The 1000 bp region is centered on known TF binding locations (200 bp bins).  The training data consists of a length-919 vector that contains binary values for the presence of a TF binding event across the genome (1 = binds, 0 = does not bind).  The output of the model is the probability that the specific sequence variant will affect TF binding at each of the 919 bins.

Schematic of the DeepSEA model, showing input data, training data, and output.
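To illustrate the shapes involved, here is a minimal sketch of how a 1000 bp window might be one-hot encoded and how a SNP could be scored by comparing predictions for the reference and alternate sequences.  This is not the DeepSEA code: the one-hot encoding, the alt-minus-ref scoring, and every name below are assumptions on my part, and `toy_predict` is a hypothetical stand-in for a trained CNN so that the example runs without any deep learning library.

```python
from typing import Callable, List

BASES = "ACGT"
WINDOW = 1000      # input length in bp, as described in the talk
N_FEATURES = 919   # length of the label/output vector, as described in the talk

Matrix = List[List[int]]


def one_hot(seq: str) -> Matrix:
    """Encode a DNA string as a len(seq) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snp(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(predict: Callable[[Matrix], List[float]],
                   ref_seq: str, pos: int, alt: str) -> List[float]:
    """Score a SNP as the per-feature difference between alt and ref predictions (assumed scheme)."""
    ref_probs = predict(one_hot(ref_seq))
    alt_probs = predict(one_hot(apply_snp(ref_seq, pos, alt)))
    return [a - r for a, r in zip(alt_probs, ref_probs)]


def toy_predict(x: Matrix) -> List[float]:
    """Hypothetical stand-in for a trained model: returns the GC fraction for every feature."""
    gc = sum(row[1] + row[2] for row in x) / len(x)
    return [gc] * N_FEATURES


ref = "A" * WINDOW                       # made-up reference window
effect = variant_effect(toy_predict, ref, WINDOW // 2, "G")
print(len(effect), effect[0])
```

With a real trained network in place of `toy_predict`, the variants with the largest absolute differences would be the ones to prioritize for follow-up.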

The DeepSEA model can be used to predict important sequence variants and even eQTLs for further study or to prioritize a list of known non-coding sequence variants.