Gene expression boxplots with ggplot2

The ubiquitous RNAseq analysis package, DESeq2, is a very useful and convenient way to conduct DE gene analyses.  However, it lacks some useful plotting tools.   For example, there is no convenience function in the library for making nice-looking boxplots from normalized gene expression data.

There are other packages one can rely on, for example ‘pcaExplorer’, but I like a simple approach sometimes to plot just a couple of genes.  So below I show you how to quickly plot your favorite gene using only ggplot2 (there is no “one weird trick” though…):

traf1_counts <- counts(ddsTxi['ENSG00000056558',], normalized = TRUE)
m <- list(counts = as.numeric(traf1_counts), group = as.factor(samples$group))
m <- as.tibble(m)
q <- ggplot(m, aes(group, counts)) + geom_boxplot() + geom_jitter(width = 0.1)
q <- q + labs(x = "Experimental Group", y = "Normalized Counts ", title = "Normalized Expression of TRAF1")
q <- q + scale_x_discrete(labels=c("00hn" = "PMN, 0hrs", "06hn" = "PMN, 6hrs",
                                   "06hy" = "PMN+Hp, 6hrs", "24hn" = "PMN, 24hrs", "24hy" = "PMN+Hp, 24hrs"))
q

As you can see above, first we must grab the normalized counts at the row corresponding with the Traf1 Ensembl ID using the ‘counts‘ function that operates on the ‘ddsTxi’ DESeqDataSet object.

In order to create a dataframe (well, a tibble to be specific) for plotting, we first create a list (‘m’) that combines the counts (as a numeric vector) and metadata group.  These two vectors will form the columns of the tibble for plotting, and we must give them names (i.e., “counts” and “group”) so the tibble conversion doesn’t complain.

The list, m, is then converted to a tibble with ‘as.tibble‘ and plotted with ggplot2, using an ‘aes(group,counts)‘ aesthetic plus a boxplot aesthetic.  The rest of the code is just modifying axis labels and tickmarks.  The final product looks like this:

Boxplot of normalized Traf1 expression in 5 different conditions (3 replicates each).

Interactive heatmap: nuclease expression in humans (GTEX data)

I worked on a project recently looking at tissue-specific nuclease expression.   I made this interactive heatmap from the enormous GTEX dataset that looks at just nuclease gene expression (in TPM) across more than 50 tissues in the human body.   It’s fun to play around with the interactive plot.   This is the way data should be presented in 2017.   I used the Plotly Python API for the chart.

Unfortunately, Plotly is now nearly $400/year if you want to use it for anything more than a few charts and there is no free option to keep sensitive research data private.  There should be an exception for academic research, but there isn’t as far as I know.