Gene expression boxplots with ggplot2

The ubiquitous RNAseq analysis package, DESeq2, is a very useful and convenient way to conduct DE gene analyses.  However, it lacks some useful plotting tools.   For example, there is no convenience function in the library for making nice-looking boxplots from normalized gene expression data.

There are other packages one can rely on, for example ‘pcaExplorer’, but I like a simple approach sometimes to plot just a couple of genes.  So below I show you how to quickly plot your favorite gene using only ggplot2 (there is no “one weird trick” though…):

traf1_counts <- counts(ddsTxi['ENSG00000056558',], normalized = TRUE)
m <- list(counts = as.numeric(traf1_counts), group = as.factor(samples$group))
m <- as.tibble(m)
q <- ggplot(m, aes(group, counts)) + geom_boxplot() + geom_jitter(width = 0.1)
q <- q + labs(x = "Experimental Group", y = "Normalized Counts ", title = "Normalized Expression of TRAF1")
q <- q + scale_x_discrete(labels=c("00hn" = "PMN, 0hrs", "06hn" = "PMN, 6hrs",
                                   "06hy" = "PMN+Hp, 6hrs", "24hn" = "PMN, 24hrs", "24hy" = "PMN+Hp, 24hrs"))
q

As you can see above, first we must grab the normalized counts at the row corresponding with the Traf1 Ensembl ID using the ‘counts‘ function that operates on the ‘ddsTxi’ DESeqDataSet object.

In order to create a dataframe (well, a tibble to be specific) for plotting, we first create a list (‘m’) that combines the counts (as a numeric vector) and metadata group.  These two vectors will form the columns of the tibble for plotting, and we must give them names (i.e., “counts” and “group”) so the tibble conversion doesn’t complain.

The list, m, is then converted to a tibble with ‘as.tibble‘ and plotted with ggplot2, using an ‘aes(group,counts)‘ aesthetic plus a boxplot aesthetic.  The rest of the code is just modifying axis labels and tickmarks.  The final product looks like this:

Boxplot of normalized Traf1 expression in 5 different conditions (3 replicates each).