The differential expression analysis program EBSeq produces a number of data objects as part of the workflow, but there aren’t many options for visualization of the data.
The authors suggest the use of heatmap.2 in R:
However, this depends on knowing ahead of time your genes of interest. It is not practical to generate a heatmap with hundreds or thousands of DE genes.
I wanted to produce something approximating a volcano plot for EBSeq results. What I came up with initially was the following:
To make this plot, I had to grab some data arrays from the large object “EBOut” that is produced by calling the “EBTest” method and from the “GeneFC” object as below:
plot(GeneFC$PostFC, EBOut$PPDE, xlim =c(0,5), ylim=c(0,1), main="Control/Experimental FC vs. PPDE", sub=GeneFC$Direction, xlab="EBSeq Posterior Fold Change", ylab="EBSeq posterior prob of DE") abline(h=0.95)
The “abline” command places a horizontal line where the PPDE is equal to or greater than 95%. This would be equivalent to an FDR of 0.05.
If you want to inspect the plot interactively in R to identify gene names above the threshold and/or with large posterior fold changes you would use:
identify(GeneFC$PostFC, EBOut$PPDE, labels=names(GeneFC$PostFC))
To make it look more like a canonical volcano plot, I then tried:
plot(log2(GeneFC$RealFC), EBOut$PPDE, xlim =c(-5,5), ylim=c(0,1), main="Log2FoldChange vs. PPDE", xlab="EBSeq Log2 Fold Change", ylab="EBSeq PPDE")
Creating the following plot:
This is good, except I want to subset the data and add colors. To do this I need to create a new dataframe from the EBOut$PPDE and GeneFC$PostFC objects:
volc_df = data.frame(EBOut$PPDE, GeneFC$PostFC)
With everything in one dataframe, plotting and subsetting the data is easier. Inspired by this post at Stephen Turner’s “Getting Genetics Done” blog, I prepared my final colored volcano plot as follows:
with(volc_df, plot(log2(PostFC), PPDE, pch=20, main="Volcano Plot EBSeq", xlim=c(-5,5))) abline(h=0.95) with(subset(volc_df, PPDE > 0.95 & abs(log2(PostFC)) < 2), points(log2(PostFC), PPDE, pch=20, col="orange")) with(subset(volc_df, PPDE > 0.95 & abs(log2(PostFC)) < 2), points(log2(PostFC), PPDE, pch=20, col="red"))
The final plot looks like this: