Create a volcano plot on EBSeq output

The differential expression analysis program EBSeq produces a number of data objects as part of the workflow, but there aren’t many options for visualization of the data.

The authors suggest the use of heatmap.2 in R:

heatmap.2(NormalizedMatrix[GenesOfInterest,], scale=”row”, trace=”none”, Colv=F)

However, this depends on knowing ahead of time your genes of interest.  It is not practical to generate a heatmap with hundreds or thousands of DE genes.

I wanted to produce something approximating a volcano plot for EBSeq results.  What I came up with initially was the following:

A pseudo-volcano plot for EBSeq results.  The y-axis is posterior probability of differential expression (PPDE).
A pseudo-volcano plot for EBSeq results. The y-axis is posterior probability of differential expression (PPDE).

To make this plot, I had to grab some data arrays from the large object “EBOut” that is produced by calling the “EBTest” method and from the “GeneFC” object as below:

plot(GeneFC$PostFC, EBOut$PPDE, 
xlim =c(0,5), ylim=c(0,1), 
main="Control/Experimental FC vs. PPDE",
sub=GeneFC$Direction, xlab="EBSeq Posterior Fold Change", ylab="EBSeq posterior prob of DE")

abline(h=0.95)

The “abline” command places a horizontal line where the PPDE is equal to or greater than 95%.  This would be equivalent to an FDR of 0.05.

If you want to inspect the plot interactively in R to identify gene names above the threshold and/or with large posterior fold changes you would use:

identify(GeneFC$PostFC, EBOut$PPDE, labels=names(GeneFC$PostFC))

To make it look more like a canonical volcano plot, I then tried:

plot(log2(GeneFC$RealFC), EBOut$PPDE, 
xlim =c(-5,5), 
ylim=c(0,1), 
main="Log2FoldChange vs. PPDE", 
xlab="EBSeq Log2 Fold Change", ylab="EBSeq PPDE")

Creating the following plot:

Fig 2.  Log(2) Fold Change on the X-axis.
Fig 2. Log(2) Fold Change on the X-axis.

This is good, except I want to subset the data and add colors. To do this I need to create a new dataframe from the EBOut$PPDE and GeneFC$PostFC objects:

volc_df = data.frame(EBOut$PPDE, GeneFC$PostFC)

With everything in one dataframe, plotting and subsetting the data is easier. Inspired by this post at Stephen Turner’s “Getting Genetics Done” blog, I prepared my final colored volcano plot as follows:

with(volc_df, plot(log2(PostFC), PPDE, pch=20, main="Volcano Plot EBSeq", xlim=c(-5,5)))

abline(h=0.95)

with(subset(volc_df, PPDE > 0.95 & abs(log2(PostFC)) < 2), points(log2(PostFC), PPDE, pch=20, col="orange")) 

with(subset(volc_df, PPDE > 0.95 & abs(log2(PostFC)) < 2), points(log2(PostFC), PPDE, pch=20, col="red"))

The final plot looks like this:

Fig 3.  The final plot, with colored points for the DE genes, and higher fold change indicated with red.
Fig 3. The final plot, with colored points for the DE genes, and higher fold change indicated with red.

One thought on “Create a volcano plot on EBSeq output”

  1. Thank you for the visualization, it has been very useful and even if you are new into R its easy to understand.
    In the final volcano plot it has ‘&lt’ for both the orange and red color, so they overlap.

Leave a Reply

Your email address will not be published.