single cell Archives - Michael's Bioinformatics Blog

Performance on simulated scRNA-seq data

Figure 1A. Gene-gene correlations before (left) and after imputation with five methods (right). Red bars are highly-expressed DE genes, and blue bars are lowly-expressed DE genes. Gray bar are non-DE genes in this simulated dataset.

Somewhat dramatically, DrImpute and MAGIC introduce strong false positive correlations, while SAVER only strengthens existing correlations between lowly expressed DE genes. As you can see in part B of this figure below, parameter tuning also has a dramatic effect on the false positive rate in some cases. Increasing the k-neighbors for MAGIC and KNN methods increases smoothing and also false positives. SAVER and scImpute are relatively immune to changes in FPR with parameter space.

Figure 1B. False positive gene correlation rates as a function of algorithm parameters.

You can’t have your cake and eat it, too

In this next figure, the authors look at the trade-off between sensitivity and specificity in imputation methods on simulated datasets. It shows clearly that any improvements to sensitivity of DE gene detection come at a significant cost of specificity, and vice versa.

Detection of DE genes in simulated data.

The authors go on to show that on real data, every method including SAVER generates large numbers of false positives. In summary, imputation, while potentially promising, is limited owing to the lack of an independent reference (as in the case of GWAS imputation methods) to impute from. Since single-cell imputation methods rely only on the dataset itself, one cannot escape the sensitivity/specificity tradeoff and false-positive problem.

I recently ran across a situation that I think is going to be increasingly common as I do more and more single-cell analyses. Specifically, I had a project where the investigator had several experiments in related conditions that they want to merge and evaluate with a pseudotime analysis. I could not find any useful tools within Monocle itself for merging data (please correct me in the comments if I’m missing something). It looks as if you have to import a pre-merged seurat dataset.

Here is the workaround that I found [please note these commands are for Seurat v2, they will likely *not* work in v3]:

naive.data <- Read10X(data.dir = paste0(base_dir, "naive_outs/filtered_gene_bc_matrices/mm10/"))
naive <- CreateSeuratObject(raw.data = naive.data, min.cells = 3, min.genes = 200, project = "10X_naive")

lcmv.data <- Read10X(data.dir = paste0(base_dir, "lcmv_outs/filtered_gene_bc_matrices/mm10/"))
lcmv <- CreateSeuratObject(raw.data = lcmv.data, min.cells = 3, min.genes = 200, project = "10X_lcmv")

pygp.data <- Read10X(data.dir = paste0(base_dir, "pygp_outs/filtered_gene_bc_matrices/mm10/"))
pygp <- CreateSeuratObject(raw.data = pygp.data, min.cells = 3, min.genes = 200, project = "10X_pygp")

cd4.combined <- MergeSeurat(object1 = naive, object2 = lcmv, add.cell.id1 = "naive", add.cell.id2 = "lcmv")
cd4.combined <- AddSamples(object = cd4.combined, new.data = pygp.data, add.cell.id = "pygp")

cd4.combined@raw.data <- as.matrix(cd4.combined@raw.data)
cd4.monocle <- importCDS(cd4.combined, import_all = TRUE)

Here, I am reading in 10X data using Seurat (v2) w/ the Read10X function and then creating the Seurat object with CreateSeuratObject.

Once this done I use MergeSeurat to merge the first two experiments, and then AddSamples to add in the final experiment. Then we can take advantage of the monocle function importCDS to import the combined object into monocle.

Now there is one final problem and that is that the “orig.ident” field is blank:

To recover the original identity of each cell, we can use the updated cell names from the merged Seurat dataset (i.e., “naive_AAACTGAGAAACCGA”). We just need to split these and recover which experiment each cell came from with:

cellnames <- rownames(pData(cd4.monocle))
orig_ident <- sapply(strsplit(cellnames, split = "_"), '[', 1)
pData(cd4.monocle)$orig.ident <- orig_ident

We do a strsplit on the cellnames, splitting on underscore. The first value from the split in each case is assigned back into the ‘orig.ident’ field of the cell dataset object.

Now you’re ready to continue with the normal downstream analysis in monocle. With dimensionality reduction and clustering done (not shown), we can plot the calculated clusters side-by-side with the experiment of origin (from the merged seurat dataset):

library(cowplot)
p1 <- plot_cell_clusters(cd4.monocle, color_by = 'orig.ident')
p2 <- plot_cell_clusters(cd4.monocle) #color_by cluster is default behavior
plot_grid(p1,p2)

And we get:

Tag: single cell

To impute or not to impute scRNA-seq datasets?

Performance on simulated scRNA-seq data

You can’t have your cake and eat it, too

Importing a merged Seurat dataset into Monocle

New video: 10X Loupe Cell Browser explained!