I recently had to inspect some genomic alignments as part of a project. Usually I work with BAM files, and when inspection is needed, I visualize the pileups to see what is going on.
In this case, I just wanted a quick answer to how the reads were aligning to the reference, and I didn’t want to go through the process of subsetting and copying the BAM files to my local machine.
The SAM file is the uncompressed, plain-text record of the read alignments produced by an aligner (STAR, TopHat, BWA, etc.). Because this file can get very large, it is usually compressed into a BAM file (binary and faster for machines to parse, but not human-readable), and the SAM file is discarded.
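Since a SAM file is just tab-delimited text, a quick look doesn't require any special tooling. Here's a minimal Python sketch, assuming an uncompressed SAM file, that skips the "@" header lines and splits each alignment into the 11 mandatory columns named in the SAM specification. The function names (parse_sam_line, read_sam) and the example read are hypothetical, my own illustration rather than part of any library.

```python
# The 11 mandatory SAM alignment columns, in order, per the SAM format specification.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

def parse_sam_line(line):
    """Split one SAM alignment line into a dict; return None for header lines."""
    if line.startswith("@"):  # header lines (@HD, @SQ, @PG, ...) are not alignments
        return None
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    # Anything past column 11 is an optional TAG:TYPE:VALUE field, e.g. NM:i:2
    record["TAGS"] = cols[11:]
    return record

def read_sam(path):
    """Yield alignment records from an uncompressed SAM file."""
    with open(path) as handle:
        for line in handle:
            record = parse_sam_line(line)
            if record is not None:
                yield record

# Hypothetical single-read example
example = "read1\t0\tchr1\t100\t60\t5M2I3M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:2"
print(parse_sam_line(example)["CIGAR"])  # 5M2I3M
```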
In my case, I still had the SAM files around to inspect. If you find yourself needing to read a SAM file, here are three helpful reference tools to make the process less painful:
1) This page has an enormous amount of detail about SAM files, including this helpful chart that enumerates all of the fields you can expect to find in each alignment:
2) This post from the blog “zenfractal.com” contains a great exposition on CIGAR strings and how to decode them:
3) And finally, if you’re trying to decode the SAM bitwise flags, you can calculate them using this tool from the Broad Institute:
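If you'd rather decode these fields in a script than in a browser, both the CIGAR string and the FLAG bits are simple enough to handle by hand. The sketch below is my own (decode_flag, parse_cigar and aligned_lengths are hypothetical names, not a library API). It follows the operation codes and bit definitions in the SAM specification and does no other validation.

```python
import re

# Bits of the SAM FLAG field, as defined in the SAM format specification.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}

def decode_flag(flag):
    """Return descriptions of every bit set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

# One CIGAR operation is a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
CONSUMES_QUERY = set("MIS=X")  # operations that use up bases of the read
CONSUMES_REF = set("MDN=X")    # operations that use up bases of the reference

def parse_cigar(cigar):
    """Split a CIGAR string such as '5M2I3M' into [(5, 'M'), (2, 'I'), (3, 'M')]."""
    if cigar == "*":  # '*' means no CIGAR is available, e.g. for unmapped reads
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

def aligned_lengths(cigar):
    """Return (query length implied by the CIGAR, reference span)."""
    ops = parse_cigar(cigar)
    query_len = sum(n for n, op in ops if op in CONSUMES_QUERY)
    ref_span = sum(n for n, op in ops if op in CONSUMES_REF)
    return query_len, ref_span
```

For example, decode_flag(99) lists the "read paired", "read mapped in proper pair", "mate reverse strand" and "first in pair" bits, and aligned_lengths("3S10M1000N20M") returns (33, 1030): the soft clip counts toward the read but not the reference, while the N (a skipped region, which is how spliced aligners like STAR and TopHat report introns) counts only toward the reference.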
With the growing popularity of single-cell RNA-Seq analysis, the t-SNE projection of multi-dimensional data is appearing more often in publications and online. If you’ve ever wanted to develop a better intuitive feel for what exactly t-SNE does and where it can go wrong, this interactive tutorial (by Martin Wattenberg and Fernanda Viegas) is extremely compelling and useful.
In addition to providing a wonderful interactive plotting tool, the authors provide an informative tutorial that explains the pitfalls and challenges of optimizing t-SNE and tuning its hyperparameters, and how to get the most from the plots. Here is an example:
In the example above, setting the t-SNE “perplexity” to values between 30 and 50 correctly reconstructs the structure of the data, but the same method fails when the parameter falls outside that range (i.e., when it is too small or too large).
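It helps to know what perplexity actually controls. In the standard t-SNE formulation, each point gets a Gaussian distribution over its neighbors with its own bandwidth sigma, and sigma is tuned so that 2 raised to the entropy of that distribution (in bits) equals the perplexity you choose; loosely, it's a smooth count of how many neighbors each point pays attention to. The sketch below covers only that calibration step, not the embedding itself, and the function names, search bounds and toy distances are my own assumptions.

```python
import math

def conditional_probs(distances_sq, sigma):
    """Gaussian p_{j|i} for one point i, given squared distances to every other point j."""
    d_min = min(distances_sq)
    # Shifting by the smallest distance leaves the normalized result unchanged
    # but keeps at least one weight equal to 1, avoiding 0/0 for tiny sigma.
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in distances_sq]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perp(P_i) = 2 ** H(P_i), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

def sigma_for_perplexity(distances_sq, target, lo=1e-3, hi=1e3, iters=100):
    """Bisect (in log space) for the sigma whose perplexity matches the target.

    Works because perplexity grows monotonically with sigma, from about 1
    (all weight on the nearest neighbor) up to len(distances_sq) (uniform).
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(distances_sq, mid)) > target:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

# Hypothetical example: one point with 100 neighbors at distances 1, 2, ..., 100.
dist_sq = [float(k * k) for k in range(1, 101)]
for target in (5, 30, 50):
    sigma = sigma_for_perplexity(dist_sq, target)
    print(target, round(sigma, 2))
```

A larger target perplexity forces a wider Gaussian, so each point weighs more distant neighbors; that is one way to build intuition for why the same data can look so different across perplexity values.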
Go check out this distill.pub site. It’s worth your time.