Interactive heatmap: nuclease expression in humans (GTEx data)

I worked on a project recently looking at tissue-specific nuclease expression. I made this interactive heatmap from the enormous GTEx dataset, showing nuclease gene expression (in TPM) across more than 50 tissues in the human body. It’s fun to play around with the interactive plot, and it’s the way data should be presented in 2017. I used the Plotly Python API for the chart.
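For anyone curious how a chart like this is put together, here is a minimal sketch using Plotly’s Python API. It assumes a hypothetical CSV of median TPM values with nuclease genes as rows and GTEx tissues as columns; the file name and layout are placeholders, not the actual data behind my chart.

```python
# Minimal sketch: an interactive gene-by-tissue heatmap with Plotly's Python API.
# The CSV name and layout below are hypothetical (genes as rows, tissues as columns).
import pandas as pd
import plotly.graph_objects as go

tpm = pd.read_csv("nuclease_tpm_by_tissue.csv", index_col=0)  # placeholder file

fig = go.Figure(
    data=go.Heatmap(
        z=tpm.values,
        x=tpm.columns,            # tissue names
        y=tpm.index,              # nuclease gene symbols
        colorscale="Viridis",
        colorbar=dict(title="TPM"),
    )
)
fig.update_layout(
    title="Nuclease expression across GTEx tissues (median TPM)",
    xaxis_title="Tissue",
    yaxis_title="Gene",
)
fig.show()  # opens the interactive chart in a notebook or browser
```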

Unfortunately, Plotly now costs nearly $400/year if you want to use it for anything more than a few charts, and there is no free option to keep sensitive research data private. There should be an exception for academic research, but as far as I know there isn’t.


Are deep neural nets “Software 2.0”?

Image from: https://cdn.edureka.co/blog/wp-content/uploads/2017/05/Deep-Neural-Network-What-is-Deep-Learning-Edureka.png

Recent blog posts by Andrej Karpathy at Medium.com and Pete Warden at PeteWarden.com have caused a paradigm shift in the way I think about neural nets. Instead of thinking of them as powerful machine learning tools, the authors suggest that we should think of neural nets, and in particular deep convolutional nets, as ‘self-writing programs.’ Hence the term “Software 2.0.”

It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data than to explicitly write the program. A large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.   — Andrej Karpathy, Medium.com

I found this to be a dramatic reversal in my thinking about these techniques, but it leads to a deeper and more intuitive understanding. Combinations of artificial neurons can model any logical operation. You can therefore conceptualize training a neural net as searching program space for an optimal program that behaves the way you specify: you provide the inputs and desired outputs, and the training procedure searches for the program that produces them.
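To make that concrete, here is a toy sketch in Python with hand-picked weights (rather than learned ones) showing that single artificial neurons can act as logic gates, and that composing them yields an operation like XOR that no single neuron can express. Training is what would find such weights automatically instead of a person writing them by hand.

```python
# Toy sketch: artificial neurons as logic gates, composed into XOR.
# Weights and biases are chosen by hand here; training would search for them.
import numpy as np

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum plus bias, step activation."""
    return int(np.dot(inputs, weights) + bias > 0)

def AND(x1, x2):  return neuron([x1, x2], [1, 1], -1.5)
def OR(x1, x2):   return neuron([x1, x2], [1, 1], -0.5)
def NAND(x1, x2): return neuron([x1, x2], [-1, -1], 1.5)

def XOR(x1, x2):
    # XOR is not linearly separable, so it needs a second layer of neurons.
    return AND(NAND(x1, x2), OR(x1, x2))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", XOR(a, b))   # prints 0, 1, 1, 0
```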

This stands in contrast to the “Software 1.0” paradigm, where the programmer uses her skill and experience to conceptualize the right combination of specific instructions to produce the desired behavior. While it seems certain that Software 1.0 and 2.0 will coexist for a long time, this new way of understanding deep learning is crucial and exciting, in my opinion.


Five (easy) ways to start learning about convolutional neural nets

A schematic of a Convolutional Neural Network (CNN).

Here are five different ways to get an introduction to CNNs. Each approach is geared toward a different style of learning:

1. Visualize them in real time with your own inputs (this is amazing!)

2. Watch a lecture by the “godfather” of neural nets, Geoff Hinton.

3. Take a top-ranked online course on Deep Learning.

4. Learn the math behind them.

5. Code one yourself in Python (see the sketch after this list).
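As a taste of option 5, here is a minimal sketch of a tiny CNN in Python using Keras. The framework and architecture are just one reasonable choice, not a recommendation of any particular tutorial; the same idea works in other libraries or from scratch in NumPy. It trains a small convolution-pool-dense network on the MNIST digits for one epoch.

```python
# Minimal sketch for option 5: a tiny CNN on MNIST using Keras.
# The architecture is illustrative; one epoch is enough to see it learn.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# MNIST digits: 28x28 grayscale images, 10 classes.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis] / 255.0   # add channel dimension, scale to [0, 1]
x_test = x_test[..., np.newaxis] / 255.0

model = keras.Sequential([
    layers.Conv2D(16, kernel_size=3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test, verbose=0)[1])
```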