Variant annotation and transcript choice

Transcript choice between methods

Variant annotation methods do not all choose the same transcripts to annotate against. The resulting differences in annotation can arise from differences in the algorithms' logic or from different user-specified criteria for annotation.
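To make this concrete, here is a minimal sketch of how two selection rules can disagree on the same variant. Everything in it is an assumption for illustration: the transcript IDs, lengths, severity ranking, and both rules are hypothetical and are not the actual logic of any particular annotator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptHit:
    transcript_id: str  # hypothetical identifier
    length: int         # transcript length in bases (made-up value)
    consequence: str    # predicted effect of the variant on this transcript


# Hypothetical severity ranking, most severe first.
SEVERITY = ["stop_gained", "missense", "synonymous", "intronic"]


def pick_most_severe(hits):
    """Rule 1 (assumed): report the transcript with the most severe consequence."""
    return min(hits, key=lambda h: SEVERITY.index(h.consequence))


def pick_longest(hits):
    """Rule 2 (assumed): report the longest overlapping transcript."""
    return max(hits, key=lambda h: h.length)


# One variant overlapping two transcripts of the same gene.
hits = [
    TranscriptHit("TX_A", 2400, "intronic"),
    TranscriptHit("TX_B", 1100, "stop_gained"),
]

by_severity = pick_most_severe(hits)
by_length = pick_longest(hits)

print(by_severity.transcript_id, by_severity.consequence)
print(by_length.transcript_id, by_length.consequence)
```

Under these made-up inputs the same variant is reported as a stop gain by one rule and as intronic by the other, which is the kind of disagreement this post is about.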

Unfortunately, incorrect annotations or disagreement in annotation outcomes can lead investigators to waste resources tracking down variants of little interest or to miss severe variants of potential clinical significance.

In this first post in a series, I’ll talk briefly about how annotation outcomes differ owing to transcript choice when three popular methods (ANNOVAR, VEP, and SnpEff) are applied to a dataset of 81 million variants from the 1000 Genomes Project.

The figure below shows that the lack of concordance owing to transcript choice affects a surprisingly large number of variants.

[Figure: variant annotation]

This disagreement is largely owing to the way intergenic variants are handled: they may be assigned to the nearest gene or to an arbitrary category like “unknown.”
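As a rough sketch of how transcript-choice concordance could be tallied across three annotators, the snippet below classifies each variant as fully, partially, or not concordant. The variant IDs, tool names, and transcript assignments are invented for illustration and are not taken from the dataset or the paper; `None` stands in for a variant that a tool left without a transcript, such as an intergenic variant labeled “unknown.”

```python
from collections import Counter

# Hypothetical transcript assignments per variant, one entry per tool.
# None means the tool reported no transcript for that variant.
calls = {
    "var1": {"tool_a": "TX_1", "tool_b": "TX_1", "tool_c": "TX_1"},
    "var2": {"tool_a": "TX_2", "tool_b": "TX_3", "tool_c": "TX_2"},
    "var3": {"tool_a": "TX_4", "tool_b": None, "tool_c": "TX_5"},
}


def concordance_class(assignments):
    """Classify one variant by how many distinct answers the tools gave."""
    distinct = len(set(assignments.values()))
    if distinct == 1:
        return "full"      # every tool chose the same transcript
    if distinct == len(assignments):
        return "none"      # every tool gave a different answer
    return "partial"       # some, but not all, tools agree


summary = Counter(concordance_class(a) for a in calls.values())

for label in ("full", "partial", "none"):
    print(label, summary[label])
```

Treating “no transcript” as its own answer is a design choice in this sketch; counting it as a disagreement is what lets intergenic handling show up in the totals.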

To learn more about this problem and other issues with annotators and concordance between methods, check out our recent paper on bioRxiv. In part two, I’ll talk more about concordance between methods when annotators agree on transcript choice.