Here is a handy Unix one-liner to convert Mutect2 output VCF files into the five-column, tab-separated format that Oncotator requires as input. (Oncotator is a web-based application that annotates human genomic point mutations and indels with transcripts and consequences.) Oncotator's output is a MAF-formatted file that is compatible with MutSigCV.
#!/bin/bash
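for file in *.vcf.gz
do
# -v OFS='\t' keeps the output tab-separated; by default awk rejoins fields with spaces after an assignment
zcat "$file" | grep -v "GL000*" | grep -v "FILTER" | grep "PASS" | cut -d$'\t' -f 1-5 | awk -v OFS='\t' '$3=$2' | awk -v OFS='\t' '$1="chr"$1' > "$file.tsv"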
done
Breaking this down, we have:
"zcat $file" : write each line of the gzipped file to stdout
"grep -v "GL000*"" : exclude variants on unplaced GL000 contigs, i.e., any variant that doesn't map to a named chromosome
"grep -v "FILTER"" : exclude header lines that mention FILTER
"grep "PASS"" : keep only lines containing PASS, i.e., variants that pass the Mutect2 filters
"cut -d$'\t' -f 1-5" : cut on tabs and keep fields one through five
"awk '$3=$2'" : set column 3 equal to column 2, i.e., start and end position are equal
"awk '$1="chr"$1'" : set column one equal to "chr" plus column one (make 1 = chr1)
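For comparison, here is a minimal sketch that folds the same steps into a single awk pass. It is not part of the original one-liner: the function name vcf_to_tsv is made up for illustration, it uses gzip -dc in place of zcat, and it assumes the unplaced contigs are named GL000... with no "chr" prefix. Instead of grepping for PASS anywhere on the line, it tests the FILTER column, which is column 7 in the VCF layout.

```shell
# Hypothetical helper (name is illustrative): print the five-column,
# tab-separated form of one gzipped VCF to stdout.
vcf_to_tsv() {
    gzip -dc "$1" | awk 'BEGIN { FS = OFS = "\t" }   # read and write tabs
        /^#/          { next }   # skip every header line
        $1 ~ /^GL000/ { next }   # skip unplaced contigs (assumed naming)
        $7 != "PASS"  { next }   # keep only records whose FILTER column is PASS
        { print "chr" $1, $2, $2, $4, $5 }   # chr, start, end (= start), ref, alt
    '
}
```

Inside the loop above, this could be called as: vcf_to_tsv "$file" > "$file.tsv". Because start and end are both set to the POS column, the same caveat as the original applies: the end position is only exact for single-base changes.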