The easiest way to compare the SNPs found using the two mappers is to compare the positions at which a SNP has been reported in each file.

We will first extract the SNP locations from our bowtie vcf file:

grep -v "^#" bowtie_snps.vcf | cut -f 1,2 | tr "\t" ":" | sort > bowtie_snps.txt

Explanation of parameters:

grep -v "^#"         Extract lines which don't start with a # (i.e. skip the VCF header lines)
cut -f 1,2           Extract the first two columns of data (chromosome and position)
tr "\t" ":"          Replace tabs with a colon
sort                 Sort the SNP positions
> bowtie_snps.txt    Output to file
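
To see what each stage of the pipeline does, it can help to run it on a couple of made-up records first. The sketch below feeds a tiny, hypothetical VCF fragment through the same commands; the chromosome name chr1, the positions and the toy file names are invented for illustration and are not part of the tutorial data. Only the two data lines should survive, as sorted chromosome:position pairs.

```shell
# A tiny, hypothetical VCF fragment: two header lines and two data lines.
# Fields are tab-separated, as in a real VCF file.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t250\t.\tA\tG\nchr1\t100\t.\tC\tT\n' > toy_snps.vcf

# Same pipeline as above, writing to a separate toy output file.
grep -v "^#" toy_snps.vcf | cut -f 1,2 | tr "\t" ":" | sort > toy_snps.txt

cat toy_snps.txt
# chr1:100
# chr1:250
```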

We then repeat this process on the bwa vcf file:

grep -v "^#" bwa_snps.vcf | cut -f 1,2 | tr "\t" ":" | sort > bwa_snps.txt

You will now have two text files which look similar to this:

  head -n 4 bwa_snps.txt bowtie_snps.txt

==> bwa_snps.txt <==
==> bowtie_snps.txt <==

The final stage in the process is to use sdiff to undertake a side by side comparison of our two position files:

sdiff bowtie_snps.txt bwa_snps.txt
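
For long position files most lines will be identical on both sides, so it may be easier to show only the lines that differ. A minimal sketch, assuming your version of sdiff supports the -s option (the GNU and BSD versions do) and using two small made-up position files rather than the real ones:

```shell
# Hypothetical position files; the second has one extra position.
printf 'chr1:100\nchr1:250\n' > toy_bowtie.txt
printf 'chr1:100\nchr1:250\nchr1:900\n' > toy_bwa.txt

# -s suppresses lines common to both files. sdiff exits with status 1
# when the files differ, so "|| true" keeps a strict script from stopping.
sdiff -s toy_bowtie.txt toy_bwa.txt > toy_diff.txt || true

cat toy_diff.txt
```

In the output, a > marks a line present only in the second file, a < marks a line present only in the first, and a | marks a line that differs between the two.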


You will see that variant discovery on data mapped using bwa has identified SNPs at all of the locations found by the analysis of data mapped using bowtie, but has also identified a number of additional SNPs.  You can investigate these positions, and the reads mapped to them, further using the samtools tview tool.
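
The same comparison can also be summarised as counts, which is easier to read than a long side-by-side listing. The sketch below is one possible way to do this with comm; the function name compare_positions and the toy file names are our own illustrative choices, not part of the tutorial data. comm expects both inputs to be sorted in the same collation order, so the function re-sorts them under LC_ALL=C rather than relying on how the files were produced.

```shell
# compare_positions FILE1 FILE2
# Print how many positions are shared and how many are unique to each file.
compare_positions() {
    first_sorted=$(mktemp)
    second_sorted=$(mktemp)
    # Sort both files the same way (and drop duplicate lines) for comm.
    LC_ALL=C sort -u "$1" > "$first_sorted"
    LC_ALL=C sort -u "$2" > "$second_sorted"
    # comm column 1 = only in first, 2 = only in second, 3 = in both;
    # -12, -23 and -13 suppress the columns we do not want.
    echo "shared: $(LC_ALL=C comm -12 "$first_sorted" "$second_sorted" | wc -l | tr -d ' ')"
    echo "only in first: $(LC_ALL=C comm -23 "$first_sorted" "$second_sorted" | wc -l | tr -d ' ')"
    echo "only in second: $(LC_ALL=C comm -13 "$first_sorted" "$second_sorted" | wc -l | tr -d ' ')"
    rm -f "$first_sorted" "$second_sorted"
}

# Toy example with made-up positions.
printf 'chr1:100\nchr1:250\n' > toy_first.txt
printf 'chr1:100\nchr1:250\nchr1:900\n' > toy_second.txt
compare_positions toy_first.txt toy_second.txt
# shared: 2
# only in first: 0
# only in second: 1
```

On the real data this would be run as compare_positions bowtie_snps.txt bwa_snps.txt, with bowtie as the first file and bwa as the second.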