
Comparisons between workflows

The simplest way to compare the SNPs found using the two mappers is to compare the genomic positions at which each VCF file reports a SNP.

We will first extract the SNP positions (chromosome and coordinate) from the bowtie VCF file:

grep -v "^#" bowtie_snps.vcf | cut -f 1,2 | tr "\t" ":" | sort > bowtie_snps.txt

Explanation of parameters:

grep -v "^#"        Extract lines that don't start with a # (i.e. skip the VCF header lines)
cut -f 1,2          Extract the first two columns (CHROM and POS)
tr "\t" ":"         Replace the tab between them with a colon
sort                Sort the SNP positions
> bowtie_snps.txt   Write the output to a file
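
The same extraction can also be done with a single awk call instead of grep, cut and tr. The following is a sketch of an alternative, not the tutorial's own method. The helper name vcf_positions is hypothetical:

```shell
# Hypothetical helper: print CHROM:POS for every record in a VCF, sorted.
# One-pass alternative to: grep -v "^#" | cut -f 1,2 | tr "\t" ":" | sort
vcf_positions() {
  # -F '\t' : VCF columns are tab-separated
  # !/^#/   : skip header lines, which all start with '#'
  awk -F '\t' '!/^#/ { print $1 ":" $2 }' "$1" | sort
}

# Usage, equivalent to the command above:
# vcf_positions bowtie_snps.vcf > bowtie_snps.txt
```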

We then repeat this process on the bwa VCF file:

grep -v "^#" bwa_snps.vcf | cut -f 1,2 | tr "\t" ":" | sort > bwa_snps.txt
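
If you are comparing more than two mappers, a loop saves retyping the pipeline for each one. The following is a minimal sketch. It assumes every VCF follows this tutorial's naming pattern, <tool>_snps.vcf. The function name extract_snp_positions is hypothetical:

```shell
# Hypothetical helper: run the tutorial's extraction pipeline for each tool name.
# Assumes input files are named <tool>_snps.vcf; writes <tool>_snps.txt.
extract_snp_positions() {
  local tool
  for tool in "$@"; do
    grep -v "^#" "${tool}_snps.vcf" | cut -f 1,2 | tr "\t" ":" | sort > "${tool}_snps.txt"
  done
}

# Usage: extract_snp_positions bowtie bwa
```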

You will now have two text files that look similar to this:

  head -n 4 bwa_snps.txt bowtie_snps.txt

==> bwa_snps.txt <==
chr17:41200036
chr17:41201130
chr17:41201198
chr17:41209153

==> bowtie_snps.txt <==
chr17:41201130
chr17:41201198
chr17:41209153
chr17:41215396

The final step is to use sdiff for a side-by-side comparison of the two position files:

sdiff bowtie_snps.txt bwa_snps.txt
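
sdiff is good for reading through the differences by eye. For a quick numeric summary, comm can count the positions shared by both files and those unique to each. comm requires both inputs to be sorted in the same way, and that holds here because both files came from the same sort command. The following is a sketch; the helper name compare_snp_positions is hypothetical:

```shell
# Hypothetical helper: count shared and unique lines in two sorted position files.
# comm -12: lines in both; -23: only in the first; -13: only in the second.
# tr strips the padding that some wc implementations add to the count.
compare_snp_positions() {
  local a="$1" b="$2"
  echo "shared: $(comm -12 "$a" "$b" | wc -l | tr -d ' ')"
  echo "only in $a: $(comm -23 "$a" "$b" | wc -l | tr -d ' ')"
  echo "only in $b: $(comm -13 "$a" "$b" | wc -l | tr -d ' ')"
}

# Usage: compare_snp_positions bowtie_snps.txt bwa_snps.txt
```

To list the bwa-only positions rather than count them, run comm -13 bowtie_snps.txt bwa_snps.txt. Alternatively, GNU sdiff's -s option hides the lines common to both files.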

You will see that variant calling on the bwa-mapped data has identified SNPs at all of the positions found in the bowtie-mapped data, as well as a number of additional SNPs. You can investigate these positions, and the reads mapped to them, further using the samtools tview tool.
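
Rather than opening the interactive viewer once per position, you can make tview print a plain-text view of each position in a list. The following is a minimal sketch under several assumptions. The helper name view_positions is hypothetical. The BAM and reference file names in the usage line are placeholders for your own files, and the BAM must be coordinate-sorted and indexed. The option -d T asks tview for text output, and -p jumps to a chr:pos location:

```shell
# Hypothetical helper: print a text-mode samtools tview for each chr:pos read from stdin.
# $1 = sorted, indexed BAM; $2 = reference FASTA (both names are up to you).
view_positions() {
  local bam="$1" ref="$2" pos
  while read -r pos; do
    echo "=== $pos ==="
    # < /dev/null stops tview from consuming the remaining positions on stdin
    samtools tview -d T -p "$pos" "$bam" "$ref" < /dev/null
  done
}

# Usage (file names are placeholders):
# comm -13 bowtie_snps.txt bwa_snps.txt | view_positions bwa_sorted.bam reference.fa | less -S
```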