1. Home
  2. Docs
  3. Introduction to NGS analysis
  4. Mapping reads
  5. Mapping reads using bowtie

Mapping reads using bowtie

Before we can map our reads, we need to prepare a reference genome.  We are going to make a copy of the hg19 chromosome 17 sequence from UCSC and then index it for use by bowtie.

Make a copy of chr17 in fasta format

cd ~/ngs

cp /scratch/share_ngs/intro_ngs/chr17.fa .

Index the genome for bowtie and name it hg19_chr17

The indexing process takes the text formatted reference sequence and undertakes a number of processing and sorting steps which produces a number of binary output files which are used by the mapper to make the search process faster than working with the raw text sequence.

In some of these analyses, we have added the command time before the analysis command to give an idea of the expected run time for a step.

module load bowtie

time bowtie-build chr17.fa hg19_chr17

This will take around 2 minutes and will generate the following new files:


Map reads against a reference

There are many, many options that can be set/tweaked to change the way reads are mapped and results handled, more information can be found at:


In our example we are going to use mostly the default options save for returning our alignments in SAM format.

bowtie -S hg19_chr17 -1 Brca1Reads_1.1.fastq -2 Brca1Reads_1.2.fastq bowtie.sam

Explanation of parameters:

-S Print the alignments in sam format
hg19_chr17 Bowtie index to align reads against
-1 Brca1Reads_1.1.fastq fastq file of read 1 reads
-2 Brca1Reads_1.2.fastq fastq file of read 2 reads
bowtie.sam Output file


# reads processed: 146687

# reads with at least one reported alignment: 93374 (63.66%)

# reads that failed to align: 53313 (36.34%)

Reported 46687 paired-end alignments to 1 output stream(s)

Files created:


Reassign MAPQ values

Bowtie assigned the MAPQ value as either 0 or 255 to indicate either no match or match for each read.  This is different to many other alignment programs and causes problems in further downstream analyses, which incorporate the MAPQ into their analysis, and interprets the 255 values as untrusted.  To overcome this, we must reassign the MAPQ value in our file to a more sensible value which is easily achieved using sed.


sed 's/\t255\t/\t60\t/g' bowtie.sam > bowtie_mapq60.sam

Explanation of parameters:

s Subsitute
\t255\t   Old value (tab 255 tab)
\t60\t  New value (tab 60 tab)
g Global replacement – every occurrence found


Files created:


 For efficiency, we now convert our SAM file into BAM format using samtools, then sort and index it.

 Convert SAM to BAM

module load samtools

 samtools view -bS bowtie_mapq60.sam > bowtie.bam

Explanation of parameters:

view Use the samtools viewer program
-b Output in BAM format
-S Input in SAM format
> bowtie.bam Pipe the output to file


Files created:


Sort the BAM file

samtools sort bowtie.bam bowtie.sorted

Explanation of parameters:

sort Uses the samtools sort program
bowtie.bam Input file
bowtie.sorted Outut file (.bam appended automatically).

Files created:


Index the BAM file

Indexing the bam file is used to speed up sorts and is essential for other programs (e.g. the Integrated Genome Viewer IGV) to work with the bam files.

samtools index bowtie.sorted.bam

Explanation of parameters:

sort Use the samtools index program
bowtie.sorted.bam Input file

Files created:


Check / view the mapping results

To check that our mapping has worked, we can use the text view in samtools to have a quick look and check of our alignments.

samtools tview bowtie.sorted.bam

To move to the BRCA gene, press


And then enter



Using a slightly different command we can define the reference genone and simplify the view to only show bases that differ when compared to the reference.

samtools tview bowtie.sorted.bam chr17.fa