As an example, we will use a set of simulated reads designed to be representative of Illumina reads that should align to the BRCA1 gene.
More details on the source of this dataset can be found in the associated paper:
http://f1000research.com/articles/1-2/v1
Details of the dataset itself can be found at:
http://figshare.com/articles/Simulated_Illumina_BRCA1_reads_in_FASTQ_format/92338
You will note that these are actually paired-end reads, but for the purposes of this example we will treat them as the results of a fragment run. Once you've worked through the example, you may wish to work out how to repeat your analysis treating the data as paired-end reads.
Copy the reads and chr17.fa into a new directory under your home directory
mkdir ~/ngs
cd ~/ngs
cp /scratch/share_ngs/intro_ngs/Brca1Reads_1.1.fastq .
cp /scratch/share_ngs/intro_ngs/Brca1Reads_1.2.fastq .
cp /scratch/share_ngs/intro_ngs/chr17.fa .
Check the number of reads
grep @chr Brca1Reads_1.1.fastq | wc -l
100000
grep @chr Brca1Reads_1.2.fastq | wc -l
100000
Alternatively, we can use the -c flag in grep to return a count directly.
grep -c @chr Brca1Reads_1.1.fastq
100000
grep -c @chr Brca1Reads_1.2.fastq
100000
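The grep counts above rely on every read header containing @chr and on no other line matching it. As a cross-check, the sketch below counts reads from the line count instead. It assumes each FASTQ record occupies exactly four lines (header, sequence, +, quality), which we have not verified for these files; the function name count_fastq_reads is our own and is not part of any tool.

```shell
# Hypothetical helper: count reads in a FASTQ file as (number of lines) / 4.
# Assumption: every record spans exactly four lines, with no wrapped sequences.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")   # total number of lines in the file
    echo $(( lines / 4 ))   # four lines per read
}

# Example usage (uncomment once the files have been copied):
# count_fastq_reads Brca1Reads_1.1.fastq
# count_fastq_reads Brca1Reads_1.2.fastq
```

If the assumption holds, each call should report the same number as the corresponding grep command. A mismatch would suggest that some headers do not contain @chr or that some records are not four lines long.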