I am trying to align a set of read sequences (from a SNP chip array) to a reference genome. I want to get the physical position of the SNP in each read.
My reads are in FASTA format, e.g.
>id00001Zh:Chr02:57,645,640
TGCAGACYCAGACAAGGTTTAACACAGATTGGAACCGTTA
>id00002Zh:Chr07:53,650,797
TGTAAGATTGCYGGCAATGAATGTCAGTCGAGATGAAGAC
>id00003Zh:Chr06:48,898,851
CAGTMATTTTGATCCCTCGGTTGATGTGACTTCAAGCAGTA
I am using Bowtie 2. Each read contains one ambiguous character (an IUPAC code), which marks the SNP whose position in the reference genome I want to find. Based on what I understood from the Bowtie 2 manual, I expected to get the following values in the alignment section of my output SAM file:
CIGAR = 39M and MD tag = MD:Z:7Y32
CIGAR = 39M and MD tag = MD:Z:11Y28
CIGAR = 39M and MD tag = MD:Z:4M35
Instead, I got the following values for almost all of the alignments:
CIGAR = 40M and MD tag = MD:Z:40
This is the command I am using:
bowtie2 -x <bt2.idx> -f snp_chip.fa -S results.sam --no-unal
Any idea what I could be doing wrong?
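To make the goal concrete, here is a minimal Python sketch of the result I am after; it is an illustration only. It finds the 0-based offset of the IUPAC ambiguity code in each read, then adds that offset to the 1-based SAM POS field. The function names (read_fasta, snp_offsets, snp_reference_position) are hypothetical helpers, and the coordinate step assumes an ungapped, forward-strand alignment (a CIGAR made only of M). For a reverse-strand hit, the offset would have to be taken from the SEQ field of the SAM record instead, since SAM stores that sequence reverse-complemented.

```python
# Illustration only: locate the IUPAC ambiguity code in each read and turn
# its offset into a reference coordinate for an ungapped forward-strand hit.

# IUPAC nucleotide codes other than A, C, G, T.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def read_fasta(lines):
    """Yield (read_id, sequence) pairs from FASTA-formatted lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            read_id, chunks = line[1:], []
        else:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def snp_offsets(sequence):
    """Return the 0-based offsets of every ambiguity code in the read."""
    return [i for i, base in enumerate(sequence.upper())
            if base in IUPAC_AMBIGUOUS]


def snp_reference_position(sam_pos, offset):
    """1-based reference coordinate of the SNP.

    Assumes sam_pos is the 1-based leftmost POS from the SAM record and
    that the alignment is all-M on the forward strand (no indels or clips).
    """
    return sam_pos + offset


example = """\
>id00001Zh:Chr02:57,645,640
TGCAGACYCAGACAAGGTTTAACACAGATTGGAACCGTTA
>id00002Zh:Chr07:53,650,797
TGTAAGATTGCYGGCAATGAATGTCAGTCGAGATGAAGAC
>id00003Zh:Chr06:48,898,851
CAGTMATTTTGATCCCTCGGTTGATGTGACTTCAAGCAGTA
""".splitlines()

for read_id, seq in read_fasta(example):
    print(read_id, snp_offsets(seq))
```

For the three example reads, the offsets are 7, 11 and 4, which are the same numbers I expected to see at the start of the MD strings.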