Ambiguous characters are not reported in Bowtie2 alignments
4.9 years ago
Famf ▴ 30

I am trying to align a set of read sequences (from a SNP chip array) to a reference genome. I want to get the physical position of the SNP in each read.

My reads are in FASTA format, e.g.

>id00001Zh:Chr02:57,645,640
TGCAGACYCAGACAAGGTTTAACACAGATTGGAACCGTTA
>id00002Zh:Chr07:53,650,797
TGTAAGATTGCYGGCAATGAATGTCAGTCGAGATGAAGAC
>id00003Zh:Chr06:48,898,851
CAGTMATTTTGATCCCTCGGTTGATGTGACTTCAAGCAGTA
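
To make the location of each SNP within its read explicit, here is a minimal Python sketch that scans FASTA records for IUPAC ambiguity codes and reports their 0-based offsets. The helper names (`parse_fasta`, `ambiguous_offsets`) are hypothetical and only for illustration; the sketch assumes every non-A/C/G/T IUPAC letter in a read marks a SNP.

```python
# IUPAC nucleotide ambiguity codes (everything except plain A, C, G, T).
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def ambiguous_offsets(seq):
    """Return the 0-based offsets of every ambiguity code in seq."""
    return [i for i, base in enumerate(seq.upper()) if base in IUPAC_AMBIGUOUS]


# The three example reads from the post.
reads = """>id00001Zh:Chr02:57,645,640
TGCAGACYCAGACAAGGTTTAACACAGATTGGAACCGTTA
>id00002Zh:Chr07:53,650,797
TGTAAGATTGCYGGCAATGAATGTCAGTCGAGATGAAGAC
>id00003Zh:Chr06:48,898,851
CAGTMATTTTGATCCCTCGGTTGATGTGACTTCAAGCAGTA
"""

snp_offsets = {name: ambiguous_offsets(seq) for name, seq in parse_fasta(reads)}
read_lengths = {name: len(seq) for name, seq in parse_fasta(reads)}

for name, offsets in snp_offsets.items():
    print(name, read_lengths[name], offsets)
```

For these reads the ambiguity codes sit at offsets 7, 11 and 4, and the read lengths are 40, 40 and 41 bases.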

I am using Bowtie2. Each read contains one ambiguous character (an IUPAC code), which marks the SNP whose position in the reference genome I want to find. Based on my understanding of the Bowtie2 manual, I expected the following fields in the alignment section of my output SAM file:

CIGAR = 39M and MD tag = MD:Z:7Y32
CIGAR = 39M and MD tag = MD:Z:11Y28
CIGAR = 39M and MD tag = MD:Z:4M35

Instead, I got the following for almost all of the alignments:

CIGAR = 40M and MD tag = MD:Z:40

This is the command I am using:

bowtie2 -x <bt2.idx> -f snp_chip.fa -S results.sam --no-unal

Any idea what I could be doing wrong?
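
For the stated goal of getting each SNP's position in the reference, one possible approach that does not depend on the MD tag is to combine the SNP's offset in the read with the FLAG, POS and CIGAR fields of the SAM record. The sketch below is a rough illustration under stated assumptions, not a definitive implementation: `ref_position` is a hypothetical helper, the POS value of 1000 in the usage lines is made up, and the read is assumed not to be hard-clipped (so the SAM SEQ has the same length as the original read). It relies on the SAM conventions that POS is the 1-based leftmost reference coordinate and that SEQ is stored reverse-complemented when FLAG bit 0x10 is set.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_position(pos, cigar, flag, read_offset, read_len):
    """Map a 0-based offset in the original read to a 1-based reference position.

    Returns None when the base is not aligned to a reference base
    (it falls in an insertion or a soft-clipped segment).
    """
    # On the reverse strand the SAM SEQ is the reverse complement of the
    # original read, so the offset has to be mirrored.
    if flag & 16:
        read_offset = read_len - 1 - read_offset

    ref = pos    # reference coordinate of the next aligned base
    query = 0    # offset in SEQ of the next base
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":    # consumes both read and reference
            if query <= read_offset < query + length:
                return ref + (read_offset - query)
            query += length
            ref += length
        elif op in "IS":   # consumes read only
            if query <= read_offset < query + length:
                return None
            query += length
        elif op in "DN":   # consumes reference only
            ref += length
        # H and P consume neither read nor reference
    return None


# Hypothetical records: POS=1000 is invented for illustration.
forward = ref_position(1000, "40M", 0, 7, 40)       # SNP at read offset 7
reverse = ref_position(1000, "40M", 16, 7, 40)      # same read, reverse strand
gapped = ref_position(1000, "5M2D35M", 0, 7, 40)    # alignment with a deletion
print(forward, reverse, gapped)
```

With a 40M alignment on the forward strand the SNP simply lands at POS plus its read offset; the reverse-strand and gapped cases show why FLAG and CIGAR still need to be taken into account.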

alignment • 974 views