Aligning Colorspace Reads Using Bwa
13.0 years ago
Farhat ★ 2.9k

I have about 100 million reads from a SOLiD run. I am trying to align them using bwa, but I get 0 alignments. What am I doing wrong here? Here are the commands that I am using:

~/software/bwa-0.5.9/bwa aln -n 6 -t 6 -o 2 -c ~/genomes/hsap/hg19.fa sampleTF5.fastq.gz > sampleTF5.sai
~/software/bwa-0.5.9/bwa samse ~/genomes/hsap/hg19.fa sampleTF5.sai sampleTF5.fastq.gz | samtools view -bS - | samtools sort - sampleTF5

About 40% of the reads align using Bioscope so I know that at least some reads should align. The index was created using -c so it is a colorspace index.

ETA: Couple of reads from the fastq file

@853_2_23
T10201001101112312122022330313023.22201032232203002
+
.06%8+23,-/,740&+2,&(*+&26%&%'';!%'(&)':2((,,-'%(.
@853_2_76
T00221112202322220011002232000222000212301132232001
+
&<*(%'?'&'&5)*'%%%&('-'(()-')&)&%)*'/%%&%'%(%&&'&%
bwa alignment solid • 8.2k views

what do your reads look like? did you use solid2fastq.pl?


There are a couple of different scripts called solid2fastq.pl floating around: http://kevin-gattaca.blogspot.com/2010/05/plethora-of-solid2fastq-or-csfasta.html

The bwa one double-encodes and the BFAST one doesn't, or at least that was the case a while ago.


Yes, I used solid2fastq.pl. The reads are 50 bp long colorspace reads. The quality statistics looked okay with FASTQC.


@853_2_23
T10201001101112312122022330313023.22201032232203002
+
.06%8+23,-/,740&+2,&(*+&26%&%'';!%'(&)':2((,,-'%(.
@853_2_76
T00221112202322220011002232000222000212301132232001
+
&<*(%'?'&'&5)*'%%%&('-'(()-')&)&%)*'/%%&%'%(%&&'&%

3
Entering edit mode
13.0 years ago
Farhat ★ 2.9k

I found the solution to this, and Alastair's link helped. Apparently bwa needs the fastq files to be 'double encoded'. Thus, you have to rewrite the color calls on the sequence lines of the colorspace fastq with the Perl transliteration tr/0123./ACGTN/ to get bwa to work. I am adding the solution here in case others run into this issue too.
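For anyone who wants a starting point, here is a minimal sketch of that rewrite as a shell function. The name double_encode is made up for illustration. It assumes a plain four-line FASTQ (no wrapped sequence lines), and it assumes that a leading primer base should be dropped together with the first color and its quality value, so please check that against whichever solid2fastq.pl you normally use.

```shell
# Hypothetical helper: double-encode a color-space FASTQ read from stdin.
# Assumes exactly four lines per record (header, sequence, "+", quality).
double_encode() {
  awk '
    NR % 4 == 2 {
      # Assumption: a leading letter is the primer base; drop it together
      # with the first color, which depends on the primer, not the genome.
      if ($0 ~ /^[ACGTacgt]/) $0 = substr($0, 3)
      # Same mapping as tr/0123./ACGTN/, applied to the sequence line only.
      gsub(/0/, "A"); gsub(/1/, "C"); gsub(/2/, "G"); gsub(/3/, "T"); gsub(/\./, "N")
      seqlen = length($0)
    }
    NR % 4 == 0 {
      # Keep only the trailing qualities so both lines have equal length.
      if (length($0) > seqlen) $0 = substr($0, length($0) - seqlen + 1)
    }
    { print }
  '
}
```

It could be used along the lines of `gzip -dc sampleTF5.fastq.gz | double_encode | gzip > sampleTF5.de.fastq.gz`, where the output file name is only an example. Restricting the substitution to the sequence lines matters because quality strings can legitimately contain the characters 0, 1, 2, 3 and the period.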

13.0 years ago

You may need the -a bwtsw option. Check out this thread on SeqAnswers for further information

EDIT: 2016

bwa does not seem to support colorspace from version 0.6 onwards. The last version I am aware of that worked was 0.5.1.

I would suggest looking at BFAST, SHRiMP or novoalignCS, but I have not needed to work with colorspace reads for many years.


@Alastair: I am trying to index the genome in color space with this command: bwa index -a bwtsw -c GRCh38.r76.fa but I get the following error: index: invalid option -- 'c'. Can you tell me whether the -c option has been removed, or what else is wrong here? Thanks
