map both interleaved and orphaned reads to assembly for each sample (with bowtie2)
1
0
Entering edit mode
6.0 years ago
willnotburn ▴ 50

I have many samples, each with reads in two files: ${interleaved}.fastq and ${orphaned}.fastq. The assembly, used here as reference, was constructed with megahit using both sets.

Correct me if I'm wrong, but to get accurate coverage statistics I need to, for each sample, map reads from the interleaved and the orphaned file onto the assembly. Can bowtie2 or some other tool do that? I was thinking a one-line command for each sample, like below (no for loop here, my shell script skills are modest)

bowtie2 -p 20 -x final.contigs.build.fa --interleaved ${interleaved}.fastq -U ${orphaned}.fastq -S sample_x.sam 2> sample_x.out

Would this work? I haven't tested this on very many samples, but for a couple of them the size of sample_x.sam output file appears the same, with or without the -U argument in the command, if the --interleaved argument is left in place.

I was once told to map interleaved and orphaned files in two separate steps and then combine output .sam files using samtools. If this is the better option, which samtools command(s) would accomplish that?

Either way, the goal is the same: a .sam and later a sorted .bam file for each sample.

Assembly alignment metagenomics • 2.2k views
ADD COMMENT
1
Entering edit mode

You can use bbwrap.sh from BBMap suite to map paired-end and singletons together like this.

bbwrap.sh in1=read1.fq,singletons.fq in2=read2.fq,null out=mapped.sam append

reformat.sh can de-interleave your reads easily. reformat.sh in=interleaved.fq.gz out1=Read1.fg.gz out2=Read2.fq.gz.

ADD REPLY
0
Entering edit mode

Thanks, @genomax!! Your answer works and may be the most straightforward option.

I am still holding out for a bowtie2 solution, perhaps mediated by samtools. The only reason is my amateur understanding of bbmap performance (large false positive rate).

ADD REPLY
0
Entering edit mode

Thanks for link. I will let @Brian (author of BBMap) know about it.

Couple of things. That blog link is from 2015 and BBMap is continuously evolving. It is very flexible (as they author of that blog admitted). If your data is not metagenomic then you don't need to worry.

ADD REPLY
0
Entering edit mode

Does orphaned refer to reads that were not used in the assembly or just those reads whose partner missing (from a pair)?

ADD REPLY
0
Entering edit mode

orphaned are no-partner reads. megahit can take interleaved and orphaned input to produce one assembly (at least it didn't give me an error), like this

x=$(ls *interleaved* -1 | paste -sd "," -)
y=$(ls *orphaned* -1 | paste -sd "," -)

megahit -m 1 --12 "$x" -r "$y" -o megahit_done
ADD REPLY
0
Entering edit mode

Ok. So they were part of the assembly. Just wanted to make sure.

ADD REPLY
0
Entering edit mode
6.0 years ago
willnotburn ▴ 50

Thanks to @genomax and some trial and error, I figured this one out.

bowtie2 can absolutely map both paired and unpaired (orphaned) reads at the same time. The key is to supply paired reads as separate forward and reverse inputs, as opposed to an interleaved file.

This works: bowtie2 -x final.contigs.build.fa 1 forward.fastq 2 reverse.fastq -U orphaned.fastq -S sample_x.sam 2> sample_x.out

ADD COMMENT

Login before adding your answer.

Traffic: 1470 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6