BWA alignment using paired end reads from different lanes
9.1 years ago
deepue ▴ 160

Hi,

I need some help on how to proceed with a WES analysis.

I have seen in different posts on the forum that paired-end reads can be aligned separately and the resulting SAM files merged for further analysis. I would like to know how to handle data from different lanes during alignment.

Should I merge the data from the different lanes into one file per read type (L{1,2}_R1.fq into one file, L{1,2}_R2.fq into another), or run the alignment separately for each read type (1, 2) and each lane (4 times: L1_R1.fq, L1_R2.fq, L2_R1.fq, L2_R2.fq) and then merge the SAM files for further analysis?

Please advise.

Thanks

alignment WES bwa • 9.3k views

It's better to merge the FASTQ files and then perform the alignment.


You mean, merge the data from the different lanes L{1,2} of Read1.fq into a single file and L{1,2} of Read2.fq into another file, right?

Thanks


Exactly


Hello deepue!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=51448

This is typically not recommended as it runs the risk of annoying people in both communities.


Thanks, Pierre, for your advice. I am trying to delete the other post and will take care to avoid this next time.

Could you please advise on this scenario? Thank you!

9.1 years ago

I do not agree with arno.guille: it's faster to map and sort (L1_R1.fq + L1_R2.fq) and (L2_R1.fq + L2_R2.fq) in parallel and then merge the sorted BAM files.
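As a rough illustration of the per-lane approach, here is a minimal Python sketch that only builds the command lines (it does not run them). It assumes `bwa mem` and a samtools version whose `sort` accepts `-o` and reads from stdin via `-`; the reference name `ref.fa`, the output names, and the helper functions are hypothetical and not taken from this thread.

```python
import shlex

def map_and_sort_cmd(ref, r1, r2, out_bam, threads=2):
    """Build a shell pipeline that aligns one lane's read pair and sorts the result."""
    align = ["bwa", "mem", "-t", str(threads), ref, r1, r2]
    sort = ["samtools", "sort", "-o", out_bam, "-"]  # "-" = read alignments from stdin
    return shlex.join(align) + " | " + shlex.join(sort)

def merge_cmd(out_bam, sorted_bams):
    """Build the command that merges the per-lane sorted BAM files."""
    return shlex.join(["samtools", "merge", out_bam, *sorted_bams])

lanes = ["L1", "L2"]  # lane labels as used in the question
lane_cmds = [
    map_and_sort_cmd("ref.fa", f"{lane}_R1.fq", f"{lane}_R2.fq", f"{lane}.sorted.bam")
    for lane in lanes
]
final_cmd = merge_cmd("merged.bam", [f"{lane}.sorted.bam" for lane in lanes])

for cmd in lane_cmds + [final_cmd]:
    print(cmd)
```

The two lane commands are independent of each other, so they can be started at the same time; only the merge has to wait for both to finish.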


Thank you Pierre.

I will run the BWA alignment 4 times, once for each Lane{1,2}_Read{1,2}.fq file, and then generate 2 SAM files, one per lane. I will then sort the 2 SAM files (Lane1, Lane2) separately and merge them into a single SAM file. Please advise whether this is correct.

Sorry, I couldn't understand the parallel part. Could you please help me?

Thanks
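For reference, the plan described in this comment corresponds roughly to the legacy `bwa aln` / `bwa sampe` workflow. The sketch below only generates the command strings, under the assumption that the reference is an already indexed file called `ref.fa` (a hypothetical name) and that intermediate files are named after the lane.

```python
def aln_sampe_cmds(ref, lane):
    """Return the three commands needed for one lane: two 'aln' runs, then one 'sampe'."""
    r1, r2 = f"{lane}_R1.fq", f"{lane}_R2.fq"
    sai1, sai2 = f"{lane}_R1.sai", f"{lane}_R2.sai"
    return [
        f"bwa aln {ref} {r1} > {sai1}",  # align read 1 of this lane
        f"bwa aln {ref} {r2} > {sai2}",  # align read 2 of this lane
        # pair the two alignments into a single SAM file for this lane
        f"bwa sampe {ref} {sai1} {sai2} {r1} {r2} > {lane}.sam",
    ]

all_cmds = [cmd for lane in ("L1", "L2") for cmd in aln_sampe_cmds("ref.fa", lane)]
for cmd in all_cmds:
    print(cmd)
```

This gives four `bwa aln` runs and two `bwa sampe` runs, that is, one SAM file per lane, which can then be sorted and merged.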


parallel = processing two commands at the same time: see http://en.wikipedia.org/wiki/Parallel_computing, GNU parallel and/or the make option -j

Yes, you can merge both SAM/BAM files with Picard MergeSamFiles.
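To make "at the same time" concrete, here is a small Python sketch that starts several commands concurrently and waits for all of them. The `run_all` helper is hypothetical, and the two commands are harmless placeholders standing in for the per-lane alignment commands; GNU parallel or `make -j` achieve the same effect from the shell.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_all(commands, max_workers=2):
    """Run each command (a list of arguments) concurrently; return exit codes in input order."""
    def run(cmd):
        return subprocess.run(cmd).returncode
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))

# Placeholder jobs: each one just prints a message instead of aligning a lane.
placeholder_jobs = [
    [sys.executable, "-c", f"print('aligning lane {n}')"] for n in (1, 2)
]
exit_codes = run_all(placeholder_jobs)
print(exit_codes)
```

An exit code of 0 for every job is the signal that it is safe to go on to the merge step.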


If you also need to run MarkDuplicates, you can skip the merge and give multiple BAM files to that tool.
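A sketch of what that could look like, assuming Picard's classic `KEY=value` argument style, where `I=` may be repeated once per input file. The jar path and file names are hypothetical, and the helper only builds the command string.

```python
def markdup_cmd(in_bams, out_bam, metrics, picard_jar="picard.jar"):
    """Build a Picard MarkDuplicates command that takes several BAM files as input."""
    args = ["java", "-jar", picard_jar, "MarkDuplicates"]
    args += [f"I={bam}" for bam in in_bams]      # one I= argument per lane-level BAM
    args += [f"O={out_bam}", f"M={metrics}"]     # combined output and metrics file
    return " ".join(args)

cmd = markdup_cmd(["L1.sorted.bam", "L2.sorted.bam"], "dedup.bam", "dedup.metrics.txt")
print(cmd)
```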


You're right, it's faster, but in terms of memory usage I'm not sure it is optimal. Of course, it depends on your hardware specs. In my case, if I run 4 alignments on the same node, I get a segmentation fault.


Hi

I have run the 4 alignments one by one and they all finished. I am not sure whether they completed successfully, and my query about that is still open in the link. My second issue: I ran 'bwa sampe' on the data from Lane1, which completed, but the run on the data from Lane2 aborted with a segmentation fault. You mentioned this error in the comment above; could you please provide more information on how to handle it?

Thanks


You probably get a segmentation fault due to lack of memory. Let's say you have 16 GB of memory and 4 CPUs, and each alignment process takes 6 GB of memory. If you run 4 alignments at the same time, you will need 4*6 = 24 GB. Unfortunately, I have no solution except to increase the memory or run your alignments one by one. But in the latter case, you should probably merge the FASTQ files.
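The arithmetic above can be turned into a quick check of how many alignments fit on a node. This is only a back-of-the-envelope sketch using the illustrative figures from this comment (16 GB, 4 CPUs, 6 GB per process); the function name is hypothetical and real memory use depends on the reference and the tool.

```python
def max_concurrent_jobs(total_mem_gb, mem_per_job_gb, cpus):
    """Number of jobs that fit at once, limited by both memory and CPU count."""
    return min(cpus, total_mem_gb // mem_per_job_gb)

total_mem_gb, mem_per_job_gb, cpus = 16, 6, 4  # figures quoted in the comment above

needed_for_four = 4 * mem_per_job_gb           # 24 GB, more than the 16 GB available
jobs = max_concurrent_jobs(total_mem_gb, mem_per_job_gb, cpus)

print(f"4 jobs would need {needed_for_four} GB; at most {jobs} jobs fit in {total_mem_gb} GB")
```

With these numbers, two alignments can run side by side, so a middle ground between "all four at once" and "one by one" is to run them two at a time.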

8.4 years ago
abascalfederico ★ 1.2k

I disagree with some of the answers: you should not merge the FASTQ files from different lanes before aligning them. If you do so, you will lose the "read group" (RG tag) information. I would align each lane separately, then add specific RG tags to each lane, and then merge the aligned BAMs (respecting the RG information). RG information is important for downstream analyses.
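One way to add per-lane RG tags after alignment is Picard AddOrReplaceReadGroups. The sketch below builds one command per lane, assuming Picard's classic `KEY=value` argument style; the sample name, library name, jar path and the `sample.lane` ID scheme are hypothetical choices for illustration, not something prescribed in this thread.

```python
def add_rg_cmd(in_bam, out_bam, lane, sample,
               library="lib1", platform="ILLUMINA", picard_jar="picard.jar"):
    """Build a Picard AddOrReplaceReadGroups command tagging one lane's BAM."""
    return " ".join([
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}", f"O={out_bam}",
        f"RGID={sample}.{lane}",  # unique per lane, so lanes stay distinguishable
        f"RGSM={sample}",         # same sample name for every lane of this sample
        f"RGLB={library}",
        f"RGPL={platform}",
        f"RGPU={lane}",           # platform unit; here simply the lane label
    ])

rg_cmds = [
    add_rg_cmd(f"{lane}.sorted.bam", f"{lane}.rg.bam", lane, sample="sampleA")
    for lane in ("L1", "L2")
]
for cmd in rg_cmds:
    print(cmd)
```

Because each lane keeps its own RGID while sharing the same RGSM, the merged BAM still records which lane every read came from.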

8.4 years ago
Chris Cole ▴ 800

You don't have to use GATK for WES analysis, but you should at least read and understand its best-practice workflow.

At the very least, you should align each lane separately, remove duplicates, and then merge. As abascalfederico correctly says, you must use the RG (read group) tags to keep track of lane information.
