Processing Metagenomics Reads from Multiple Lanes
7.9 years ago
adityabandla

Hi,

I have been using a generic pipeline for processing a recent metagenomics dataset that I received.

Samples were sequenced on two HiSeq lanes, so I have four sets of reads per sample: R1 and R2 from lane 1, and R1 and R2 from lane 2.
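With four files per sample, it can help to group them by sample and read direction before any merging. Below is a minimal sketch. It assumes the files follow the bcl2fastq-style naming `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`, which is an assumption: facilities often rename files, so adjust the pattern. `group_by_sample` is a hypothetical helper, not part of any tool mentioned here.

```python
import re
from collections import defaultdict

# ASSUMPTION: bcl2fastq-style names such as "SampleA_S1_L001_R1_001.fastq.gz".
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>R[12])_\d{3}\.f(ast)?q(\.gz)?$"
)

def group_by_sample(filenames):
    """Map sample -> read direction ('R1'/'R2') -> list of files sorted by lane."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        m = FASTQ_NAME.match(name)
        if m is None:
            raise ValueError(f"unrecognised FASTQ name: {name}")
        groups[m["sample"]][m["read"]].append((m["lane"], name))
    # Sort by the zero-padded lane number so lane 1 always precedes lane 2,
    # giving R1 and R2 the same lane order.
    return {
        sample: {read: [n for _, n in sorted(files)] for read, files in reads.items()}
        for sample, reads in groups.items()
    }
```

Keeping the lane order identical for R1 and R2 matters if the lanes are later merged, because mates are matched by position in the two files.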

My workflow has been the following:
1. Adapter and quality trimming of all four files per sample
2. Concatenating all four files into a single file
3. Running DIAMOND on this single file
4. Using MEGAN6 for further processing
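If step 1 trims the R1 and R2 files independently, a read can be dropped from one file while its mate is kept in the other, and the two files fall out of sync. Many trimmers offer a paired-end mode to avoid this. The sketch below is a hedged way to check that trimmed R1/R2 files are still in step. It assumes strict 4-line FASTQ records and that mates share an ID up to the first whitespace, optionally ending in `/1` or `/2`. The function names are hypothetical.

```python
import gzip
from itertools import zip_longest

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")

def read_ids(path):
    """Yield read IDs from a FASTQ file, assuming strict 4-line records.

    The ID is the header up to the first whitespace, with a trailing
    '/1' or '/2' removed so that mates compare equal.
    """
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                if not line.startswith("@"):
                    raise ValueError(f"{path}: line {i + 1} is not a FASTQ header")
                read_id = line[1:].split()[0]
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id

def first_pair_mismatch(r1_path, r2_path):
    """Return the 0-based index of the first out-of-sync pair, or None if in sync.

    zip_longest pads the shorter file with None, so unequal record counts
    are also reported as a mismatch.
    """
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return index
    return None
```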

Can anyone please advise whether step 2 is appropriate in my case, or whether it is better to process each lane separately?

Metagenomics HiSeq DIAMOND

It is appropriate to concatenate the R1 files from the two lanes with each other, and likewise the R2 files, if the same sample ran in both lanes. It would not be appropriate to concatenate all four files into one (unless DIAMOND, which I am not familiar with, requires it).
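Here is a minimal sketch of that per-direction merge. It assumes gzip-compressed inputs listed in the same lane order for R1 and R2. The function names and the output naming scheme are hypothetical. The copy is byte-for-byte, like `cat`. Concatenated gzip files form a valid multi-member gzip stream, which `gzip -dc` and Python's `gzip` module both read transparently, so nothing needs to be decompressed.

```python
import shutil

def concat_lanes(lane_files, out_path):
    """Concatenate per-lane FASTQ files of one read direction into out_path."""
    with open(out_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

def merge_sample(r1_lanes, r2_lanes, prefix, suffix=".fastq.gz"):
    """Write <prefix>_R1<suffix> and <prefix>_R2<suffix> (hypothetical naming).

    R1 and R2 are merged separately, keeping the pairs in two files.
    Both lane lists must be in the same order so that mates stay aligned.
    """
    if len(r1_lanes) != len(r2_lanes):
        raise ValueError("R1 and R2 must have the same number of lane files")
    r1_out, r2_out = f"{prefix}_R1{suffix}", f"{prefix}_R2{suffix}"
    concat_lanes(r1_lanes, r1_out)
    concat_lanes(r2_lanes, r2_out)
    return r1_out, r2_out
```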

Since the alignment is done for each sequence separately anyway, why does it matter whether the FASTQ/FASTA files are combined or split into smaller pieces?

If you are treating them as single-end data, then it is fine. But was there a reason to do paired-end sequencing in that case?

Even if you used only the R1 data, you would get essentially the same answer as when using the R1 and R2 reads as separate queries, since both reads sample the same fragment. (DIAMOND is a fast BLASTX alternative.)
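If R1 and R2 are searched as separate single-end queries in one file, it helps to give the mates distinct IDs. Illumina mates usually share the same ID up to the first whitespace, so hits from the two mates can be hard to tell apart downstream. The sketch below writes both mates into a single FASTA query file. It assumes strict 4-line FASTQ records and uses a `/1`/`/2` suffix convention chosen here for illustration. The function names are hypothetical.

```python
import gzip

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")

def fastq_records(path):
    """Yield (read_id, sequence) from a strict 4-line FASTQ file."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                return
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            handle.readline()  # quality line, not needed for FASTA
            read_id = header[1:].split()[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id, seq

def pairs_to_single_end_fasta(r1_path, r2_path, out_path):
    """Write every mate as its own FASTA record, tagging IDs with /1 or /2.

    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w") as out:
        for path, tag in ((r1_path, "/1"), (r2_path, "/2")):
            for read_id, seq in fastq_records(path):
                out.write(f">{read_id}{tag}\n{seq}\n")
                count += 1
    return count
```

Following the comment above, querying with R1 alone would simply mean passing only the R1 records through the same conversion.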
