How to read fastqs from different sequencing runs rather than merging?
2.0 years ago
Vasu ▴ 770

I have fastq files for the same samples from a first and a second sequencing run, kept in different directories as shown below:

First Run:

Data1
    |_____fastq_folder
               |_______ sample1
                           |______ sample1_L1_R1.fastq.gz
                           |______ sample1_L1_R2.fastq.gz
                           |______ sample1_L2_R1.fastq.gz
                           |______ sample1_L2_R2.fastq.gz
               |_______ sample2
                           |______ sample2_L1_R1.fastq.gz
                           |______ sample2_L1_R2.fastq.gz
                           |______ sample2_L2_R1.fastq.gz
                           |______ sample2_L2_R2.fastq.gz

Second Run:

Data2
    |_____fastq_folder
               |_______ sample1
                           |______ sample1_L1_R1.fastq.gz
                           |______ sample1_L1_R2.fastq.gz
               |_______ sample2
                           |______ sample2_L1_R1.fastq.gz
                           |______ sample2_L1_R2.fastq.gz

Usually, I run Salmon or Kallisto on the First Run files, which are in the directory Data1, from a script.

Let's say I'm inside the directory Data1, where I have a script named kallisto.sh. Inside the script, the fastq files are read like this:

r1=$(ls $fastq_folder/$sample/$sample*_R1.fastq.gz)
r2=$(ls $fastq_folder/$sample/$sample*_R2.fastq.gz)

But now I would also like to use the Second Run files in my script. How should I change r1 and r2 so that they pick up all the files from both the First Run and the Second Run?

P.S.: I know I could merge the files and then run the analysis, but merging might take a very long time at my workplace.

ngs rnaseq fastq • 754 views

In any case, you will need to merge the data from Data1: the files were run on different lanes but represent the same biological sample.

So something like cat sample1_L1_R1.fastq.gz sample1_L2_R1.fastq.gz > sample1_R1.fastq.gz (that is, join the data from the different lanes into one file per read direction for each biological sample).
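
A minimal sketch of that per-sample lane merge, assuming the file naming from the question (sample_L1_R1.fastq.gz and so on). The function name merge_lanes and the output directory are hypothetical. Gzip files can be concatenated directly into a valid multi-member gzip file, so nothing is decompressed here. The shell expands the glob in sorted order, so R1 and R2 are joined in the same lane order as long as both mates exist for every lane.

```shell
#!/usr/bin/env bash
# merge_lanes SAMPLE_DIR SAMPLE OUT_DIR
# Joins all lanes of one sample into a single R1 and a single R2 file.
# Assumes files are named <sample>_L<lane>_<R1|R2>.fastq.gz (as in the question).
merge_lanes() {
    local sample_dir=$1 sample=$2 out_dir=$3 mate
    mkdir -p "$out_dir"
    for mate in R1 R2; do
        # The glob expands in sorted order, identical for R1 and R2.
        # If no file matches, cat fails on the literal pattern.
        cat "$sample_dir"/"${sample}"_L*_"${mate}".fastq.gz \
            > "$out_dir/${sample}_${mate}.fastq.gz" || return 1
    done
}

# Example (paths are assumptions):
# merge_lanes Data1/fastq_folder/sample1 sample1 merged/sample1
```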


Yes, I know this. Please check the last line of my post. Merging might take a very long time at my workplace, so I'm looking for an alternative.


"But now I would also like to use the Second Run files in my script."

You can use the find command with a certain depth, as in this thread: "How to concatenate multiple fastq files (located in different directories) for each sample".

Is the sample1 naming consistent across folders and files?
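
A rough sketch of the find approach, assuming the layout from the question (run directory / fastq_folder / sample / file, so the fastq files sit at depth 3 below each run directory) and consistent sample names across runs. The function name find_reads is hypothetical. The -mindepth and -maxdepth options are supported by GNU and BSD find but are not part of POSIX.

```shell
#!/usr/bin/env bash
# find_reads SAMPLE MATE RUN_DIR...
# Prints, one per line, every <sample>_*_<mate>.fastq.gz found exactly
# three levels below each run directory (run/fastq_folder/sample/file).
find_reads() {
    local sample=$1 mate=$2
    shift 2
    # Sorting in the C locale gives R1 and R2 lists in the same order,
    # because the paths differ only in the R1/R2 part.
    find "$@" -mindepth 3 -maxdepth 3 -type f \
        -name "${sample}_*_${mate}.fastq.gz" | LC_ALL=C sort
}

# Example (needs bash 4+ for mapfile; index and output names are assumptions):
# mapfile -t r1 < <(find_reads sample1 R1 Data1 Data2)
# mapfile -t r2 < <(find_reads sample1 R2 Data1 Data2)
# salmon quant -i index -l A -1 "${r1[@]}" -2 "${r2[@]}" -o out/sample1
```

Salmon accepts several files after -1 and -2, so the two lists can be passed as they are; the files are never merged on disk.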

"I know I could merge the files and then run the analysis, but merging might take a very long time at my workplace."

Why would this take a very long time? It will take up extra space, though, since the data is duplicated for a while.


Not sure if this is what you're asking, but if the runs represent the same biological sample, you can just list the read pairs one right after another in kallisto:

kallisto quant -i index.idx -o output/ run1.r1.fq.gz run1.r2.fq.gz run2.r1.fq.gz run2.r2.fq.gz run3.r1.fq.gz run3.r2.fq.gz
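
To avoid typing the pairs by hand, the list could be built in the script. This is only a sketch, assuming the directory layout and file naming from the question; the function name collect_pairs and the output path in the example are hypothetical. It prints R1 and R2 alternately for every lane in every run directory, which is the order kallisto expects for paired-end input.

```shell
#!/usr/bin/env bash
# collect_pairs SAMPLE RUN_DIR...
# Prints R1 then R2 for each lane of SAMPLE in each run directory,
# assuming <run>/fastq_folder/<sample>/<sample>_L<n>_<R1|R2>.fastq.gz.
collect_pairs() {
    local sample=$1 run r1 r2
    shift
    for run in "$@"; do
        for r1 in "$run"/fastq_folder/"$sample"/"${sample}"_*_R1.fastq.gz; do
            [ -e "$r1" ] || continue              # glob matched nothing in this run
            r2=${r1%_R1.fastq.gz}_R2.fastq.gz     # derive the mate's file name
            [ -e "$r2" ] || { echo "missing mate for $r1" >&2; return 1; }
            printf '%s\n' "$r1" "$r2"
        done
    done
}

# Example, run from the directory that contains Data1 and Data2
# (needs bash 4+ for mapfile):
# mapfile -t reads < <(collect_pairs sample1 Data1 Data2)
# kallisto quant -i index.idx -o output/sample1 "${reads[@]}"
```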
