Biostar Beta. Not for public use.
STAR align multiple files
10 months ago
ta_awwad • 210
Frankfurt am Main

Hi everybody, I am aligning 36 paired-end (PE) samples with STAR. To make the task a little easier, I wrote a bash loop to align them all with the same command. Here is my loop:

for i in $(ls raw_data); do STAR --genomeDir index.150 \
--readFilesIn raw_data/$i\_1.fq.gz,raw_data/$i\_2.fq.gz \
--runThreadN 20 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--sjdbGTFfile GRCm38.90.gtf \
--readFilesCommand zcat ; done

But something seems to be wrong: the alignment ran overnight and still had not finished.

Any recommendations?

Thanks very much.


With 36 samples, you could speed things up by loading the index into shared memory once and unloading it when mapping is finished:

STAR --genomeLoad LoadAndExit --genomeDir index.150

for i in $(ls raw_data | sed 's/_[12]\.fq\.gz$//' | sort -u)
do
    STAR [...]
done

STAR --genomeLoad Remove --genomeDir index.150

Thank you all for this priceless info.


Hi h.mon,

Could you tell me what the purpose of index.150 is here? Can we just give the location of the genome after --genomeDir?


Yes. In the example given, index.150 is the name of the index directory from the original question. Replace it with your own.


If you load the genome before the for loop using STAR --genomeLoad LoadAndExit --genomeDir genomeDIR, do you still need to specify the --genomeDir parameter inside the loop? I tried leaving it out, and STAR failed to run. Then I tried specifying the genome directory inside the loop (even though the genome is loaded before the loop), and it looks like each iteration still loads the genome again.

Can someone please explain how to load the genome properly for multiple samples, so that the loop does not reload it on every iteration?


First you load the genome using --genomeDir $GENOMEDIR --genomeLoad LoadAndExit. For your alignment(s) you need --genomeDir $GENOMEDIR --genomeLoad LoadAndKeep
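
A minimal sketch of the whole sequence (load once, align each sample with LoadAndKeep, then remove), in case it helps. The RUN variable and the align_all function are hypothetical names made up for this example, not STAR features: by default every command is only printed, so the loop can be checked before anything runs. The sketch keeps only the input, output-prefix and thread options from the original command. As far as I know, STAR refuses on-the-fly --sjdbGTFfile insertion when the genome is in shared memory, and coordinate-sorted BAM output then needs --limitBAMsortRAM set explicitly, so check the manual for your version before adding those options back.

```shell
#!/usr/bin/env bash
# Sketch: load the genome once, align every sample against the shared copy,
# then release the shared memory.
# RUN is a hypothetical dry-run switch: unset -> commands are only printed;
# set RUN to an empty string (RUN= ./script.sh) to execute them for real.
RUN=${RUN-echo}
GENOMEDIR=${GENOMEDIR:-index.150}

align_all() {
    # 1. Load the index into shared memory and exit.
    $RUN STAR --genomeLoad LoadAndExit --genomeDir "$GENOMEDIR"

    # 2. Align each sample, reusing the already loaded index.
    local sample
    for sample in "$@"; do
        $RUN STAR --genomeDir "$GENOMEDIR" --genomeLoad LoadAndKeep \
            --readFilesIn "raw_data/${sample}_1.fq.gz" "raw_data/${sample}_2.fq.gz" \
            --readFilesCommand zcat --runThreadN 20 \
            --outFileNamePrefix "aligned/${sample}."
    done

    # 3. Remove the index from shared memory.
    $RUN STAR --genomeLoad Remove --genomeDir "$GENOMEDIR"
}
```

Calling align_all KO_day3_1 mESC_KO_3 prints four STAR commands: one load, two alignments, one removal. Note that --genomeDir is still needed on every call, because STAR uses it to identify which loaded genome to attach to.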


When looping, test if your code is valid by adding an echo statement to see what the command is going to be:

for i in $(ls raw_data); do echo STAR --genomeDir index.150 \
--readFilesIn raw_data/$i\_1.fq.gz,raw_data/$i\_2.fq.gz \
--runThreadN 20 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--sjdbGTFfile GRCm38.90.gtf \
--readFilesCommand zcat ; done

My guess is that the files raw_data/$i_1.fq.gz don't exist, because you create $i simply from the contents of raw_data, so $i is already a complete file name, suffix included.
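
Besides echoing the command, the same guess can be tested directly by checking whether the paths the loop would build actually exist. A small sketch, assuming the _1.fq.gz / _2.fq.gz naming from the question; missing_inputs is a hypothetical helper name:

```shell
# Sketch: report expected input files that do not exist.
# Usage: missing_inputs DIR SAMPLE...
missing_inputs() {
    local dir=$1 sample mate path
    shift
    for sample in "$@"; do
        for mate in 1 2; do
            # Build the path exactly as the alignment loop would.
            path="$dir/${sample}_${mate}.fq.gz"
            [ -f "$path" ] || printf 'missing: %s\n' "$path"
        done
    done
}
```

Running missing_inputs raw_data $(ls raw_data) with the original loop variable would list every constructed path as missing, while an empty output means all inputs were found.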


Thanks very much, WouterDeCoster, for your reply. I ran your code and got this:

STAR --genomeDir /index.150 --readFilesIn raw_data/KO_day3_1_1.fq.gz_1.fq.gz raw_data/KO_day3_1_1.fq.gz_2.fq.gz --runThreadN 20 --outFileNamePrefix aligned/KO_day3_1_1.fq.gz. --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --sjdbGTFfile GRCm38.90.gtf --readFilesCommand zcat

You are right: the file names come out wrong.

Any suggestions for correcting this?

Thanks very much.


Can you show a few examples of filenames of the fq.gz files?

KO_day3_1_1.fq.gz
KO_day4_1_2.fq.gz
mESC_KO_3_1.fq.gz
mESC_KO_3_2.fq.gz
mESC_Wt3_1.fq.gz
mESC_Wt3_2.fq.gz
PG_4WT10_07_17_1.fq.gz
PG_4WT10_07_17_2.fq.gz
PG_7Swht16_07_17_1.fq.gz
PG_7Swht16_07_17_2.fq.gz

You could try something like:

for i in $(ls raw_data | sed 's/_[12]\.fq\.gz$//' | sort -u); do echo STAR --genomeDir index.150 \
--readFilesIn raw_data/${i}_1.fq.gz raw_data/${i}_2.fq.gz \
--runThreadN 20 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--sjdbGTFfile GRCm38.90.gtf \
--readFilesCommand zcat ; done

I shortened $i by stripping the _1.fq.gz / _2.fq.gz suffix and kept only the unique names, since every sample would otherwise be listed twice. The two mates are separated by a space, because STAR treats a comma-separated list as several files for the same mate. Remove the echo once the printed commands look right.
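
An alternative sketch that gets the same sample names without parsing ls output, using a glob and bash parameter expansion. It assumes every sample has a mate-1 file named <sample>_1.fq.gz; sample_names is a hypothetical helper name:

```shell
# Sketch: list unique sample names in DIR, one per line.
# Usage: sample_names DIR
sample_names() {
    local dir=$1 f
    for f in "$dir"/*_1.fq.gz; do
        [ -e "$f" ] || continue          # the glob matched nothing
        f=${f##*/}                       # drop the directory part
        printf '%s\n' "${f%_1.fq.gz}"    # drop the mate-1 suffix
    done
}
```

Because only the mate-1 files are matched, each sample appears once and no sort -u is needed. The result can feed the loop directly: for i in $(sample_names raw_data); do ...; done.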


Thanks very much, it is running now. I am not sure how long it will take, but I will let you know whether everything runs fine.


It looks like it is stuck: no progress for 30 minutes. Is that normal?


You can have a look with (h)top to see if it's still working. Also, check if it's producing output files.
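
To check the output files, STAR's own logs are a good place to look: under each --outFileNamePrefix it writes Log.out, Log.progress.out (updated about once a minute during mapping) and, once a run completes, Log.final.out. A small sketch that classifies samples by which of these exist; star_status is a hypothetical helper name, and the aligned/<sample>. prefix is the one used in the question:

```shell
# Sketch: classify each sample by the log files STAR has written so far.
# Usage: star_status OUTDIR SAMPLE...
star_status() {
    local dir=$1 sample prefix
    shift
    for sample in "$@"; do
        prefix="$dir/$sample."
        if [ -f "${prefix}Log.final.out" ]; then
            echo "$sample finished"
        elif [ -f "${prefix}Log.progress.out" ]; then
            echo "$sample running"       # or interrupted: inspect the log
        else
            echo "$sample not started"
        fi
    done
}
```

For a sample reported as running, tail -n 3 aligned/KO_day3_1.Log.progress.out shows whether the read counts are still increasing.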


I think the problem was that STAR doesn't accept compressed files.


It does accept them, but you need to specify --readFilesCommand zcat.


I did, and it did not work.


Works just fine for me, use it all the time.


"it did not work" doesn't help us know what went wrong, what is the error message? STAR does accept gz compressed files.


It is just stuck: no error message, no progress.


Try gunzip instead (for example, --readFilesCommand gunzip -c). It works with that.
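
Whether zcat or gunzip -c is the right choice can depend on the system: on some platforms (macOS, for instance) zcat expects .Z files and fails on .gz input. A sketch for testing this outside STAR, so a decompression problem shows up as an error rather than a silent hang; pick_decompressor is a hypothetical helper name:

```shell
# Sketch: print a decompression command that can read the given .gz file.
# Decompresses the whole file, so it also catches truncated or corrupt input.
# Usage: pick_decompressor FILE
pick_decompressor() {
    local file=$1
    if zcat "$file" >/dev/null 2>&1; then
        echo "zcat"
    elif gunzip -c "$file" >/dev/null 2>&1; then
        echo "gunzip -c"
    else
        echo "cannot decompress $file" >&2
        return 1
    fi
}
```

The printed command can then be passed on unquoted, so that "gunzip -c" is split into two words: STAR ... --readFilesCommand $(pick_decompressor raw_data/KO_day3_1_1.fq.gz).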
