STAR hangs in a loop when mapping multiple files
12 months ago

Hello everyone. I have recently been using STAR to map reads from multiple files. Here is the script:

for NAME in individual1 individual2  individual3
do
    STAR --runMode alignReads \
         --runThreadN 10 \
         --genomeDir $REF \
         --readFilesIn ${INPUT}/${NAME}_input/${NAME}_input_R1.fq.gz \
         --readFilesCommand zcat \
         --outSAMstrandField intronMotif \
         --outFileNamePrefix ${OUT}/${NAME}_input_wasp \
         --outSAMtype BAM Unsorted \
         --varVCFfile ${OUTPUT}/${NAME}_input.vcf \
         --waspOutputMode SAMtag \
         --outSAMattributes vA vG
done

The paths are definitely correct. The key problem is that once the first file is done, the script stops, with no warning at all. When I run "ps", it shows this:

  PID TTY          TIME CMD
19335 pts/0    00:00:00 bash
19384 pts/0    00:00:00 bash
19665 pts/0    00:36:35 STAR
19668 pts/0    00:00:00 sh <defunct>
19708 pts/0    00:00:00 ps

Only when I run "kill 19665" can the next file be processed. I have no idea what causes this, and it confuses me a lot. Could anyone tell me how to fix it? Thank you!
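To see which process is defunct and which process started it, it can help to list zombie processes together with their parent PIDs. A minimal sketch, assuming a Linux procps-style `ps` that accepts `-eo pid,ppid,stat,comm`; the helper name `list_defunct` is made up for illustration:

```shell
# Hypothetical helper: print "PID PPID COMMAND" for every defunct (zombie)
# process. Expects rows of "PID PPID STAT COMMAND" on stdin; a STAT value
# that starts with "Z" marks a zombie.
list_defunct() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage on a live system (assumes procps-style ps):
#   ps -eo pid,ppid,stat,comm | list_defunct
```

If the PPID reported for the defunct `sh` is the PID of STAR, that shell was started by STAR itself rather than by the loop.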


See my suggestion for a simple parallelization script (written for bowtie2, but I think you'll get the idea): "A: perl script for BWA-mem on multiple different files".


Thanks! It seems useful; I will try it in my code.


Do ${OUT}/ and ${OUTPUT}/ exist before you run STAR?

How do you define $OUT and $OUTPUT?
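One way to rule this out is a small pre-flight check before the loop starts. This is only a sketch: the helper name `check_dirs` is made up, and it assumes the script should stop when a directory is missing.

```shell
# Hypothetical pre-flight check: succeed only if every given path is an
# existing directory; report the missing ones on stderr.
check_dirs() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage before the loop (variable names taken from the script in the question):
#   check_dirs "$OUT" "$OUTPUT" || exit 1
```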


They are defined like this:

OUTPUT=/safedisk/CHIP_Seq/PhaseI/5_platypus_vcf
OUT=/safedisk/CHIP_Seq/PhaseI/6_wasp_bam

These two directories hold the results of two different steps; ${OUT} is where I store my STAR results. By the way, I tested STAR with a single file, and the "defunct" process still appears.

11 months ago
caggtaagtat • 620

Hi,

I also execute STAR in a loop and use two different ways to get the file names. One is to pass the file names (with their respective paths) to STAR via a file that holds one filename per line:

# For every name in the file
while read -r SAMPLE; do

    # Get single file name
    FILEBASE=$(basename "${SAMPLE%.fq.rm_bl}")

    # Make a new directory for every sample
    mkdir -p "/path_to_later/gap_table/$FILEBASE.STAR"

    # Enter the new directory
    cd "/path_to_later/gap_table/$FILEBASE.STAR"

    # Align with STAR
    /path_to_STAR/STAR --outFilterType BySJout --outFilterMismatchNmax 10 \
        --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 \
        --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 \
        --alignIntronMax 300000 --alignSJoverhangMin 8 --alignIntronMin 20 \
        --genomeDir /path_to/star_index_hg38_hiv_r100/ --sjdbOverhang 100 \
        --quantMode GeneCounts --sjdbGTFfile /path_to/hg38_pnL43_fusion_annotation.gtf \
        --outFileNamePrefix "/path_to/gap_table/$FILEBASE.STAR/" \
        --readFilesIn "$SAMPLE" > STARaligning.log

done </path_to_filename_file/filename
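The FILEBASE line combines a suffix-stripping parameter expansion with `basename`. A small sketch of what it produces; the function name `sample_base` and the example path are made up for illustration:

```shell
# Strip a known suffix, then the leading directories, from a sample path.
# ${SAMPLE%.fq.rm_bl} removes a trailing ".fq.rm_bl" if it is present.
sample_base() {
    SAMPLE=$1
    basename "${SAMPLE%.fq.rm_bl}"
}

sample_base "/data/run1/sampleA.fq.rm_bl"   # prints: sampleA
```

A path that does not end in the expected suffix passes through with only its directories removed, so the suffix has to match the real file names.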

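Another way is to search a directory for certain filenames and then use them as input for STAR.

In this case the first two commands of the code above (the "while" line and the FILEBASE line) are replaced with the following two, and the closing "done" no longer needs the input redirection: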

# For every file in the given directory (/path_to_files/), use the filenames ending in ".fq"
find /path_to_files/ -name "*.fq" | while read -r SAMPLE; do

# Get single file name
FILEBASE=$(basename "${SAMPLE%.fq}")

I suppose the extra space between individual2 and individual3 is not in your real code? Otherwise, I don't know what causes the error in your particular kind of loop.
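For what it's worth, extra spaces between the words of a `for` list should not matter, because the shell splits the list on any run of whitespace. A quick check, using the sample names from the question:

```shell
# Count the loop iterations; the double space before individual3 is intentional.
count=0
for NAME in individual1 individual2  individual3; do
    count=$((count + 1))
done
echo "$count"   # prints: 3
```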


Thanks a lot for answering! There is no extra space between the sample names in the real code. I tested STAR with a single file; when I run "ps", it looks like this:

PID TTY          TIME CMD
29037 pts/1    00:00:00 bash
29088 pts/1    00:00:00 ps

That looks normal. However, when I run "ps -ef | grep usr_name", it shows:

28999 28935  0 18:33 pts/0    00:00:00 bash 1_STAR_test.sh
29007 28999 99 18:33 pts/0    00:23:18 STAR --runMode alignReads --runThreadN 10 --genomeDir /home/zhuyl/Genome/susScr11_STAR_update --readFilesIn /safedisk2/lingziqi/phaseI/2019-5-13-36individual/BMX4_Liver_input/BMX4_Liver_input_R1.fq.gz --readFilesCommand zcat --outSAMstrandField intronMotif --outFileNamePrefix /safedisk/09_Encode/CHIP_Seq/PhaseI/BWA_bam/2019-5-13-36individual_lingziqi/6_wasp_bam/BMX4_Liver_input_wasp --outSAMtype BAM Unsorted --varVCFfile /safedisk/09_Encode/CHIP_Seq/PhaseI/BWA_bam/2019-5-13-36individual_lingziqi/platypus_vcf/BMX4_Liver_input.vcf --waspOutputMode SAMtag --outSAMattributes vA vG
29010 29007  0 18:33 pts/0    00:00:00 [sh] <defunct>

So I guess it is not about the loop; perhaps STAR simply cannot exit normally once the job is done? Have you ever run into this issue before?
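One way to test this guess is to check whether STAR has already written its final summary log while the process is still listed. As far as I know, STAR writes `Log.final.out` (prefixed with the `--outFileNamePrefix` value) only after the mapping job is complete; treat that as an assumption and check it against your STAR version. The helper name `star_finished` is made up:

```shell
# Hypothetical helper: succeed if the final summary log exists and is
# non-empty for the given --outFileNamePrefix value.
# Assumption: STAR names the file "<prefix>Log.final.out".
star_finished() {
    [ -s "${1}Log.final.out" ]
}

# Usage with the prefix from the script in the question:
#   star_finished "${OUT}/${NAME}_input_wasp" && echo "mapping finished"
```

If the log is there but STAR is still running, the mapping itself is probably done and the hang happens during shutdown.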


No, sorry, never. Are you sure you have the 30 GB of RAM that aligning with STAR needs?
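On Linux, a quick way to check the installed memory is to read MemTotal from /proc/meminfo. A minimal sketch; the helper name `mem_total_gib` is made up, and the 30 GB threshold is simply the figure mentioned above:

```shell
# Hypothetical helper: print total RAM in whole GiB, read from a
# /proc/meminfo-style file (MemTotal is reported in kB).
mem_total_gib() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}

# Usage on Linux:
#   [ "$(mem_total_gib)" -ge 30 ] || echo "less than 30 GiB of RAM" >&2
```

Note that this reports installed memory, not what is free while other jobs are running.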


Yes, total RAM is 60 GB. Anyway, thanks for helping me. ^o^
