featureCounts and bash
3
0
Entering edit mode
5.7 years ago
cilgaiscan ▴ 60

Hi everyone! I have a problem with my bash code, which I added down here. I would like to automatize the my RNA-seq workflow and I tried to write this bash code. (I am pretty new in bash and programming.) I expect several steps from my code. These are;

  • changing the directory into source directory and give me a notification. (It works fine)
  • creating a file array with finding all files that is named as accepted_hits.bam in source directory and its sub-folder.(Problem occurs here. My code scans all computer and i have a memory error.)
  • and run featureCounts.

code starts

#!/bin/bash

SOURCE_DIR=$1
TARGET_DIR=$2

echo 'Going to $SOURCE_DIR'
cd $SOURCE_DIR

FILE_ARRAY=/$(locate accepted_hits.bam)

for file in $FILE_ARRAY; do

    serial_number=${file}
    echo "Working on $serial_number"

    mkdir -p $TARGET_DIR/$serial_number

      featureCounts -t $SOURCE_DIR/$file -a /home/cilga/Desktop/VPC/ref/hg19/hg19.gtf -o $TARGET_DIR$serial_number        done

I also tried these arrays, too:

find . -name $SOURCE_DIR/accepted_hits.bam

find . -path \*/$SOURCE_DIR/accepted_hits.bam

find ./ | grep '$SOURCE_DIR/accepted_hits.bam$'

find / -name 'accepted_hits.bam'

If someone helps me, I would be appreciated. Thank you :)

RNA-Seq featureCounts bash • 3.8k views
ADD COMMENT
1
Entering edit mode

Hello everyone, I was able to solve problem with small touches. I am sharing my working code :) Thanks for everyone for your helps.

   #!/bin/bash

SOURCE_DIR=$1
TARGET_DIR=$2

cd $SOURCE_DIR

find . -name 'accepted_hits.bam'
inputfiles=$(find . -name 'accepted_hits.bam' | xargs echo)
readlink -m $inputfiles

featureCounts -a hg19.gtf -o rnaseq.txt $inputfiles
ADD REPLY
1
Entering edit mode
find . -name 'accepted_hits.bam'

Just shows the files, literally does nothing else

inputfiles=$(find . -name 'accepted_hits.bam' | xargs echo)

The | xargs echo part does nothing here

readlink -m $inputfiles

Shows the path and literally does nothing else

Also, you should always quote your Bash variables..

ADD REPLY
0
Entering edit mode

hi. thank you, for your explanations. I am pretty new in bash and also programming too. I tried to combine things together :( thank you again!

ADD REPLY
1
Entering edit mode
5.7 years ago
find $SOURCE_DIR -name accepted_hits.bam
ADD COMMENT
0
Entering edit mode

it didn’t worked but i found another solution. thank you :)

ADD REPLY
1
Entering edit mode

People with similar problems may stumble upon your thread, so perhaps you could share your solution here?

ADD REPLY
1
Entering edit mode
5.7 years ago
jkim ▴ 170

There's some tools for building pipelines. snakemake, gnu-parallel...etc. Here's some code for star alignment using gnu-parallel. It works well for me.

STAR_INDEX="/home/mRNAseq/ref/gencode/human/GRCh37.19/idx/"  
INPUT_DIR="00_raw_fastq"
OUTPUT_DIR="01_alignment_genes_count"
GTF="/home/mRNAseq/ref/gencode/human/GRCh37.19/gtf/gencode.v19.annotation.gtf"
RSEM_INDEX="/home/mRNAseq/ref/gencode/human/GRCh37.19/idx/GRCh37.p13.genome"

mkdir -p $OUTPUT_DIR

cat samples.list | parallel --jobs 1 "mkdir -p $OUTPUT_DIR/{}"

cat samples.list | parallel --jobs 1  "STAR --genomeDir $STAR_INDEX --twopassMode Basic --quantMode GeneCounts --readFilesCommand zcat --readFilesIn $INPUT_DIR/{}_1.fastq.gz $INPUT_DIR/{}_2.fastq.gz --outFileNamePrefix $OUTPUT_DIR/{}/{}. --runThreadN 20 --outSAMtype BAM SortedByCoordinate --sjdbGTFfile $GTF --outSAMunmapped Within  --outSAMattributes Standard --runMode alignReads"

It's from my gist. The samples.list file is like ls *R1.fastq.gz | sed 's/R1.fastq.gz//' > samples.list something like this.

ADD COMMENT
1
Entering edit mode

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

Note that you can also post a link to a GitHub gist here directly and it will be rendered.

ADD REPLY
1
Entering edit mode

If you have a gist then I suggest that you replace the text above with a link to your gist. Biostars will automatically get the code from gist and show it in-line.

On a different note: While this is an alternate it is not a solution to the question that OP was asking originally.

ADD REPLY
0
Entering edit mode
5.7 years ago
5heikki 11k

Maybe something like this..

#!/bin/bash
DEPTH="1"
for FILE in $(find "$1" -maxdepth 10 -name "accepted_hits.bam"); do
    echo "Working on $FILE"
    mkdir -p "$2/$(basename "$FILE").$DEPTH"
    featureCounts -t "$FILE" \
      -a /home/cilga/Desktop/VPC/ref/hg19/hg19.gtf \
      -o "$2/$(basename "$FILE").$DEPTH"
    ((DEPTH++))
done
ADD COMMENT

Login before adding your answer.

Traffic: 2473 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6