Cufflinks output with filename prefix
9.4 years ago
Ron ★ 1.2k

Hi all,

I am using GNU Parallel to run cufflinks, and I have a question about how to add a prefix to the output files. I have a lot of BAM samples, and because cufflinks gives its output files the same names every time, they are overwritten whenever a new sample is processed.

    parallel -j $NSLOTS --xapply \
   "cufflinks -G gencode.v19.annotation.gtf {1} > {1/.}." ::: /input/*.bam

I am using this command above but it does not work.

Thanks,
Ron

next-gen RNA-Seq cufflinks • 4.3k views

Could you include an example of the input and the command you expected it to run?


I mean the input files have different names, but cufflinks always generates the files transcripts.gtf, genes.fpkm_tracking, and isoforms.fpkm_tracking. So every time a sample is run, output files with these same names are generated again. I want the output files to be named according to the sample.

E.g., the outputs for 508_C.bam would be named 508_C.genes.fpkm_tracking and 508_C.transcripts.gtf.


Ahh. So you want GNU Parallel to help fix a problem in cufflinks. It cannot do that directly, but you can help it by making a function that does that for you. Untested:

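doit() {
  # $1 = job number (scratch directory), $2 = BAM path, $3 = sample name
  mkdir -p "$1" && cd "$1" || return 1
  # cufflinks writes its fixed-name output files into the current directory;
  # the GTF path is relative to the directory parallel was started in
  cufflinks -G ../gencode.v19.annotation.gtf "$2" > ../"$3".log
  mv transcripts.gtf ../"$3".transcripts.gtf
  mv genes.fpkm_tracking ../"$3".genes.fpkm_tracking
  mv isoforms.fpkm_tracking ../"$3".isoforms.fpkm_tracking
  cd ..
  rm -r "$1"
}
export -f doit
# {#} = job number, {} = input path, {/.} = basename without extension
parallel doit {#} {} {/.} ::: /input/*.bam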
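
An alternative to moving the files afterwards may be to let cufflinks write each sample into its own directory with its -o/--output-dir option. The following is only a sketch under assumptions: run_sample, CUFFLINKS, and GTF are hypothetical names introduced here for illustration (they are not part of cufflinks or GNU Parallel), and the GTF path is the one from the question.

```shell
# Sketch: one output directory per sample via cufflinks -o/--output-dir.
# CUFFLINKS and GTF are hypothetical override variables for illustration.
CUFFLINKS="${CUFFLINKS:-cufflinks}"
GTF="${GTF:-gencode.v19.annotation.gtf}"

run_sample() {
  bam="$1"                               # e.g. /input/508_C.bam
  sample="$(basename "$bam" .bam)"       # -> 508_C
  outdir="${2:-.}/${sample}"             # -> <results dir>/508_C
  mkdir -p "$outdir" || return 1
  # cufflinks places transcripts.gtf, genes.fpkm_tracking, etc. in $outdir
  "$CUFFLINKS" -G "$GTF" -o "$outdir" "$bam"
}

# Possible use with GNU Parallel from bash (assumes a "results" directory name):
#   export -f run_sample; export CUFFLINKS GTF
#   parallel run_sample {} results ::: /input/*.bam
```

With this layout the file names inside each directory stay the same, but nothing is overwritten because every sample has its own folder.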
9.4 years ago
ole.tange ★ 4.4k

If you have a lot of input files that are named the same and you want the output files named uniquely, we somehow need a way to make the names unique. The most obvious way would be to use {#}, which is the job number:

parallel cufflinks -G gencode.v19.annotation.gtf {} '>' {/.}.{#} ::: /input/*.bam

Another idea is to append a random value (and hope for no clashes):

parallel cufflinks -G gencode.v19.annotation.gtf {} '>' {/.}.'{= $_=int(1000000*rand) =}' ::: /input/*.bam

If the full path is unique, you can simply change the / to _:

parallel cufflinks -G gencode.v19.annotation.gtf {} '>{=s:/:_:g=}' ::: /input/*.bam
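
For anyone unsure what the replacement strings in the commands above expand to, here is a rough plain-shell equivalent. It is an illustration only: show_names is a hypothetical helper, and the job number is passed in by hand rather than assigned by GNU Parallel.

```shell
# Plain-shell approximations of the GNU Parallel replacement strings.
# show_names is a hypothetical helper for illustration only.
show_names() {
  path="$1"                 # {}   : the input as given      e.g. /input/508_C.bam
  jobnr="$2"                # {#}  : job number              e.g. 3
  base="${path##*/}"        # {/}  : basename                -> 508_C.bam
  noext="${base%.*}"        # {/.} : basename, no extension  -> 508_C
  # {=s:/:_:g=} : the full path with every / replaced by _
  flat="$(printf '%s' "$path" | tr '/' '_')"
  printf '%s %s.%s %s\n' "$base" "$noext" "$jobnr" "$flat"
}
```

Calling show_names /input/508_C.bam 3 prints the basename, the {/.}.{#} form, and the flattened path on one line.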

The input file names are different, but I am getting only 4 output files for all BAM samples, namely:

genes.fpkm_tracking
isoforms.fpkm_tracking
skipped.gtf
transcripts.gtf

I expect to have separate files for each of my samples, so I want to add a prefix somehow, as I think I am getting the results of the last BAM sample only. I tried the first command, but the output is the same.


Can you explain this part: {} '>' {/.}.{#} ::: /input/*.bam? There might be BAM files from different samples, and I would like each cufflinks output to go into a different folder. How can I do that with the command you have mentioned?
