Biostar Beta. Not for public use.
Question: Cufflinks output with filename prefix

Hi all,

I am using GNU Parallel to run Cufflinks, and I have a question about how to give a prefix to the output files. I have a lot of BAM samples, and the output files produced always have the same names, so each new sample overwrites the output of the previous one.

    parallel -j $NSLOTS --xapply \
      "cufflinks -G gencode.v19.annotation.gtf {1} > {1/.}." ::: /input/*.bam

I am using the command above, but it does not work.

Thanks,

Ron

Ron • 950 • 5.2 years ago • updated 5.2 years ago by ole.tange ♦ 3.4k

Could you include an example of the input and the command you expected it to run?

ole.tange ♦ 3.4k • 5.2 years ago

I mean that the input files have different names, but Cufflinks always generates the files transcripts.gtf, genes.fpkm_tracking, and isoforms.fpkm_tracking. So every time a sample is run, output files with these same names are generated again and overwrite the previous ones. I want the output files to be named according to the sample.

For example, the outputs for 508_C.bam would be named 508_C.genes.fpkm_tracking, 508_C.transcripts.gtf, and so on.

Ron • 950 • 5.2 years ago

Ah. So you want GNU Parallel to work around a problem in Cufflinks. It cannot do that directly, but you can help it by writing a function that does the renaming for you. Untested:

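doit() {
  # $1 = job number {#} (temporary dir), $2 = input BAM {}, $3 = sample name {/.}
  mkdir "$1"
  cd "$1" || return 1
  # The GTF is assumed to be where parallel was started, i.e. one level up after cd
  cufflinks -G ../gencode.v19.annotation.gtf "$2" > ../"$3".log
  mv transcripts.gtf ../"$3".transcripts.gtf
  mv genes.fpkm_tracking ../"$3".genes.fpkm_tracking
  mv isoforms.fpkm_tracking ../"$3".isoforms.fpkm_tracking
  cd ..
  rm -r "$1"
}
export -f doit
parallel doit {#} {} {/.} ::: /input/*.bam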
ole.tange ♦ 3.4k • 5.2 years ago
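A variant of the same idea that avoids changing directories, offered as a sketch under assumptions rather than something from the thread. It relies on Cufflinks' -o/--output-dir option, which sets the directory Cufflinks writes its output into, so each sample also gets its own folder named after the sample. run_sample is a hypothetical name, gencode.v19.annotation.gtf is assumed to be in the working directory, the list of file names is taken from the outputs mentioned in this thread, and like the function above it has not been run against a real Cufflinks install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one output directory per sample via cufflinks -o,
# then prefix each fixed Cufflinks output name with the sample name.
# Assumes gencode.v19.annotation.gtf is in the directory this is run from.
run_sample() {
  local bam=$1 sample f
  sample=$(basename "$bam" .bam)        # /input/508_C.bam -> 508_C
  mkdir -p "$sample"
  cufflinks -o "$sample" -G gencode.v19.annotation.gtf "$bam" || return 1
  # Rename e.g. 508_C/genes.fpkm_tracking -> 508_C/508_C.genes.fpkm_tracking
  for f in transcripts.gtf genes.fpkm_tracking isoforms.fpkm_tracking skipped.gtf; do
    if [ -e "$sample/$f" ]; then
      mv "$sample/$f" "$sample/$sample.$f"
    fi
  done
}
export -f run_sample

# Usage with GNU Parallel (NSLOTS as in the question):
#   parallel -j "$NSLOTS" run_sample ::: /input/*.bam
```

Because every job writes into its own directory, parallel jobs never share a working directory, so there is no temporary directory to clean up afterwards.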

If you have a lot of input files that are named the same and you want the output files named uniquely, we need some way to make the names unique. The most obvious way is to use {#}, which is the job number:

parallel cufflinks -G gencode.v19.annotation.gtf {} '>' {/.}.{#} ::: /input/*.bam

Another idea is to append a random value (and hope for no clashes):

parallel cufflinks -G gencode.v19.annotation.gtf {} '>' {/.}.'{= $_=int(1000000*rand) =}' ::: /input/*.bam

If the full path is unique, you can simply change the / to _:

parallel cufflinks -G gencode.v19.annotation.gtf {} '>{=s:/:_:g=}' ::: /input/*.bam
ole.tange ♦ 3.4k • 5.2 years ago
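To make the replacement strings in the commands above concrete, here is a sketch that computes roughly the same values with plain bash parameter expansion. It is only an illustration: GNU Parallel computes {}, {/}, {/.}, {#} and {=s:/:_:g=} itself, show_replacements is a hypothetical name, and the two input paths are made up.

```shell
#!/usr/bin/env bash
# Illustration only: bash equivalents of some GNU Parallel replacement strings.
show_replacements() {
  local job=0 path base
  for path in "$@"; do
    job=$((job + 1))                            # {#}  : job sequence number, from 1
    base=${path##*/}                            # {/}  : basename, directory removed
    printf '%s\n' "{}=$path"                    # {}   : the input line as given
    printf '%s\n' "{/}=$base"
    printf '%s\n' "{/.}=${base%.*}"             # {/.} : basename without extension
    printf '%s\n' "{#}=$job"
    printf '%s\n' "{=s:/:_:g=}=${path//\//_}"   # every / replaced by _
  done
}

show_replacements /input/508_C.bam /input/509_A.bam
```

For /input/508_C.bam this prints {}=/input/508_C.bam, {/}=508_C.bam, {/.}=508_C, {#}=1 and {=s:/:_:g=}=_input_508_C.bam, so {/.}.{#} in the first command redirects that job's standard output to 508_C.1. Note that this only names what Cufflinks prints; the files Cufflinks writes itself keep their fixed names.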

The file names are different, but I am getting only 4 output files for all the BAM samples, namely:

genes.fpkm_tracking
isoforms.fpkm_tracking
skipped.gtf
transcripts.gtf

I expect to have separate files for each sample, so I want to add a prefix somehow, as I think I am getting the results of the last BAM sample only. I tried the first command, but the output is the same.

Ron • 950 • 5.2 years ago

Can you explain this part: {} '>' {/.}.{#} ::: /input/*.bam? There might be BAM files from different samples, and I would like each Cufflinks output to be in a different folder. How can I do that with the command you mentioned?

krushnach80 • 500 • 3.0 years ago
