How To Learn Pipeline Development?

10.1 years ago

biolab ★ 1.4k

Hi everyone, I am new to script programming. I have been learning Perl for a couple of months, and I can now write scripts of up to ~100 lines, although they are NOT concise. To complete a task, I usually need a combination of Perl scripts, shell commands and other software. I would like to learn about pipeline development, which could be very useful for my work. I googled "bioinformatics pipeline" but could not find much information. Could anyone offer suggestions on how to start building pipelines, ideally with some examples? Pointers to websites or books on pipelines would also be very helpful. I would very much appreciate your advice!

pipeline • 4.6k views

updated 10.1 years ago by Ashutosh Pandey 12k • written 10.1 years ago by biolab ★ 1.4k

Ram · Answer 1 · 2014-03-07

Learn make:

Here are three examples I gave to my students (http://www.slideshare.net/lindenb/make-16134373). All three do the same job. Note that every recipe line in a Makefile must start with a TAB character, not spaces.

# placeholder for a tool that would transcribe DNA to RNA
TRANSCRIPT=cat
# placeholder for a tool that would translate RNA to protein
TRANSLATE=cat

merged.protein: file1.pep file2.pep file3.pep
	cat file1.pep file2.pep file3.pep > merged.protein

file1.pep: file1.rna
	${TRANSLATE} file1.rna > file1.pep

file1.rna: file1.dna
	${TRANSCRIPT} file1.dna > file1.rna

file1.dna:
	echo "ATGCTAGTAGATGC" > file1.dna

file2.pep: file2.rna
	${TRANSLATE} file2.rna > file2.pep

file2.rna: file2.dna
	${TRANSCRIPT} file2.dna > file2.rna

file2.dna:
	echo "ATGCTAGTAGATGC" > file2.dna

file3.pep: file3.rna
	${TRANSLATE} file3.rna > file3.pep

file3.rna: file3.dna
	${TRANSCRIPT} file3.dna > file3.rna

file3.dna:
	echo "ATGCTAGTAGATGC" > file3.dna
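Every rule in the Makefile above relies on one check: a target is rebuilt only if it is missing or older than one of its prerequisites. Here is a rough bash sketch of that check, applied to the same DNA to RNA to protein chain with cat as the stand-in tool. It illustrates the idea only and is not how make is implemented; the helper name stale is made up.

```shell
#!/usr/bin/env bash
# Rough sketch of make's rebuild rule (illustration only, not make's code):
# a target is (re)built only if it is missing or older than a prerequisite.
# `cat` stands in for the real transcription/translation tools, as in the Makefile.
set -euo pipefail
cd "$(mktemp -d)"   # work in a scratch directory

runs=0   # number of steps that actually ran

# stale TARGET PREREQ...: succeed if TARGET must be (re)built
stale() {
    local target=$1 prereq
    shift
    [ -e "$target" ] || return 0
    for prereq in "$@"; do
        if [ "$prereq" -nt "$target" ]; then return 0; fi
    done
    return 1
}

pipeline() {
    local i
    for i in 1 2 3; do
        if stale "file$i.dna"; then
            echo "ATGCTAGTAGATGC" > "file$i.dna"; runs=$((runs + 1))
        fi
        if stale "file$i.rna" "file$i.dna"; then
            cat "file$i.dna" > "file$i.rna"; runs=$((runs + 1))   # transcription stand-in
        fi
        if stale "file$i.pep" "file$i.rna"; then
            cat "file$i.rna" > "file$i.pep"; runs=$((runs + 1))   # translation stand-in
        fi
    done
    if stale merged.protein file1.pep file2.pep file3.pep; then
        cat file1.pep file2.pep file3.pep > merged.protein
        runs=$((runs + 1))
    fi
}

pipeline; first=$runs    # fresh directory: 3 x (dna, rna, pep) + merge = 10 steps
runs=0
pipeline; second=$runs   # all outputs are up to date: 0 steps
echo "first run: $first steps, second run: $second steps"
```

The first run executes all ten steps and the second run none. make does this bookkeeping for you, works out the order from the dependencies, and can run independent steps in parallel with make -j.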

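A second example does the same job with pattern rules: % matches the common part of the file names, $< is the first prerequisite, $@ is the target, and $^ is the list of all prerequisites: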

TRANSCRIPT=cat
TRANSLATE=cat

%.pep:%.rna
	${TRANSLATE} $< > $@
%.rna:%.dna
	${TRANSCRIPT} $< > $@

merged.protein: file1.pep file2.pep file3.pep
	cat $^ > $@

file1.dna:
	echo "ATGCTAGTAGATGC" > $@
file2.dna:
	echo "ATGCTAGTAGATGC" > $@
file3.dna:
	echo "ATGCTAGTAGATGC" > $@
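A pattern rule like %.pep:%.rna works by name substitution: make matches the target against %.pep, keeps the part that % matched (the stem, e.g. file2), and puts it into the prerequisite pattern, so file2.pep needs file2.rna, which in turn needs file2.dna. A rough bash sketch of that lookup, using parameter expansion to strip and add suffixes. prereq_of is a made-up helper, and real make does more, such as checking that a prerequisite exists or can itself be made.

```shell
#!/usr/bin/env bash
# Mimic how the pattern rules %.pep:%.rna and %.rna:%.dna map a target
# to its prerequisite: strip the target suffix to get the stem, add the new suffix.
set -euo pipefail

# prereq_of TARGET: print the prerequisite the pattern rules above would use
prereq_of() {
    local target=$1
    case $target in
        *.pep) echo "${target%.pep}.rna" ;;   # %.pep : %.rna
        *.rna) echo "${target%.rna}.dna" ;;   # %.rna : %.dna
        *) return 1 ;;                         # no pattern rule matches
    esac
}

# Follow the chain back from the final target, roughly as make chains implicit rules
chain=()
t=file2.pep
while dep=$(prereq_of "$t"); do
    chain+=("$t <- $dep")
    t=$dep
done
printf '%s\n' "${chain[@]}"
```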

And a third example, which builds the list of files and the .dna rules with foreach and eval:

TRANSCRIPT=cat
TRANSLATE=cat
INDEXES=1 2 3

%.pep:%.rna
	${TRANSLATE} $< > $@
%.rna:%.dna
	${TRANSCRIPT} $< > $@

merged.protein: $(foreach INDEX,${INDEXES},file${INDEX}.pep)
	cat $^ > $@

# generate one fileN.dna rule per index; ';' puts the recipe on the rule line,
# and $$@ becomes $@ once eval expands its argument
$(foreach INDEX,${INDEXES},$(eval file${INDEX}.dna: ; echo "ATGCTAGTAGATGC" > $$@))
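$(eval ...) expands its argument once and then parses the result as ordinary Makefile text, so the foreach loop in the third example is meant to write one rule per index that creates the matching .dna file. The sketch below only prints the rule text such a loop would hand to make for INDEXES=1 2 3, in the one-line form "target: ; recipe", where the ; lets the recipe sit on the rule line. It does not run make.

```shell
#!/usr/bin/env bash
# Print the rules the foreach/eval loop should generate, one per index.
# '$@' is inside single quotes, so it is printed literally, as make would see it.
set -euo pipefail

INDEXES="1 2 3"
# $INDEXES is left unquoted on purpose so it splits into 1, 2, 3
rules=$(for INDEX in $INDEXES; do
    printf 'file%s.dna: ; echo "ATGCTAGTAGATGC" > $@\n' "$INDEX"
done)
printf '%s\n' "$rules"
```

With GNU make, make -p prints make's internal database, which is one way to check which rules eval really created.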

Ram · Answer 2 · 2014-03-07

10.1 years ago

QVINTVS_FABIVS_MAXIMVS ★ 2.5k

It is possible to write a shell script that runs your Perl scripts as a pipeline. For example, suppose you have NGS data that need to be trimmed, filtered, and mapped:

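#!/bin/bash
# perl_pipeline.sh: run the steps in order, stopping at the first failure
set -e
perl trimmer.pl data.fastq          # assumed to write data_trim.fastq
perl filter.pl data_trim.fastq      # assumed to write data_trim_filt.fastq
perl map.pl data_trim_filt.fastq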

Save it as perl_pipeline.sh and run it from your terminal:

bash perl_pipeline.sh
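A slightly more defensive version of the same idea, as a sketch: set -euo pipefail stops the pipeline at the first failing step instead of feeding a half-written file to the next one, and a loop runs every sample through the same steps. The step functions are hypothetical stand-ins (plain cat/grep) for trimmer.pl, filter.pl and map.pl, and the sample names and toy reads are made up.

```shell
#!/usr/bin/env bash
# Run each sample through trim -> filter -> map, stopping at the first error.
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical stand-ins for trimmer.pl, filter.pl and map.pl;
# in a real pipeline these would be e.g. `perl trimmer.pl "$1"`.
trim()      { cat "$1" > "${1%.fastq}_trim.fastq"; }
filter()    { grep -v '^#' "$1" > "${1%.fastq}_filt.fastq"; }
map_reads() { cat "$1" > "${1%.fastq}.sam"; }

# Toy inputs so the sketch runs on its own
for s in sampleA sampleB; do
    printf '@read1\nACGT\n+\nIIII\n' > "$s.fastq"
done

for s in sampleA sampleB; do
    trim      "$s.fastq"
    filter    "${s}_trim.fastq"
    map_reads "${s}_trim_filt.fastq"
    echo "done: $s"
done
```

Because each step reads the file written by the previous one, set -e matters here: without it, a failed trim would let filter and map run on a missing or incomplete file.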

Since I started using a cluster to analyze my data, I have found it useful to write shell scripts with job dependencies so that hundreds of scripts run in parallel. It's a powerful feeling.
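On a cluster, the scheduler handles the job dependencies. On a single machine, the same pattern can be sketched with background jobs: start one independent pipeline per sample with &, then wait for all of them before a step that needs every result. This is a swapped-in local technique, not the cluster setup described above, and process_sample is a hypothetical placeholder.

```shell
#!/usr/bin/env bash
# Run independent per-sample jobs in parallel, then a merge step that
# depends on all of them (the local analogue of a scheduler dependency).
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical per-sample pipeline; replace with the real trim/filter/map steps.
process_sample() {
    sleep 1                        # pretend to do some work
    echo "result for $1" > "$1.out"
}

pids=()
for s in s1 s2 s3 s4; do
    process_sample "$s" &          # each sample runs as its own background job
    pids+=("$!")
done

# `wait PID` returns that job's exit status, so with set -e a failed
# sample stops the script here instead of being silently merged.
for pid in "${pids[@]}"; do
    wait "$pid"
done

cat s1.out s2.out s3.out s4.out > merged.txt   # runs only after every job finished
```

With many samples you would also want to cap how many jobs run at once (for example with xargs -P), which is where a scheduler, or make -j with a Makefile, becomes more convenient.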

Hope you liked this simple example!

Edit:

If you're really lazy like me (a good trait in a computational scientist), you'll write a Perl script that writes the bash script. The Perl script can collect the file names from their directory and then print one command per file in a loop, like this (the *.fastq pattern and the run_all.sh name are only examples):

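my @files = glob("*.fastq");    # example: input files in the current directory
open(OUT, '>', 'run_all.sh') or die "cannot write run_all.sh: $!";
foreach (@files) {
    print OUT "perl perl_script.pl $_\n";
}
close(OUT);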

Have fun!

updated 4.3 years ago by Ram 43k • written 10.1 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.5k

And that's why you need something like SGE + qmake to run your independent analyses in parallel. You can hardly parallelize things with a simple bash script.

10.1 years ago by Pierre Lindenbaum 161k

Thanks for the info!

10.1 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.5k

score 1 · Answer 3 · 2014-03-07

10.1 years ago

Martin A Hansen 3.0k

Have a look at Biopieces: www.biopieces.org

10.1 years ago by Martin A Hansen 3.0k

score 1 · Answer 4 · 2014-03-07

10.1 years ago

Ashutosh Pandey 12k

This post may give you some ideas about existing pipeline-building frameworks:

C: Which bioinformatic friendly pipeline building framework?

10.1 years ago by Ashutosh Pandey 12k