How To Learn Pipeline Development?
10.1 years ago
biolab ★ 1.4k

Hi everyone, I am new to scripting. I have been learning Perl for a couple of months, and I can now write scripts of up to ~100 lines, although they are NOT concise. To complete a task, I usually need a combination of Perl scripts, many shell commands and other software. I would like to learn about pipeline development, which could be very useful for my work. I googled "bioinformatics pipeline" but could not find much information. Could anyone offer some suggestions on getting started with pipelines, and especially show some examples? Pointers to websites or books on pipelines would also be very useful. I would appreciate your advice very much!

pipeline • 4.6k views
10.1 years ago

learn make:

here are 3 examples I gave to my students (http://www.slideshare.net/lindenb/make-16134373 ). They all do the same job.


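# NOTE: in a real Makefile, every recipe line must start with a TAB character.
TRANSCRIPT=cat  # placeholder for a tool that would transcribe DNA to RNA
TRANSLATE=cat   # placeholder for a tool that would translate RNA to protein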
merged.protein: file1.pep file2.pep file3.pep
    cat file1.pep file2.pep \
        file3.pep > merged.protein

file1.pep: file1.rna
    ${TRANSLATE} file1.rna > file1.pep

file1.rna : file1.dna
    ${TRANSCRIPT} file1.dna > file1.rna

file1.dna:
    echo "ATGCTAGTAGATGC" > file1.dna

file2.pep: file2.rna
    ${TRANSLATE} file2.rna > file2.pep

file2.rna : file2.dna
    ${TRANSCRIPT} file2.dna > file2.rna

file2.dna:
    echo "ATGCTAGTAGATGC" > file2.dna


file3.pep: file3.rna
    ${TRANSLATE} file3.rna > file3.pep

file3.rna : file3.dna
    ${TRANSCRIPT} file3.dna > file3.rna

file3.dna:
    echo "ATGCTAGTAGATGC" > file3.dna

... a second example

TRANSCRIPT=cat
TRANSLATE=cat

%.pep:%.rna
    ${TRANSLATE} $< > $@
%.rna:%.dna
    ${TRANSCRIPT} $< > $@

merged.protein: file1.pep file2.pep file3.pep
    cat $^ > $@

file1.dna:
    echo "ATGCTAGTAGATGC" > $@
file2.dna:
    echo "ATGCTAGTAGATGC" > $@
file3.dna:
    echo "ATGCTAGTAGATGC" > $@

and a 3rd example:

TRANSCRIPT=cat
TRANSLATE=cat
INDEXES=1 2 3
%.pep:%.rna
    ${TRANSLATE} $< > $@
%.rna:%.dna
    ${TRANSCRIPT} $< > $@

merged.protein: $(foreach INDEX,${INDEXES},file${INDEX}.pep )
    cat $^ > $@

# One rule per index; "$$@" is written so that "$@" survives the first expansion.
$(foreach INDEX,${INDEXES},$(eval \
file${INDEX}.dna: ; echo "ATGCTAGTAGATGC" > $$@ \
))
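
To see what make adds over a plain script, here is a minimal shell sketch of the check it performs for each rule: a target is rebuilt only when it is missing or older than its prerequisite. The `build` helper and the scratch directory are assumptions for illustration, not part of make, and `cat` stands in for the real tools as in the Makefiles above.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"    # scratch directory so nothing real is overwritten

# build TARGET SOURCE COMMAND...
# Hypothetical helper: runs COMMAND on SOURCE and writes TARGET, but only
# when TARGET is missing or older than SOURCE.
build() {
    target=$1
    src=$2
    shift 2
    if [ ! -e "$target" ] || [ "$src" -nt "$target" ]; then
        echo "building $target"
        "$@" "$src" > "$target"
    else
        echo "$target is up to date"
    fi
}

echo "ATGCTAGTAGATGC" > file1.dna
build file1.rna file1.dna cat    # like the rule  %.rna: %.dna
build file1.pep file1.rna cat    # like the rule  %.pep: %.rna
build file1.pep file1.rna cat    # nothing changed, so nothing is rebuilt
```

make does this bookkeeping for every rule and also works out the build order from the dependencies, which is why the second and third Makefiles can stay so short.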

Thank you very much! It's really helpful!

10.1 years ago

It is possible to write a shell script that runs your Perl scripts as a pipeline. For example, you have NGS data that needs to be trimmed, filtered, and mapped.

#!/bin/bash
# perl_pipeline.sh: run each step in order and stop at the first failure
set -e
perl trimmer.pl data.fastq
perl filter.pl data_trim.fastq
perl map.pl data_trim_filt.fastq

Then, to run it, type this in your terminal:

bash perl_pipeline.sh

I started using a cluster to analyze my data and I find it useful to write shell scripts with dependencies in order to run hundreds of scripts in parallel. It's a powerful feeling.

Hope you liked this simple example!
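
As a minimal sketch of dependencies plus parallelism on a single machine (not on a cluster scheduler), the steps for one sample can be chained with `&&` while separate samples run as background jobs. The sample names and the `step` stand-in below are assumptions for illustration; the real commands would be the trimmer.pl, filter.pl and map.pl calls.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"    # scratch directory with toy input

# Hypothetical stand-in for trimmer.pl / filter.pl / map.pl:
# it just copies its input file to its output file.
step() {
    cp "$1" "$2"
}

# The three steps of one sample depend on each other, so they run in
# order; "&&" stops the chain as soon as one step fails.
process_sample() {
    step "${1}.fastq" "${1}_trim.fastq" &&
    step "${1}_trim.fastq" "${1}_trim_filt.fastq" &&
    step "${1}_trim_filt.fastq" "${1}.mapped"
}

for sample in sampleA sampleB sampleC; do
    echo "@read1" > "$sample.fastq"   # toy input file
    process_sample "$sample" &        # samples are independent: one job each
done
wait    # block until all background jobs have finished
ls *.mapped
```

A plain `wait` does not report which job failed, and nothing here limits how many jobs run at once, so this only suits a handful of samples.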

Edit:

If you're really lazy like me (a good trait in a computational scientist), you'll write a Perl script to write the bash script. You can read the file names from their directory and then print the commands in a loop like this:

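my @files = glob("*.fastq");   # assumption: the inputs are FASTQ files in the current directory
open(OUT, ">", "perl_pipeline.sh") or die "Cannot write perl_pipeline.sh: $!";
foreach (@files) {
    print OUT "perl perl_script.pl $_\n";
}
close(OUT);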
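
The same generator could also be sketched directly in shell. This assumes the inputs are `*.fastq` files in the current directory and that `perl_script.pl` takes one file name as its argument; both are illustrative assumptions, and the toy files exist only to make the sketch runnable.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"                # scratch directory with toy inputs
touch a.fastq b.fastq c.fastq

# Write one command line per input file into the pipeline script.
for f in *.fastq; do
    echo "perl perl_script.pl $f"
done > perl_pipeline.sh

cat perl_pipeline.sh
```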

Have fun!


And that's why you need something like SGE + qmake to run your independent analyses in parallel. You can hardly parallelize things with a simple bash script.


Thanks for the info!

10.1 years ago

Have a look at Biopieces -> www.biopieces.org

10.1 years ago

This post may give you some ideas about existing pipeline-building frameworks:

C: Which bioinformatic friendly pipeline building framework?
