How to write an effective and stable bioinformatics pipeline in R?
4
1
9.6 years ago
jack ▴ 960

Hi all,

I'm planning to write a bioinformatics pipeline in R. Basically, it will do gene expression quantification (which is in C++) and DE gene analysis (an R package) and, at the end, gene ontology and GSEA.

I'm looking for good tips and recommendations to take into account while developing my pipeline in R.

What should I avoid in R? What should I care about most?

I'm keen to get some recommendations from you.

RNA-Seq pipeline software-error R • 6.0k views
ADD COMMENT
3

I appreciate that you are trying to get some general advice before setting out on a task, but this is a very general question. You will probably get more help if you can provide some specifics about what you plan to do (what task are you automating, how do you plan to achieve each step).

ADD REPLY
11
9.6 years ago

I'm planning to write a bioinformatics pipeline in R. (...)

what should I avoid in R ?

Don't reinvent the wheel: make or other tools like snakemake are the workflow managers you need. See: How To Organize A Pipeline Of Small Scripts Together?

ADD COMMENT
3

Yes, much as I love R, if the OP means pipeline in the normal sense of an automated chain of scripts and calls to executables then "what to avoid in R" is probably "all of it".

ADD REPLY
0

Can you post a link for "what to avoid in R"? I couldn't find it by googling.

ADD REPLY
3

?

The point all of us are trying to make is that R is typically a bad choice for a pipeline, at least if you're using that term in the same way we do.

ADD REPLY
11
9.6 years ago

It usually makes more sense to incorporate an R script into a pipeline rather than writing the pipeline itself in R.
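
As a rough illustration of that layout, here is a minimal shell sketch, not any particular tool's method. The driver chains external commands and stops at the first failure, and the R script is just one step among others. The helper name run_step, the quantify binary, de_analysis.R and the file names are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Minimal pipeline driver sketch: each step is an external command, and the
# caller can chain steps with && so the pipeline stops at the first failure.

# run_step NAME CMD [ARGS...]
# Announces the step on stderr, runs the command, and returns non-zero
# (after reporting the failure on stderr) if the command fails.
run_step() {
    local name="$1"
    shift
    echo "[step] ${name}: $*" >&2
    if ! "$@"; then
        echo "[fail] ${name}" >&2
        return 1
    fi
}

# Hypothetical usage (quantify, de_analysis.R and the file names are
# placeholders, not real tools from this thread):
#   run_step quantify ./quantify reads.fastq counts.tsv &&
#   run_step de_analysis Rscript de_analysis.R counts.tsv de_results.tsv
```

For anything beyond a handful of steps, a workflow manager such as make or snakemake adds dependency tracking and reruns on top of this basic idea.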

ADD COMMENT
0
9.6 years ago
rtliu ★ 2.2k

The best R pipeline example I have seen is the Nature Protocols paper:

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

If you don't have access to the journal, here is a pre-publication version.

The versions of R and Bioconductor-related libraries are captured with the command:

> sessionInfo()

The versions of system software packages are captured with:

> system("bowtie2 --version | grep align", intern=TRUE)

[1] "/usr/local/software/bowtie2-2.1.0/bowtie2-align version 2.1.0"
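
The same bookkeeping can be done outside R when the pipeline is driven from the shell. The following is a minimal sketch under that assumption. The helper name record_version and the log file name versions.tsv are hypothetical; it simply stores the first line that a version command prints.

```shell
#!/usr/bin/env bash
# record_version LOGFILE CMD [ARGS...]
# Appends one tab-separated line to LOGFILE: the command as typed, then the
# first line of its output (stderr included, since some tools print their
# version there).
record_version() {
    local log="$1"
    shift
    printf '%s\t%s\n' "$*" "$("$@" 2>&1 | head -n 1)" >> "$log"
}

# Hypothetical usage (assumes bowtie2 and Rscript are on the PATH):
#   record_version versions.tsv bowtie2 --version
#   record_version versions.tsv Rscript --version
```

Keeping this file next to the results makes it easier to tell later which tool versions produced them.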

It is a steep learning curve to learn R well enough to write the whole bioinformatics pipeline in R. Good luck :)

ADD COMMENT
0
8.5 years ago
sahiilseth ▴ 30

I do a lot of scripting in R, and with ggplot2 and Bioconductor there is so much one can achieve. Pipelining was surely an issue, so we built a tool to do just that (http://docs.flowr.space). One starts with a bunch of system commands and wraps them into a tab-delimited text file. When done, flowr can submit them to a local server (in parallel, using mclapply) or to clusters such as LSF, Torque, SLURM and MOAB.

Usage:
flowr function [arguments]
   status       Detailed status of a flow(s).
   rerun        rerun a previously failed flow
   kill         Kill the flow, upon providing working directory
   fetch_pipes  Checking what modules and pipelines are available;

Please use 'flowr -h function' to obtain further information about the usage of a specific function.

I am certainly biased (being a developer), but one may find it much easier to create a TSV file than to learn a new syntax.

The second issue I faced: say I have an R function that does a lot of things, and I want to call it from the terminal. R does not have a nice standard argument parser like Python/Perl. Now we have a package, funr, where the first argument is the function you want to call and the rest are its arguments. One can call any R function from any installed package (or sourced script).

funr rnorm n=10
    -1.244571 1.378112 0.02189023 -0.3723951 0.282709 -0.22854 -0.8476185 0.3222024 0.08937781 -0.4985827

Hope you find it useful. Would be curious if it works out.

ADD COMMENT
