Question

Help setting up lncRNA-screen from github

1

5.6 years ago

natanamorim.moraes ▴ 10

Hello everyone!

I'm new to bioinformatics, and I'm having a really hard time trying to make this work. What I'm trying to set up is this https://github.com/NYU-BFX/lncRNA-screen

I'm working with long non-coding RNAs, and this pipeline, created by Applied Bioinformatics Laboratories (New York, NY), does exactly what I need. However, I'm finding it quite hard to set up. Could anyone help me?

It says it uses SGE, which I have only managed to get working with Docker. Is SGE really necessary? I only have one machine.

It says I need to install r/3.3.0, python/2.7.3, java/1.8 and samtools/1.3 and add them to my PATH.

It has linked folders for my RNA-seq and ChIP-seq data, but I don't know how that works.

It also says I need https://github.com/NYU-BFX/RNA-Seq_Standard even if I have my own RNA-seq data (which I do).

The documentation says sratoolkit is included, but with my lack of experience I don't understand how that works. Here's the requirements file: https://github.com/NYU-BFX/lncRNA-screen/blob/master/inputs/system_requirement.txt

This is my first post here, so I may do something wrong or post this question in the wrong place.

lncRNA GitHub RNA-Seq ChIP-Seq • 1.3k views

ADD COMMENT • link updated 11 months ago by Ram 43k • written 5.6 years ago by natanamorim.moraes ▴ 10

0

Did you ever get this working? I am interested to compare notes regarding the format of the resulting BED file.

ADD REPLY • link 5.3 years ago by eric.kern13 ▴ 240

score 3 · Accepted Answer · 2018-09-24

SGE is a scheduling/job submission system for computing clusters. You don't need it to run locally, though your machine better be a beast, as STAR uses a lot of RAM and is slow without several processors (as is every aligner). If you have only a few samples, you can probably get away with it, but if you have dozens, you're going to be waiting a while. I'd see if your organization has a computing cluster that you can get access to.
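To get a rough idea of whether a single machine is up to it, something like the sketch below reports the core count and physical RAM. The 32 GB default is my assumption, based on the figure commonly recommended for a STAR human genome index; smaller genomes need less. The function names are made up for illustration, and the RAM lookup only works on Unix-like systems.

```python
import os


def describe_machine():
    """Return the CPU count and physical RAM in GiB (None if unavailable)."""
    cpus = os.cpu_count() or 1
    ram_gb = None
    try:
        # os.sysconf is only available on Unix-like systems.
        page_size = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
        if page_size > 0 and pages > 0:
            ram_gb = page_size * pages / 1024 ** 3
    except (AttributeError, ValueError, OSError):
        pass  # leave ram_gb as None when the platform cannot report it
    return {"cpus": cpus, "ram_gb": ram_gb}


def enough_ram(ram_gb, needed_gb=32.0):
    """needed_gb is an assumed threshold; adjust it for your genome."""
    return ram_gb is not None and ram_gb >= needed_gb


if __name__ == "__main__":
    info = describe_machine()
    print("CPUs:", info["cpus"])
    print("RAM (GiB):", info["ram_gb"])
    print("Meets assumed RAM threshold:", enough_ram(info["ram_gb"]))
```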

As for R, Python, Java, and samtools, they are all easy to install and add to your PATH. You can google how to do it for your system, and many distros can install them through their package managers. Or you can look into Anaconda, which makes installing all of those and automatically adding them to PATH very easy regardless of your OS. sratoolkit is also simple to install.
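Once things are installed, a quick way to confirm they are actually on your PATH is a check like the one below. It is only a sketch: the executable names (`R`, `python`, `java`, `samtools`) are my guess at what the pipeline calls, and it checks presence only, not the versions listed in the requirements file.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names; adjust to match your installation.
    for name, path in find_tools(["R", "python", "java", "samtools"]).items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

Anything reported as not found either is not installed or lives in a directory that is not listed in your PATH.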

The links are symlinks, basically saying where your folders containing your ChIP-seq and RNA-seq data should be relative to that folder. In this case, it looks like you should have folders for them a level up from the installation directory. You can also just replace the link so that it points to wherever your data files are for each.
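If you want to see where a link currently points and re-point it at your own data, a minimal sketch looks like this. The link name and data path in the usage part are hypothetical placeholders, not names taken from the repository.

```python
import os
from pathlib import Path


def describe_link(link):
    """Report whether `link` is a symlink, its raw target, and whether it resolves."""
    link = Path(link)
    target = os.readlink(link) if link.is_symlink() else None
    # exists() follows symlinks, so it is False for a broken link.
    return {"is_symlink": link.is_symlink(), "target": target, "resolves": link.exists()}


def repoint_link(link, new_target):
    """Replace the symlink at `link` so that it points at directory `new_target`."""
    link = Path(link)
    new_target = Path(new_target).resolve()
    if not new_target.is_dir():
        raise FileNotFoundError(f"{new_target} is not an existing directory")
    if link.is_symlink():
        link.unlink()  # removes only the link, never the data it pointed to
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink; not touching it")
    link.symlink_to(new_target, target_is_directory=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with the real link and your data folder.
    example_link = Path("RNA-Seq")
    print(describe_link(example_link))
    # repoint_link(example_link, "/path/to/my/rnaseq_data")
```

The function refuses to overwrite anything that is not already a symlink, so it cannot delete a real data folder by accident.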

Installing something like this is a headache even for experienced bioinformaticists - relative lack of documentation, heavy reliance on relative paths, etc. I imagine it is one of those things that will result in a million errors with uninformative tracebacks that you'll spend days fixing before getting it to run in full. If you don't know how to install/add basic programs to your PATH, I would take a few days to learn how to do that and utilize the command line properly. Otherwise, you will likely continue to be frustrated.