Question: Help setting up lncRNA-screen from github
1
13 months ago
natanamorim.moraes • 10

Hello everyone! I'm new to bioinformatics, and I'm having a really hard time trying to make this work. What I'm trying to set up is this https://github.com/NYU-BFX/lncRNA-screen

I'm working with long non-coding RNAs, and this pipeline, created by Applied Bioinformatics Laboratories (New York, NY), does exactly what I need. However, I'm finding it quite hard to set up. Could anyone help me?

* It says it uses SGE, which I only got working with Docker. Is SGE really necessary? I only have one machine.

* It needs r/3.3.0, python/2.7.3, java/1.8 and samtools/1.3 installed and added to PATH.

* It has linked folders for my RNA-seq and ChIP-seq data, but I don't know how those work.

* It also says I need https://github.com/NYU-BFX/RNA-Seq_Standard even if I have my own RNA-seq data (which I do).

* The documentation says sratoolkit is included, but with my lack of experience I don't understand how that works.

* Here's the requirements file: https://github.com/NYU-BFX/lncRNA-screen/blob/master/inputs/system_requirement.txt

This is my first post here, so I may have done something wrong or posted this question in the wrong place.

13 months ago natanamorim.moraes • 10 • updated 13 months ago jared.andrews07 ♦ 2.4k
0

Did you ever get this working? I am interested to compare notes regarding the format of the resulting BED file.

9 months ago eric.kern13 • 150
3
13 months ago
jared.andrews07 ♦ 2.4k
St. Louis, MO

SGE is a scheduling/job submission system for computing clusters. You don't need it to run locally, though your machine had better be a beast: STAR uses a lot of RAM and is slow without several processors (as is every aligner). If you have only a few samples, you can probably get away with it, but if you have dozens, you're going to be waiting a while. I'd check whether your organization has a computing cluster that you can get access to.
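If you want a quick look at what the machine actually has before trying a local run, something like the following minimal sketch reports the core count and total RAM. It assumes a Linux system (the `sysconf` names used here are not available on Windows), and `machine_summary` is just an illustrative name, not part of the pipeline. How much is "enough" depends on your genome and sample count, so no threshold is hard-coded.

```python
import os


def machine_summary():
    """Return (cpu_count, total_ram_gib) for the current machine.

    Assumes Linux: SC_PAGE_SIZE and SC_PHYS_PAGES are POSIX sysconf
    names that Python exposes there.
    """
    cpus = os.cpu_count() or 1
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    total_gib = page_size * phys_pages / 1024 ** 3
    return cpus, total_gib


if __name__ == "__main__":
    cpus, ram_gib = machine_summary()
    print("CPU cores: {}".format(cpus))
    print("Total RAM: {:.1f} GiB".format(ram_gib))
```

Compare the numbers against what your aligner's documentation recommends for your genome size.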

As for R, Python, Java, and samtools, they are all easy to install and add to your PATH. You can google how to do it for your system, and many distros can install them through their package managers. Or you can look into Anaconda, which makes installing all of those and automatically adding them to PATH very easy regardless of your OS. sratoolkit is also simple to install.
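Once things are installed, a small check like this can confirm that the tools are visible on PATH. It is only a sketch: the executable names in `REQUIRED_TOOLS` are my assumption of what each tool is usually called on Linux, and `shutil.which` only tells you that a program was found, not which version it is.

```python
import shutil

# Assumed executable names for the tools listed in the requirements;
# adjust them if your system uses different names (e.g. "python2").
REQUIRED_TOOLS = ["R", "python", "java", "samtools"]


def locate_tools(names):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in locate_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in locate_tools(REQUIRED_TOOLS).items():
        print("{:10s} {}".format(name, path or "NOT FOUND"))
```

For anything it finds, you would still want to run the tool's own version command to make sure it matches what the pipeline expects.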

The links are symlinks, basically saying where your folders containing your ChIP-seq and RNA-seq data should be relative to that folder. In this case, it looks like you should have folders for them a level up from the installation directory. You can also just replace the link so that it points to wherever your data files are for each.
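Replacing a link can be done from the shell, but here is the same idea as a short Python sketch. The function name and the paths in the usage comment are hypothetical; substitute the actual link names from the repository and the location of your own data.

```python
import os


def repoint_symlink(link_path, new_target):
    """Make link_path a symlink to new_target, replacing an existing link.

    Refuses to touch link_path if it exists and is a real file or
    directory rather than a symlink, so data cannot be deleted by mistake.
    """
    if os.path.islink(link_path):
        os.unlink(link_path)  # removes only the link, never the target
    elif os.path.exists(link_path):
        raise FileExistsError("{} exists and is not a symlink".format(link_path))
    os.symlink(new_target, link_path)
    return os.readlink(link_path)


# Hypothetical usage (both paths are placeholders):
# repoint_symlink("path/to/pipeline/rna_seq_link", "/data/my_rnaseq")
```

Using an absolute path for the target avoids surprises, since a relative target is resolved relative to the directory containing the link.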

Installing something like this is a headache even for experienced bioinformaticians: relative lack of documentation, heavy reliance on relative paths, etc. I imagine it is one of those things that will produce a million errors with uninformative tracebacks that you'll spend days fixing before it runs in full. If you don't know how to install basic programs and add them to your PATH, I would take a few days to learn how to do that and how to use the command line properly. Otherwise, you will likely continue to be frustrated.

0

Good that I don't need SGE. We have a local server with an old Xeon, Ubuntu Server 16.04 and 32 GB of RAM (and that's all we can have for now).

As for installing r/3.3.0, python/2.7.3, java/1.8 and samtools/1.3 and adding them to PATH, I know how to do that. What I forgot to mention is that they refer to "Software Environment Management", and I don't know any such tool. I don't strictly need one, but it could be helpful for me in other projects too.

Thanks a lot for your answer Jared.

12 months ago natanamorim.moraes • 10
