7.3 years ago
tans0307
Hello people,
I have tried downloading the stand-alone version of Prodege but I am having some issues.
Untarring database files
nt_euks.00.nhr
nt_euks.00.nin
nt_euks.00.nsq
nt_euks.01.nhr
nt_euks.01.nin
nt_euks.01.nsq
nt_euks.02.nhr
nt_euks.02.nin
nt_euks.02.nsq
nt_euks.03.nhr
nt_euks.03.nin
nt_euks.03.nsq
nt_euks.04.nhr
nt_euks.04.nin
nt_euks.04.nsq
nt_euks.05.nhr
nt_euks.05.nin
nt_euks.05.nsq
nt_euks.06.nhr
nt_euks.06.nin
nt_euks.06.nsq
nt_euks.07.nhr
nt_euks.07.nin
nt_euks.07.nsq
nt_euks.08.nhr
nt_euks.08.nin
nt_euks.08.nsq
nt_euks.09.nhr
nt_euks.09.nin
nt_euks.09.nsq
nt_euks.10.nhr
nt_euks.10.nin
nt_euks.10.nsq
nt_euks.11.nhr
nt_euks.11.nin
nt_euks.11.nsq
nt_euks.12.nhr
nt_euks.12.nin
nt_euks.12.nsq
nt_euks.13.nhr
nt_euks.13.nin
nt_euks.13.nsq
nt_euks.nal
imgdb.00.nhr
imgdb.00.nin
imgdb.00.nsq
imgdb.01.nhr
imgdb.01.nin
imgdb.01.nsq
imgdb.02.nhr
imgdb.02.nin
imgdb.02.nsq
imgdb.03.nhr
imgdb.03.nin
imgdb.03.nsq
imgdb.04.nhr
imgdb.04.nin
imgdb.04.nsq
imgdb.05.nhr
imgdb.05.nin
imgdb.05.nsq
imgdb.06.nhr
imgdb.06.nin
imgdb.06.nsq
imgdb.07.nhr
imgdb.07.nin
imgdb.07.nsq
imgdb.08.nhr
imgdb.08.nin
imgdb.08.nsq
imgdb.09.nhr
imgdb.09.nin
imgdb.09.nsq
imgdb.10.nhr
imgdb.10.nin
imgdb.10.nsq
imgdb.11.nhr
imgdb.11.nin
imgdb.11.nsq
imgdb.12.nhr
imgdb.12.nin
imgdb.12.nsq
imgdb.13.nhr
imgdb.13.nin
imgdb.13.nsq
imgdb.14.nhr
imgdb.14.nin
imgdb.14.nsq
imgdb.15.nhr
imgdb.15.nin
imgdb.15.nsq
imgdb.16.nhr
imgdb.16.nin
imgdb.16.nsq
imgdb.17.nhr
imgdb.17.nin
imgdb.17.nsq
imgdb.18.nhr
imgdb.18.nin
imgdb.18.nsq
imgdb.19.nhr
imgdb.19.nin
imgdb.19.nsq
imgdb.20.nhr
imgdb.20.nin
imgdb.20.nsq
imgdb.21.nhr
imgdb.21.nin
imgdb.21.nsq
imgdb.22.nhr
imgdb.22.nin
imgdb.22.nsq
imgdb.23.nhr
imgdb.23.nin
imgdb.23.nsq
imgdb.24.nhr
imgdb.24.nin
imgdb.24.nsq
imgdb.25.nhr
imgdb.25.nin
imgdb.25.nsq
imgdb.26.nhr
imgdb.26.nin
imgdb.26.nsq
imgdb.27.nhr
imgdb.27.nin
imgdb.27.nsq
imgdb.28.nhr
imgdb.28.nin
imgdb.28.nsq
imgdb.29.nhr
imgdb.29.nin
imgdb.29.nsq
imgdb.30.nhr
imgdb.30.nin
imgdb.30.nsq
imgdb.31.nhr
imgdb.31.nin
imgdb.31.nsq
imgdb.nal
Formatting blast database
Building a new DB, current time: 01/13/2017 01:40:54
New DB name: nt_euks
New DB title: nt_euks.fna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File nt_euks.fna does not exist
rm: cannot remove 'nt_euks.fna': No such file or directory
Building a new DB, current time: 01/13/2017 01:40:54
New DB name: imgdb
New DB title: imgdb.fna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File imgdb.fna does not exist
rm: cannot remove 'imgdb.fna': No such file or directory
prodege_install.sh: 100: prodege_install.sh: [[: not found
prodege_install.sh: 100: prodege_install.sh: -e: not found
R packages not installed. ProDeGe installation unsuccessful.
I have checked that Blast+ and R have both been installed and are added to my ~/.bashrc.
I would appreciate any advice on this.
Thank you!
Looking at the files, you appear to have downloaded pre-created blast index files. You do not need to create the blast indexes again (in case you are trying to re-run that step).
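A minimal sketch of how one might confirm that the pre-built indexes are already in place before trying to format anything, assuming the index files sit in a single directory and follow the naming seen in the log above (nt_euks.00.nhr, nt_euks.00.nin, nt_euks.00.nsq, and so on). The function name has_nucl_index is hypothetical and is not part of ProDeGe or BLAST+.

```python
from pathlib import Path

# The three files that together make up one nucleotide BLAST index volume.
NUCL_INDEX_EXTS = (".nhr", ".nin", ".nsq")


def has_nucl_index(db_dir, db_name):
    """Return True if db_dir holds at least one complete index volume for db_name.

    Single-volume databases are named <db>.nhr etc.; multi-volume ones
    are named <db>.00.nhr, <db>.01.nhr, and so on.
    """
    db_dir = Path(db_dir)
    # Collect candidate volume stems from the .nhr files, e.g. "nt_euks.00".
    stems = {p.name[: -len(".nhr")] for p in db_dir.glob(f"{db_name}*.nhr")}
    # Keep only stems that really belong to this database name.
    stems = {s for s in stems if s == db_name or s.startswith(db_name + ".")}
    # A volume counts only if all three index files are present.
    return any(
        all((db_dir / (stem + ext)).is_file() for ext in NUCL_INDEX_EXTS)
        for stem in stems
    )


if __name__ == "__main__":
    for name in ("nt_euks", "imgdb"):
        print(name, "index present:", has_nucl_index(".", name))
```

If this reports the indexes as present, the "File nt_euks.fna does not exist" messages in the log are consistent with the formatting step simply having no FASTA input to work on.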
Does which blastn or which R return the correct locations for these programs?
@genomax2, thanks for your reply.
Just to clarify things:
1.) I do not have to run the sh prodege_install.sh anymore?
2.) How do I define the correct locations?
Which blastn: /home/tanshiming/tools/ncbi-blast-2.2.28+/bin/blastn
Which R: /usr/bin/R
Many thanks for your patience in this. :)
blast and R indeed appear to be available in your $PATH. So that part is fine.
The error is about an R package not being installed? Do you know which R package, or is ProDeGe an R package (sorry, I am not familiar with this program)?
So you are only running the install script (and not downloading these blast indexes manually), and that is what generates the error?
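The PATH check discussed above can also be scripted. This is a small sketch using only the Python standard library; the helper names locate_tools and missing_tools are made up for illustration. It only confirms that the executables are found on $PATH and says nothing about which R packages are installed.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names that could not be found on PATH."""
    return [name for name, path in locate_tools(names).items() if path is None]


if __name__ == "__main__":
    # Same question as running `which blastn` and `which R` by hand.
    for name, path in locate_tools(["blastn", "R"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```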
@genomax2, these are the requirements for the installation of Prodege:
I was running the install script, which led to the downloading of the databases and an error occurred at the end!
Rather than telling you how to run a standalone version of ProDeGe, which I can't do, can I ask you to explain exactly what it is you're trying to do? We do use ProDeGe in one of our pipelines, but it is no longer supported. However, in some cases, there may be alternatives.
@Brian, Thanks for your reply.
I have generated contigs from an MDA-ed sample that was enriched via cell sorting. However, in my negative controls (no template), I have noticed the presence of artefact sequences. I would therefore like to remove these artefact sequences through binning. The ProDeGe tool seems to fit what I am trying to achieve.
I am open to other suggestions that you might have. :)
Thank you.
Assume that any two things that are ever in the same room (not necessarily at the same time) will contaminate each other, and anything within 1m of a sample will contaminate it; it's really just a matter of degree (I would guess the degree is a quadratic function of distance and linear in time). Also, assume all of your reagents are contaminated (they are). ProDeGe is specifically for removing large assembled contigs that appear to be of a different taxonomy than the organism of interest, which is just a small subset of contamination outcomes. But your artifacts have a huge number of possible sources, and the best approach to removing them depends on the source and degree. So the better idea you have of the possible sources of contamination, the easier decontamination is. If you can BLAST your artifact reads and find out exactly what they are, decontamination becomes trivial and you should do it manually rather than using ProDeGe, unless you need to automate the process.
Dear @Brian, sorry for the tardy response.
1.) The sequencing was performed using an Illumina HiSeq 2500
2.) The organism is a bacterium that is unclassified at the genus level
3.) The library was multiplexed with others
4.) I am not sure about this because the sequencing was performed by someone else, but I can find out.
5.) I did a blast and all the contigs are synthetic sequences (I guess I need software that can do this decontamination).
Other information:
I did a FISH-FACS sorting of 1000 cells from an environmental sample. 16S rRNA analysis of the HiSeq reads shows a purity of >99%. However, due to MDA, these artefact sequences were generated when I performed a de novo assembly, and I would like to remove them. Because the target cells are novel, I am not sure if taxonomy/homology-based tools are the best way to go. I would appreciate any advice I could get here.
Unfortunately, neither ProDeGe nor any other decontamination tool will help you in this case. It sounds like a library failure. ProDeGe will only separate contigs, so if you have no contigs of your organism, it won't give you any output. What are the synthetic things matching the contigs?
Since you have the synthetic sequences, though, you could use BBDuk to remove all the corresponding reads and try to assemble what's left, if anything. For example:
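As a rough illustration of the idea only: the sketch below splits reads into "clean" and "matched" according to whether they share any k-mer with a set of known artefact sequences, which is the general principle of k-mer-based read filtering. It is not BBDuk's implementation or command line; the function names and the choice of k are assumptions made for this example, and a real run on HiSeq data should use BBDuk itself.

```python
# Translation table for complementing DNA (hypothetical helper for this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq (upper-cased); empty if seq is shorter than k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_ref_kmers(ref_seqs, k):
    """Collect k-mers from both strands of every reference (artefact) sequence."""
    ref = set()
    for s in ref_seqs:
        ref |= kmers(s, k)
        ref |= kmers(revcomp(s), k)
    return ref


def split_reads(reads, ref_kmers, k):
    """Return (clean, matched): reads sharing no / at least one k-mer with the reference."""
    clean, matched = [], []
    for read in reads:
        if kmers(read, k) & ref_kmers:
            matched.append(read)
        else:
            clean.append(read)
    return clean, matched


if __name__ == "__main__":
    artefacts = ["ACGTACGTTTGACCA"]           # stand-in for the synthetic sequences
    reads = ["CCGTACGTTTGACC", "GGGGGGGGGGGGCCCC"]
    ref = build_ref_kmers(artefacts, k=8)
    clean, matched = split_reads(reads, ref, k=8)
    print("clean:", clean)
    print("matched:", matched)
```

Only the clean reads would then go into a new assembly attempt.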
Not sure what you mean by this. Normally you evaluate rRNA in single-cell MDA libraries using Sanger. Can you elaborate?
Generally, if you are multiplexing MDA-amplified single-cells, you will get crosstalk due to barcode miscalls, barcode contamination/impurity, chimerism, and so forth, that will assemble into contaminant contigs if the crosstalk level is sufficient (which we find that it is, using standard Illumina library-prep approaches). You can remove this low-level cross-contamination with BBMap's crossblock tool (run crossblock.sh for usage information); it is designed exactly for this situation. It is not, however, a universal decontamination utility and only deals with cross-contamination from another pooled library (all pooled libraries must be processed together).
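To make the reasoning concrete, here is a toy sketch of one way low-level crosstalk between pooled libraries could be spotted: a contig assembled in one library is flagged when reads from another pooled library cover it far more deeply than that library's own reads do. This is an assumed heuristic for illustration, not a description of how crossblock works internally; the data layout, the function name flag_crosstalk, and the min_ratio threshold are all hypothetical.

```python
def flag_crosstalk(coverage, min_ratio=10.0):
    """Flag contigs that look like crosstalk from another pooled library.

    coverage[assembly_library][contig][read_library] is the mean read depth
    of that contig when reads from read_library are mapped to it.
    A contig is flagged when some other library covers it at least
    min_ratio times more deeply than the library it was assembled from.
    """
    flagged = {}
    for lib, contigs in coverage.items():
        for contig, depths in contigs.items():
            own = depths.get(lib, 0.0)
            # Deepest coverage contributed by any other pooled library.
            other = max(
                (d for read_lib, d in depths.items() if read_lib != lib),
                default=0.0,
            )
            if other > 0 and other >= min_ratio * own:
                flagged.setdefault(lib, []).append(contig)
    return flagged


if __name__ == "__main__":
    # Made-up depths for two pooled libraries, A and B.
    coverage = {
        "A": {
            "contig_1": {"A": 50.0, "B": 0.5},   # genuinely from A
            "contig_2": {"A": 2.0, "B": 400.0},  # far deeper in B: suspicious
        },
        "B": {
            "contig_9": {"A": 1.0, "B": 300.0},  # genuinely from B
        },
    }
    print(flag_crosstalk(coverage))
```

This also shows why all pooled libraries have to be processed together: the comparison needs depths from every library in the pool.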
Dear @Brian Bushnell,
Do you know of any literature showing that cross-library contamination is common with Illumina library-prep approaches?
No, I am not aware of any published studies of the issue, though JGI might publish our data at some point. Note that it is not exactly a library-prep issue, though - cross-contamination occurs at many points, including during and after library-prep. But, for example, we have in the past had cross-contamination occurring on the robots used for preparing plates due to improper fluid levels, and that was some of the worst contamination.
Dear @Brian,
after analyzing my contigs using the ACDC software, I found that contigs from a sample that mine had been multiplexed with were present in my sample of interest. Do you have any suggestions on how I could go about tracing the source of this contamination?
Thanks!
Tracing the origin is really difficult. Some of the things you can investigate are:
Sometimes, those can give you an idea of where the contamination may have occurred.
Hello @Brian,
I would like to clarify a few things. The synthetic artefacts were produced from an MDA amplification performed on sterile PBS (no genomic templates). Using the RiboTagger software (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1378-x), no RiboTags were observed in the sample. This strongly suggests that the PBS was indeed sterile.
When I did a de novo assembly of the reads that were generated from this negative control, contigs up to 6 kbp were produced. A blast search showed that they could not be annotated. I suspect the artefact sequences are a by-product of MDA amplification (http://www.nature.com/nprot/journal/v1/n4/full/nprot.2006.326.html)
Therefore, I predict that these artefact sequences would also be produced in an actual sample that contains cells. So the goal here is really to remove these artefacts. The tricky part is that the cells belong to a novel genus.
How often does this happen?
Please use ADD REPLY/ADD COMMENT to respond to existing posts to keep threads logically organized.
Will take note of that, @genomax2