Anybody Managed To Make InParanoid Work?
10.1 years ago
Birdman ▴ 20

I'm trying to use InParanoid (v4.1) to detect orthologs between two de novo transcriptomes I assembled. They were translated to protein sequences using TransDecoder. The resulting FASTA file I'm trying to use in InParanoid looks like this (~200,000 sequences):

>comp100291_c0_seq1
LPKKILLPIQQVLGHLLLALSYRGKVMQVKALKSKHEHNGPETLDAFLSSKLVVVKQPRE
QAGFPLSIVFIPGEGRQERFLLHGEYNQSFCKEPVMELPRQ
>comp102162_c0_seq1
PNMTLHFLKSSPGSWRLSGLVLIPYVTETISGSCETLTRLQMPAHIQQSRWKAKHGPRIL
LLGLLQNLRSLFPLKVLPPGANSQLKRNCSFTSVCLIGTFYVESS
>comp102206_c0_seq1
CQEQKWQKGNREEKGWAGVTVWGAYFPYLLIRCPNHQTSTPLSIHSQQHFMLCIIICPFS
WLKPPVKTTQMFKGFFFKSGLKKFLALFLISWAAFATDRPLLGKQQSR

I tried the example FASTA files supplied with the program (called SC and EC) and it works, but with my own files it gets stuck at the first step and, even after days, creates no files (and shows no disk usage). Here is what I get with my FASTA files:

Loading module bio/ncbi-blast-2.2.22.
Formatting BLAST databases
Done formatting
Starting BLAST searches...

Starting first BLAST pass for bf - bf on [blastall] WARNING: the -C 3 argument is currently experimental

It then stays like this forever.

I also tried supplying my own (inter-sample) BLAST results, parsed with the parser supplied with InParanoid, but it still hangs forever at the following stage, again without generating any file:

Done BLAST searches. Starting ortholog detection...

I tried with and without bootstrapping, with and without multithreading (the -a16 option) and, as I said, with and without supplied BLAST results. I also cleaned my FASTA files of any unusual characters (removed annotations, all ' * ', spaces, empty lines and dots). Now I'm running out of ideas... I'm using a Unix cluster. I tried these jobs using up to 16 CPUs with 256G memory.
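For anyone wanting to reproduce the cleaning steps described above, here is a minimal Python sketch. It is not the script actually used in this thread: the function name `clean_fasta` is hypothetical, and it assumes that "annotations" means everything after the first whitespace in a header line.

```python
import re


def clean_fasta(lines):
    """Yield cleaned FASTA lines.

    Assumptions (not confirmed by the original post):
    - header annotations are everything after the first whitespace
    - '*', '.', and whitespace inside sequence lines should be dropped
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            parts = line[1:].split()
            # keep only the identifier; an empty header stays as '>'
            yield ">" + (parts[0] if parts else "")
        else:
            # remove stop-codon asterisks, dots, and internal whitespace
            seq = re.sub(r"[*.\s]", "", line)
            if seq:
                yield seq
```

It could be applied by opening the input file, passing the file handle to `clean_fasta`, and writing each yielded line followed by a newline to a new file.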

Anybody managed to make that program work?


EDIT: I was able to make it work with a small subset of my sequences (a few thousand). It seems that InParanoid has problems with large datasets (hundreds of thousands of sequences)... My question now becomes: has anybody managed to make the program work with large datasets?
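A small test subset like the one mentioned in this edit could be extracted with something along these lines. This is only a sketch: the helper names `read_fasta` and `first_n_records` are hypothetical, and taking the first records (rather than a random sample) is an assumption, since the post does not say how the subset was chosen.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # sequence lines of one record are joined into a single string
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def first_n_records(lines, n):
    """Return the first n FASTA records as a list of (header, sequence)."""
    return list(islice(read_fasta(lines), n))
```

Running InParanoid on increasingly large subsets would be one way to find roughly where it stops completing.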


I met the same problem. It also said "Blast output file A->B is missing". Have you fixed this problem?


I am facing similar issues. Any update?


I find that OrthoMCL does the job better, which is why I gave up on InParanoid.

10.0 years ago

I think what you're running into is an issue with InParanoid running legacy BLAST instead of BLAST+. According to this NCBI page, the legacy executables have a cap at ~65K sequences and run into other issues with large data. This is fixed in BLAST+, but InParanoid runs legacy BLAST by default. The workaround for this is to update the InParanoid source code. I am working on that now, and if I can get it all to work I will update this post with a link to a GitHub page.
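As a quick check of whether a given input is in the range where this might matter, the number of sequences in a FASTA file can be counted and compared with the cap. The sketch below is only illustrative: the function names are hypothetical, and the threshold of 65,000 is an assumption standing in for the "~65K" figure quoted above, not an exact documented value.

```python
# Assumption: approximates the "~65K" cap quoted in this answer.
LEGACY_LIMIT = 65_000


def count_sequences(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def exceeds_legacy_limit(lines, limit=LEGACY_LIMIT):
    """Return True if the input holds more sequences than the assumed cap."""
    return count_sequences(lines) > limit
```

With the ~200,000 sequences mentioned in the question, a check like this would return True, which is consistent with the small subset working and the full dataset hanging.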


Did you succeed in making InParanoid work with BLAST+?


please please:)
