Question

Anybody Managed To Make InParanoid Work?

0

10.1 years ago

Birdman ▴ 20

I'm trying to use InParanoid (v4.1) to detect orthologs between two de novo transcriptomes I assembled. They were translated to protein sequences using TransDecoder. The resulting FASTA file I'm feeding to InParanoid looks like this (~200,000 sequences):

>comp100291_c0_seq1
LPKKILLPIQQVLGHLLLALSYRGKVMQVKALKSKHEHNGPETLDAFLSSKLVVVKQPRE
QAGFPLSIVFIPGEGRQERFLLHGEYNQSFCKEPVMELPRQ
>comp102162_c0_seq1
PNMTLHFLKSSPGSWRLSGLVLIPYVTETISGSCETLTRLQMPAHIQQSRWKAKHGPRIL
LLGLLQNLRSLFPLKVLPPGANSQLKRNCSFTSVCLIGTFYVESS
>comp102206_c0_seq1
CQEQKWQKGNREEKGWAGVTVWGAYFPYLLIRCPNHQTSTPLSIHSQQHFMLCIIICPFS
WLKPPVKTTQMFKGFFFKSGLKKFLALFLISWAAFATDRPLLGKQQSR

I tried the example FASTA files supplied with the program (called SC and EC) and they work, but with my own files it gets stuck at the first step and creates no files (and shows no disk activity), even after days. Here is what I get with my FASTA files:

Loading module bio/ncbi-blast-2.2.22.
Formatting BLAST databases
Done formatting
Starting BLAST searches...

Starting first BLAST pass for bf - bf on [blastall] WARNING: the -C 3 argument is currently experimental

It then stays like this forever.

I also tried supplying my own inter-sample BLAST results, parsed with the parser supplied with InParanoid, but then it hangs indefinitely at the following stage, again without generating any file:

Done BLAST searches. Starting ortholog detection...

I tried with and without bootstrapping, with and without multithreading (the -a16 option), and, as I said, with and without supplied BLAST results. I also cleaned my FASTA files of any unusual characters (removed annotations, all '*', spaces, empty lines and dots). Now I'm running out of ideas. I'm using a Unix cluster, and I tried these jobs using up to 16 CPUs with 256G memory.
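For anyone wanting to repeat the cleaning step described above, here is a minimal Python sketch of it. It is not part of InParanoid: the function names `clean_fasta` and `find_invalid` and the choice of allowed residues (the 20 standard amino acids plus X) are assumptions made for illustration, and InParanoid may accept or reject other characters.

```python
import re

# Assumption: the 20 standard amino acids plus X (unknown) count as valid.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def clean_fasta(lines):
    """Yield cleaned FASTA lines: short headers, no '*', '.', spaces or blank lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop empty lines
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token (drops annotations).
            tokens = line[1:].split()
            yield ">" + (tokens[0] if tokens else "")
        else:
            seq = re.sub(r"[*.\s]", "", line).upper()
            if seq:
                yield seq


def find_invalid(lines):
    """Return (header, unexpected_characters) pairs for already-cleaned lines."""
    problems = []
    header, bad = None, set()
    for line in lines:
        if line.startswith(">"):
            if header is not None and bad:
                problems.append((header, "".join(sorted(bad))))
            header, bad = line[1:], set()
        else:
            bad |= set(line) - VALID_RESIDUES
    if header is not None and bad:
        problems.append((header, "".join(sorted(bad))))
    return problems
```

Both functions take any iterable of lines, so an open file handle can be passed directly. Running `find_invalid` on the cleaned output gives a quick list of records that still contain characters outside the assumed alphabet.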

Anybody managed to make that program work?

• 4.5k views

ADD COMMENT • link updated 6.4 years ago by huangxiaoyun1 • 0 • written 10.1 years ago by Birdman ▴ 20

0

EDIT: I was able to make it work with a small subset of my sequences (a few thousand). It seems that InParanoid has problems with large datasets (hundreds of thousands of sequences). My question now becomes: has anybody managed to make the program work with large datasets?
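A small test subset like the one mentioned here can be produced with a few lines of Python. This is only a sketch: `take_first_records` is a hypothetical helper, and taking the first N records (rather than a random sample) is an arbitrary choice made to keep the example short.

```python
def take_first_records(lines, n):
    """Yield the lines belonging to the first n FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break  # stop at the header of record n + 1
        if seen:  # ignore anything before the first header
            yield line


# Example use with hypothetical file names:
# with open("all_proteins.fa") as src, open("subset.fa", "w") as dst:
#     dst.writelines(take_first_records(src, 3000))
```

Growing the subset step by step (a few thousand, then tens of thousands of sequences) would show roughly at what input size the run stops completing.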

ADD REPLY • link 10.1 years ago by Birdman ▴ 20

0

I ran into the same problem. It also reported "Blast output file A->B is missing". Have you fixed this problem?

ADD REPLY • link 9.4 years ago by 695624096 • 0

0

I am facing similar issues. Any update?

ADD REPLY • link 8.6 years ago by kudzu • 0

0

I ran into the same problem. It also reported "Blast output file A->B is missing". Have you fixed this problem?

ADD REPLY • link 6.4 years ago by huangxiaoyun1 • 0

0

I find that OrthoMCL does the job better, which is why I gave up on InParanoid.

ADD REPLY • link 6.4 years ago by Adrian Pelin ★ 2.6k

Ram · Answer 1 · 2014-04-16

2

10.0 years ago

kristenbekc527 ▴ 20

I think what you're running into is an issue with InParanoid running legacy BLAST instead of BLAST+. According to this NCBI page, the legacy executables have a cap at ~65K sequences and run into other issues with large data. This is fixed in BLAST+, but InParanoid runs legacy BLAST by default. The workaround is to update the InParanoid source code. I am working on that now, and if I can get it all to work I will update this post with a link to a GitHub page.
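As a rough illustration of the two points above, the Python sketch below counts the records in a FASTA file, compares the count with the ~65K figure, and builds the equivalent BLAST+ command lines without running them. The helper names, the file names in the usage note, and the tabular output format (`-outfmt 6`) are assumptions. The sketch does not show what InParanoid's own script or parser expects, so the InParanoid source would still need adapting separately.

```python
# Approximate cap quoted above for legacy BLAST; not independently verified here.
LEGACY_CAP = 65_000


def count_records(lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def exceeds_legacy_cap(n_records, cap=LEGACY_CAP):
    """True if the input is larger than the assumed legacy BLAST limit."""
    return n_records > cap


def blast_plus_commands(query_fasta, db_fasta, out_path, threads=1, outfmt="6"):
    """Build makeblastdb and blastp argument lists (BLAST+) without running them.

    outfmt="6" (tabular) is a placeholder; the format InParanoid needs may differ.
    """
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "prot"]
    search = [
        "blastp",
        "-query", query_fasta,
        "-db", db_fasta,
        "-out", out_path,
        "-outfmt", outfmt,
        "-num_threads", str(threads),
    ]
    return make_db, search
```

With BLAST+ installed, each returned list can be passed to `subprocess.run(cmd, check=True)`, for example `blast_plus_commands("A.fa", "B.fa", "A-B.tsv", threads=16)` for one direction of an inter-sample search.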

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by kristenbekc527 ▴ 20

0

Did you succeed in making InParanoid work with BLAST+?

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.4 years ago by cvlas076 • 0

0

please please:)

ADD REPLY • link 9.4 years ago by Adrian Pelin ★ 2.6k