SEED assignment (MEGAN5)
6.6 years ago
fhsantanna • 440
@fhsantanna11754

It is not clear to me how MEGAN assigns SEED classes to sequences. The manual says this is done by identifying the RefSeq ID (accession number?).

When I load my BLAST file using the built-in RefSeq map, none of the sequences gets a SEED assignment, although the taxonomic identification works.

The basic format of my BLAST file is:

01contig00001_567_4372_+ gi|495059216|ref|WP_007784050.1| 56.09 624 262 7 348 967 5 620 0.0 711 Brevibacillus sp. CF112

Can anyone shed some light on this problem?

Thanks in advance!

MEGAN5 metagenomics SEED BLAST RefSeq • 3.6k views

Hello fhsantanna,

How did you solve the problem?

I used your mapping file, but my SEED hits are still not assigned. Which files do you import? What are your LCA parameters?

The weird thing is that I got it to work yesterday, but I can't reconstruct how.


I only set Min Support to 1, since I am working with contigs. Check that your BLAST file is in the correct format. Here is a sample of the format of my files:

Fields (separated by tab): query_id, subject_id, %_identity, alignment_length, mismatch, gap, query_start, query_end, subject_start, subject_end, evalue, bit score, subject_organism

01_contig00001_1_491_+ ref|ZP_10741618.1 29.94 157 0 0 9 157 4 158 1e-11 67.4 Brevibacillus sp. CF112
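The field list above matches the 12 standard columns of BLAST tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) plus one extra organism column. As a rough sketch, assuming the file really is tab-separated, one line can be split into named fields like this. The `BlastHit` and `parse_blast_line` names are made up for illustration and have nothing to do with MEGAN's own parser.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    query_id: str
    subject_id: str
    pct_identity: float
    alignment_length: int
    mismatches: int
    gap_opens: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float
    subject_organism: str  # optional 13th column; empty string if absent


def parse_blast_line(line: str) -> BlastHit:
    """Parse the 12 standard tabular BLAST columns plus an optional organism column."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected at least 12 tab-separated columns, got {len(cols)}")
    return BlastHit(
        query_id=cols[0],
        subject_id=cols[1],
        pct_identity=float(cols[2]),
        alignment_length=int(cols[3]),
        mismatches=int(cols[4]),
        gap_opens=int(cols[5]),
        query_start=int(cols[6]),
        query_end=int(cols[7]),
        subject_start=int(cols[8]),
        subject_end=int(cols[9]),
        evalue=float(cols[10]),
        bit_score=float(cols[11]),
        subject_organism=cols[12] if len(cols) > 12 else "",
    )


if __name__ == "__main__":
    sample = "\t".join(["01_contig00001_1_491_+", "ref|ZP_10741618.1", "29.94", "157",
                        "0", "0", "9", "157", "4", "158", "1e-11", "67.4",
                        "Brevibacillus sp. CF112"])
    hit = parse_blast_line(sample)
    print(hit.subject_id, hit.evalue, hit.bit_score, hit.subject_organism)
```

A line that was pasted with spaces instead of tabs fails here with a `ValueError`, which is a cheap way to catch that kind of formatting problem before importing.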


My file is in the exact same format.

This might sound stupid, but could you also give me the other LCA parameters, such as Max Expected and Top Percent?

And what do you select on import? I selected GI-to-Taxon mapping and RefSeq-to-SEED mapping, using your uploaded file.


OK, now I am a bit more confused. In your first post, your subject_id was in this format:

gi|503216142|ref|WP_013450803.1|

But now it contains only the RefSeq ID, e.g.:

ref|ZP_10741618.1

Was this the problem? My output has the first format you posted (GI-ID|Ref-ID).
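Both subject-ID formats carry the same kind of RefSeq accession; only the first one also carries a GI number. A minimal sketch of pulling both out of the pipe-delimited ID, assuming the usual NCBI `tag|value` pairing. `split_subject_id` is a hypothetical helper for checking your own files, not how MEGAN parses IDs internally.

```python
from typing import Optional, Tuple


def split_subject_id(subject_id: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (gi_number, refseq_accession) from a pipe-delimited NCBI-style ID.

    Either element is None if that tag is absent.
    """
    fields = subject_id.rstrip("|").split("|")  # drop the trailing '|' of the gi format
    gi = accession = None
    # Read the fields as (database tag, identifier) pairs: gi|...|ref|...
    for tag, value in zip(fields[0::2], fields[1::2]):
        if tag == "gi":
            gi = value
        elif tag == "ref":
            accession = value
    return gi, accession


if __name__ == "__main__":
    for sid in ("gi|503216142|ref|WP_013450803.1|", "ref|ZP_10741618.1"):
        print(sid, "->", split_subject_id(sid))
```

Running it prints `('503216142', 'WP_013450803.1')` for the first format and `(None, 'ZP_10741618.1')` for the second, i.e. a RefSeq-based lookup has something to work with in both cases, while a GI-based taxon lookup only works with the first.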


Both work. With the first one, I used the taxon mapping based on the GI numbers. With the second one, that is not necessary, since there is an additional column with the taxon name.

LCA parameters: Min Score 50, Max Expected 0.01, Top Percent 10, Min Support Percent 0.1, Min Support 1, LCA Percent 100; all other options turned off.
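For reference, a rough sketch of how the three per-read hit filters among these parameters are commonly understood: Min Score as a minimum bit score, Max Expected as a maximum e-value, and Top Percent as keeping only hits whose bit score is within that percentage of the best one. This is an assumption-based illustration, not MEGAN's implementation; the remaining parameters (Min Support, Min Support Percent, LCA Percent) are not modeled.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    evalue: float
    bit_score: float


def filter_hits(hits: List[Hit], min_score: float = 50.0,
                max_expected: float = 0.01, top_percent: float = 10.0) -> List[Hit]:
    """Filter the hits of ONE read (assumed semantics, see lead-in).

    1. Drop hits with bit score below min_score or e-value above max_expected.
    2. Keep only hits whose bit score is within top_percent of the best
       remaining bit score.
    """
    passing = [h for h in hits
               if h.bit_score >= min_score and h.evalue <= max_expected]
    if not passing:
        return []
    best = max(h.bit_score for h in passing)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [h for h in passing if h.bit_score >= cutoff]


if __name__ == "__main__":
    # Hypothetical hits for a single read.
    read_hits = [
        Hit("a", 0.0, 711.0),
        Hit("b", 1e-150, 650.0),
        Hit("c", 1e-120, 600.0),  # below 90% of 711, dropped by top_percent
        Hit("d", 1e-11, 45.0),    # below min_score, dropped
    ]
    print([h.subject_id for h in filter_hits(read_hits)])  # ['a', 'b']
```

The defaults mirror the values listed above (Min Score 50, Max Expected 0.01, Top Percent 10).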


Today I downloaded the latest version of MEGAN. For SEED analysis, I only checked "Use Built-in RefSeq Map".

6.7 years ago
Josh Herr 5.6k
@Josh Herr1704

You didn't give us much to work with here (what errors are you getting, etc.). Based on your past questions (here and here), it looks like you have had this problem for a while.

Basically, your BLAST output is not "talking" to the parser that does the SEED classification. This can be due to an incorrectly formatted BLAST output, not querying a RefSeq ID, etc.

I would start at the beginning, proceed until you hit an error, and then dissect which step is failing; it sounds like it is somewhere after the BLAST analysis. Have you updated the RefSeq and SEED databases, or checked that they downloaded correctly in MEGAN? You didn't provide any information about your samples -- would they be easily annotated?
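One cheap way to rule out an incorrectly formatted BLAST table is to check every line before importing it. A sketch, assuming the input should be tab-separated with at least the 12 standard tabular BLAST columns; it flags lines that look space-separated (which can happen when a table is copied out of a web page) and non-numeric values in columns 3 to 12. `check_blast_table` is a hypothetical helper, not part of MEGAN or BLAST.

```python
import sys
from typing import Iterable, List, Tuple

MIN_COLUMNS = 12                # standard BLAST tabular columns
NUMERIC_COLUMNS = range(2, 12)  # pident .. bitscore (0-based indices)


def check_blast_table(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that do not look like
    tab-separated BLAST tabular output."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < MIN_COLUMNS:
            hint = " (spaces instead of tabs?)" if len(line.split()) >= MIN_COLUMNS else ""
            problems.append((lineno, f"{len(cols)} tab-separated column(s), "
                                     f"expected at least {MIN_COLUMNS}{hint}"))
            continue
        for i in NUMERIC_COLUMNS:
            try:
                float(cols[i])
            except ValueError:
                problems.append((lineno, f"column {i + 1} is not numeric: {cols[i]!r}"))
                break  # one report per line is enough
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for lineno, problem in check_blast_table(handle):
            print(f"line {lineno}: {problem}")
```

An empty report does not prove MEGAN will assign anything; it only rules out the formatting step.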

Have you read over the MEGAN manual?

These other people have had issues with connecting to the SEED databases -- do these posts (here and here) help?

You can also use the SEED API to query outside of MEGAN.


Hi.

Firstly, thank you for the feedback. Sorry about the questions; I am a newbie in this field, so maybe I am making a very basic mistake.

Yes, I have read the MEGAN manual, but unfortunately it is not very clear to me. According to the manual and the website, MEGAN is provided with built-in SEED and KEGG mapping files. However, I cannot find them in any of the program's directories, and none of them are available on the MEGAN website.

I have also already read the recommended posts, and as you can see, other people are having the same problem, with no solution available.

Since my metagenome data have to be converted, I am testing the program with a small test sample (a tabular BLAST file with six sequences). The manual mentions that example files are available for testing, but that is not the case.

As I mentioned previously, the taxonomic classification works, meaning that MEGAN is recognizing the GI numbers. As for the SEED classification, no error occurs; the sequences are just "classified" as "Not Assigned". The same is true for the KEGG classification.

My test file is this:

gi|313672642|ref|YP_004050753.1| gi|503216142|ref|WP_013450803.1| 100.00 436 0 0 1 436 1 436 0.0 882
gi|312939398|gb|ADR18590.1| gi|503216142|ref|WP_013450803.1| 100.00 436 0 0 1 436 1 436 0.0 882
gi|297146112|gb|ADI02869.1| gi|502941295|ref|WP_013176271.1| 100.00 493 0 0 1 493 1 493 0.0 1002
gi|307157605|gb|ADN36985.1| gi|503095346|ref|WP_013330162.1| 100.00 766 0 0 1 766 1 766 0.0 1574
gi|428678825|gb|AFZ57591.1| gi|505027134|ref|WP_015214236.1| 100.00 309 0 0 1 309 1 309 0.0 640
gi|335359244|gb|AEH44925.1| gi|503673591|ref|WP_013907667.1| 100.00 489 0 0 1 489 1 489 0.0 1008

In the LCA params, I have set Min support to 1.

The options "Analyse SEED content", "Use Built-in RefSeq Map", and "Use RefSeq Map" are turned on.

Any ideas?


Without going to the MEGAN manual to check, it sounds like you do not have a reference database (or connection to one) in your installation. You need to figure out why this is the case so you can parse your BLAST table to KEGG, SEED, etc. You may have to actually download the SEED files and place them in the correct MEGAN folder, but perhaps you can download these through MEGAN.

I haven't used MEGAN in a few years -- with Illumina data, BLAST-based metagenomic classification methods are ridiculously slow. As a result, I don't know many people who have used the program lately. Even though I don't use it much anymore, I will say that it is, along with Dan Huson's other programs, well put together.


After speaking with Daniel Huson, I understood the problem: the built-in SEED mapping file was not up to date, so there was no correspondence between it and the IDs in my file.

Anyway, thank you for the feedback.
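A toy model of why an outdated mapping file gives "Not Assigned" without raising any error: if SEED assignment is essentially a lookup of each hit's RefSeq accession in the map, any accession missing from the map simply falls through. The two-column layout (accession, tab, SEED identifier) and the `SEED_ID_1` value are hypothetical; the real ref2seed.map format may differ, and this is not MEGAN's code.

```python
from typing import Dict, Iterable, List, Tuple


def load_map(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'accession<TAB>SEED identifier' lines into a dict.

    ASSUMPTION: this two-column layout is illustrative; check the real map file.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2 and parts[0]:
            mapping[parts[0]] = parts[1]
    return mapping


def assign(accessions: Iterable[str], mapping: Dict[str, str]) -> List[Tuple[str, str]]:
    """Look up each accession; anything missing from the map is 'Not Assigned'."""
    return [(acc, mapping.get(acc, "Not Assigned")) for acc in accessions]


if __name__ == "__main__":
    # Hypothetical outdated map that knows the ZP_ accession but not the WP_ one.
    old_map = load_map(["ZP_10741618.1\tSEED_ID_1\n"])
    for acc, label in assign(["ZP_10741618.1", "WP_013450803.1"], old_map):
        print(acc, label)
```

The same lookup also works as a diagnostic: counting how many of your own accessions are missing from a given map file tells you quickly whether that file is the problem.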


Hi, I have the same problem. How did you update the SEED mapping file?


That's what I thought -- glad you got it cleared up.

6.6 years ago
fhsantanna • 440
@fhsantanna11754

Daniel Huson sent me the latest version of the SEED map. Here is the download link: http://wikisend.com/download/508312/ref2seed.map.zip


Can you share the new mapping file? This link no longer works. Thanks in advance.


MEGAN has been updated several times since then, so I believe it would be better practice to download the latest version. The "built-in" mapping files are contained in data.jar.
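A .jar file is an ordinary ZIP archive, so its contents can be listed without unpacking MEGAN. A sketch using Python's standard `zipfile` module; the keyword filter is a guess (the entry names inside data.jar are not given in this thread, and "ref" is a coarse match), and where data.jar lives depends on your installation.

```python
import sys
import zipfile
from typing import List, Sequence


def list_mapping_entries(jar_path: str,
                         keywords: Sequence[str] = ("seed", "kegg", "ref")) -> List[str]:
    """Return names of entries in a .jar (ZIP) archive that contain any keyword."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist()
                if any(k in name.lower() for k in keywords)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python list_jar.py /path/to/data.jar
    for name in list_mapping_entries(sys.argv[1]):
        print(name)
```

An entry found this way can be read with `zipfile.ZipFile.open` if you want to compare its accessions against your BLAST hits.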
