SEED assignment (MEGAN5)
6.6 years ago
@fhsantanna
It is not clear to me how MEGAN assigns SEED classes to the sequences. According to the manual, this is done by identifying the RefSeq ID (the accession number?) of each hit.
When I load my BLAST file using the built-in RefSeq map, none of the sequences gets a SEED assignment, although the taxonomic identification works.
The basic format of my BLAST file is as follows:
01contig00001_567_4372_+ gi|495059216|ref|WP_007784050.1| 56.09 624 262 7 348 967 5 620 0.0 711 Brevibacillus sp. CF112
Any light on this problem?
Thanks in advance!
MEGAN5
metagenomics
SEED
BLAST
RefSeq
6.7 years ago
@Josh Herr
You didn't give us much to work with here (what errors are you getting, etc.). Judging from your past questions (here and here), it looks like you have had this problem for a while.
Basically, your BLAST output is not "talking" to the parser that does the SEED classification. This can be caused by incorrectly formatted BLAST output, by subject IDs that are not RefSeq IDs, etc.
I would start at the beginning and work forward until you hit an error, then dissect which step is failing; it sounds like the problem is somewhere after the BLAST analysis. Have you updated the RefSeq and SEED databases, or checked that they downloaded correctly in MEGAN? You also didn't provide any information about your samples: would they be easy to annotate?
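To see where the chain breaks, a quick check outside MEGAN can help. The sketch below is an illustration, not MEGAN's own parser. It assumes column 2 of the tab-separated file is the subject ID, that RefSeq protein accessions look like WP_007784050.1 (two capital letters, an underscore, digits, optional version), and it uses a hypothetical refseq_to_seed dict standing in for whatever RefSeq-to-SEED mapping you load. It separates hits with no RefSeq accession at all from hits whose accession is simply missing from the mapping.

```python
import re
from collections import Counter

# Assumed RefSeq accession pattern: e.g. WP_007784050.1 or ZP_10741618.1.
# group(1) is the accession without version, group(2) the optional ".N" suffix.
REFSEQ_RE = re.compile(r"\b([A-Z]{2}_\d+)(\.\d+)?")


def diagnose(lines, refseq_to_seed):
    """Tally why each BLAST hit would or would not get a SEED class.

    refseq_to_seed is a hypothetical dict {accession: seed_class}; its keys
    may be versioned ("WP_007784050.1") or unversioned ("WP_007784050").
    """
    tally = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # No tab found: often a file whose tabs were turned into spaces.
            tally["malformed line"] += 1
            continue
        match = REFSEQ_RE.search(fields[1])  # column 2 = subject_id
        if match is None:
            tally["no RefSeq accession in subject_id"] += 1
            continue
        versioned, unversioned = match.group(0), match.group(1)
        if versioned in refseq_to_seed or unversioned in refseq_to_seed:
            tally["mapped"] += 1
        else:
            tally["accession not in mapping"] += 1
    return tally
```

Running it on a few lines of your own output, with your mapping loaded into the dict, should show which failure mode you are in. Checking both the versioned and unversioned accession is a guess at a common mismatch, not a documented MEGAN behavior.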
Have you read over the MEGAN manual?
Other users have had issues connecting to the SEED databases; do these posts (here and here) help?
You can also use the SEED API to query outside of MEGAN.
Hello fhsantanna,
how did you solve the problem?
I did use your mapping file, but I still end up with unassigned SEED hits. Which files do you import? What are your LCA parameters?
The weird thing is that I got it to work yesterday but I can't reconstruct how I got it to work.
I only set Min Support to 1, since I am working with contigs. Check whether your BLAST file is in the correct format. Here is a sample of the format of my files:
Fields (separated by tab): query_id, subject_id, %_identity, alignment_length, mismatch, gap, query_start, query_end, subject_start, subject_end, evalue, bit score, subject_organism
01_contig00001_1_491_+ ref|ZP_10741618.1 29.94 157 0 0 9 157 4 158 1e-11 67.4 Brevibacillus sp. CF112
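One way to confirm a file really matches this 13-column layout is to parse each line into named fields and fail loudly when it does not. This is a minimal sketch assuming the exact column order listed above, with tabs as the only separator; the class name BlastHit and the function parse_hit are hypothetical names for illustration. A line copied from a web page, where tabs often become spaces, would be rejected.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of the 13-column tab-separated format described above."""
    query_id: str
    subject_id: str
    pct_identity: float
    alignment_length: int
    mismatch: int
    gap: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float
    subject_organism: str


def parse_hit(line: str) -> BlastHit:
    """Parse one tab-separated line; raise ValueError if the layout is off."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 13:
        raise ValueError(f"expected 13 tab-separated fields, got {len(fields)}")
    return BlastHit(
        fields[0],
        fields[1],
        float(fields[2]),
        *(int(x) for x in fields[3:10]),  # alignment_length .. subject_end
        float(fields[10]),
        float(fields[11]),
        fields[12],
    )
```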
My file is in the exact same format.
This might sound stupid, but could you also give me the other LCA parameters, such as Max Expected and Top Percent?
And what do you select on import? I selected GI-to-Taxon mapping and RefSeq-to-SEED mapping, using your uploaded file.
Ok, now I am a bit more confused. In your first post you said your subject_id is in this format:
gi|503216142|ref|WP_013450803.1|
But now your format only has the RefseqID, eg:
ref|ZP_10741618.1
Was this the problem? My output has the first format you posted (GI-ID|Ref-ID).
Both work. With the first format, I used the GI-based taxon mapping. With the second, that mapping is not necessary, since there is an additional column with the taxon name.
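Since both subject ID layouts work, a small helper that pulls out whichever identifiers are present can make it easier to check what your file actually contains. This is a sketch under the assumption that the field is NCBI-style pipe-delimited text where a tag ("gi", "ref") is followed by its value; split_subject_id is a hypothetical name, and other tags such as "gb" are simply ignored here.

```python
def split_subject_id(subject_id: str) -> dict:
    """Return {'gi': ..., 'ref': ...} from a pipe-delimited subject id.

    Handles both layouts seen in this thread:
      gi|503216142|ref|WP_013450803.1|
      ref|ZP_10741618.1
    Identifiers that are absent come back as None.
    """
    parts = [p for p in subject_id.split("|") if p]  # drop empty trailing field
    ids = {"gi": None, "ref": None}
    # Tags and values alternate: tag, value, tag, value, ...
    for tag, value in zip(parts[0::2], parts[1::2]):
        if tag in ids:
            ids[tag] = value
    return ids
```

With the first layout you get both a GI (usable for GI-to-taxon mapping) and a RefSeq accession; with the second only the RefSeq accession, which matches the explanation that the taxon name then comes from the extra column.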
LCA parameters: Min Score 50, Max Expected 0.01, Top Percent 10, Min Support Percent 0.1, Min Support 1, LCA Percent 100; all other options turned off.
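To illustrate what the first three of these parameters do to the hits of a single read, here is a simplified sketch. It assumes Min Score is a minimum bit score, Max Expected a maximum e-value, and Top Percent keeps only hits whose bit score lies within that percentage of the best remaining score; this follows the usual description of these settings but is not MEGAN's actual code. Min Support and Min Support Percent act later, on how many reads a class needs before it is reported, and are not modeled. The names Hit and filter_hits are hypothetical; the defaults are the values posted above.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    evalue: float
    bit_score: float


def filter_hits(hits: List[Hit], min_score: float = 50.0,
                max_expected: float = 0.01,
                top_percent: float = 10.0) -> List[Hit]:
    """Apply Min Score and Max Expected, then the Top Percent cutoff."""
    passing = [h for h in hits
               if h.bit_score >= min_score and h.evalue <= max_expected]
    if not passing:
        return []
    best = max(h.bit_score for h in passing)
    # Keep hits scoring within top_percent of the best hit for this read.
    threshold = best * (1 - top_percent / 100)
    return [h for h in passing if h.bit_score >= threshold]
```

With Top Percent 10 and a best hit of 711 bits, only hits scoring about 640 bits or more would stay in the set passed on to classification.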
Today I downloaded the latest version of MEGAN. For the SEED analysis, I only checked "use builtin refseq map".