How to identify the contig for each predicted CDS sequence?
7.0 years ago
mirza ▴ 180

Hi, I have an assembled genome (contigs) and its CDS and protein sequences as predicted by Augustus. For my wet lab work, I want to trace these CDS back to the genomic contigs; that is, I want to identify the contig ID for each predicted CDS. I tried BLAST+ and ran both blastn and discontiguous megablast, but they return hits for only a small number of the CDS. Can anyone suggest what else I can do to identify the contigs for my CDS sequences?

genome augustus output cds contigs • 2.7k views

Have you tried loading the GFF and the genome in IGV to visualize them? Also, if you make the prediction yourself (with Prodigal, for example), that information is included in the name of each CDS.
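If you still have the GFF/GTF file that Augustus wrote, BLAST may not be needed at all: in GFF-style files the first column of every feature line is the name of the sequence (here, the contig) the feature sits on. Below is a minimal sketch of reading that column for each CDS. It assumes the attribute column uses either the GTF style `transcript_id "g1.t1"` or the GFF3 style `Parent=g1.t1`; the function name `cds_to_contig` and the example IDs are made up for illustration, so check them against your own file.

```python
import re


def cds_to_contig(gff_lines):
    """Map each transcript ID to (contig, start, end, strand).

    gff_lines: any iterable of text lines, e.g. an open GFF/GTF file.
    Start/end span all CDS segments of the transcript (1-based, inclusive).
    """
    mapping = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue  # only CDS feature lines are of interest here
        contig, strand, attrs = cols[0], cols[6], cols[8]
        start, end = int(cols[3]), int(cols[4])
        # Assumed attribute styles: GTF (transcript_id "...") or GFF3 (Parent=...)
        match = (re.search(r'transcript_id "([^"]+)"', attrs)
                 or re.search(r"Parent=([^;]+)", attrs))
        if match is None:
            continue  # attribute style not recognised by this sketch
        transcript = match.group(1)
        if transcript in mapping:
            # Extend the span to cover every CDS segment of this transcript
            _, prev_start, prev_end, _ = mapping[transcript]
            start, end = min(prev_start, start), max(prev_end, end)
        mapping[transcript] = (contig, start, end, strand)
    return mapping


if __name__ == "__main__":
    example = [
        'contig_7\tAUGUSTUS\tCDS\t100\t300\t0.9\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
        'contig_7\tAUGUSTUS\tCDS\t400\t900\t0.9\t+\t1\ttranscript_id "g1.t1"; gene_id "g1";',
    ]
    for transcript, hit in cds_to_contig(example).items():
        print(transcript, *hit, sep="\t")
```

For a real run you would pass an open file instead of the example list, e.g. `cds_to_contig(open("augustus_output.gff"))` (hypothetical file name), once per genome.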

7.0 years ago
Joe 21k

If you take a multi-FASTA of contigs (like the output you'd get from SPAdes and other assemblers), you can annotate it however you like, so long as each contig is annotated (the resulting file will probably look like many GenBank records concatenated together). I would normally use Prokka for this (if you're doing microbial/prokaryotic work), but the idea still stands.

You can then open the multi-GenBank file in the Artemis genome browser and find the CDS you want by its annotation, and it will show you which contig it belongs to.

What you haven't told us, though, is how many genes you want to do this for. For wet lab work I'm guessing not that many, so you can probably get away with doing this 'by hand'. If you want a programmatic approach, you'd have to go about it another way (e.g. Biopython/BioPerl GenBank parsing).
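As a rough illustration of the Biopython route, here is a minimal sketch that walks a multi-record GenBank file and reports which record (contig) each CDS feature is on. It assumes Biopython is installed, that the contig name is in each record's LOCUS line (read as `record.name`), and that CDS features carry a `gene` or `locus_tag` qualifier; the function name `genes_to_contigs` is hypothetical.

```python
from Bio import SeqIO


def genes_to_contigs(genbank_source, wanted=None):
    """Return (label, contig, start, end, strand) for each CDS feature.

    genbank_source: a file name or open handle of a (multi-record) GenBank file.
    wanted: optional set of gene names / locus tags to keep; None keeps all.
    Coordinates are reported 1-based and inclusive; strand is 1 or -1.
    """
    rows = []
    for record in SeqIO.parse(genbank_source, "genbank"):
        for feature in record.features:
            if feature.type != "CDS":
                continue
            quals = feature.qualifiers
            # Prefer the gene name, fall back to the locus tag if present
            label = (quals.get("gene") or quals.get("locus_tag") or ["unnamed"])[0]
            if wanted is not None and label not in wanted:
                continue
            loc = feature.location
            # Biopython stores starts 0-based, so add 1 for GenBank-style output
            rows.append((label, record.name, int(loc.start) + 1, int(loc.end), loc.strand))
    return rows
```

Called as `genes_to_contigs("annotated_contigs.gbk", wanted={"cstA"})` (hypothetical file name), this would list the contig and coordinates for just the genes you care about.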


Hi, I actually need to do this for several genomes, all fungal (different strains or isolates of one fungus), so the number of genes will be a few hundred in total (about 100 genes per genome). I was therefore hoping to avoid any manual methods. We have annotated using Blast2GO (Prokka doesn't work for us). I didn't really understand what you are suggesting; can you please describe it in more detail and point me towards a related tutorial, paper or post?


The Artemis genome browser can be found here: http://www.sanger.ac.uk/science/tools/artemis If you have an annotated contigs file, I'm guessing it looks like several GenBank records, one after another (one for each contig). Hopefully this will make sense: in the image below, I've loaded an annotated contigs file into Artemis. If I then look for a gene by name, I can also find out which NODE (contig) the gene is on, as the contigs are also displayed as features. In this case, the cstA gene is found on contig 19. This is the manual approach I was alluding to. I'll have to think some more about how you could go about this programmatically.

[Image: Artemis screenshot of an annotated contigs file, showing the cstA gene on contig 19]


Thanks for explaining. If I understood correctly, I'll need to load the GFF file for the contigs into Artemis? Also, I think the same can be done in IGV (I have just downloaded it, and both tools are new to me)? I'd be really grateful if you could think of a way to do this programmatically.

7.0 years ago

I think BlastX would work better for this.


No, it won't; I need nucleotide sequences. But thanks.
