Comparing genome assemblies of the same animal
5.6 years ago
a.rex ▴ 350

I have a genome of a species, which I have annotated in house. The N50 for this genome is around 41,000.

I have another, better genome, of the same species, which I have also annotated in house. The N50 for this is 4,000,000.

I suspect my poorer genome has split and incomplete transcripts, which are resolved in my better genome.

Does anyone have any tips on how I can find these split instances that are resolved in the better genome?

Assembly • 1.4k views

I have two protein FASTA files for the same animal. One comes from an assembly with a small N50; the other, from a PacBio assembly, has a larger N50.

I want to blastp the proteins of the poorer genome against those of the better one, then extract the query length and hit length for each match. I will then run the reciprocal search: blastp of the better set against the poorer one.

When I blast the poorer genome against the PacBio one, the query should be much shorter than the hit wherever a gene was split or truncated. The opposite should hold when I blast the PacBio genome against the poorer one.

How can I extract the query and hit lengths from the BLAST results?
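One possible route, assuming BLAST+ is being used: the tabular output format accepts `qlen` and `slen` as column specifiers, so a run such as `blastp -query poor.faa -db better_db -outfmt "6 qseqid sseqid qlen slen length pident evalue bitscore" -out poor_vs_better.tsv` writes both lengths on every hit line. The file names, database name, and column order are placeholders chosen for illustration. A minimal Python sketch that keeps the highest-bitscore hit per query and reports the query/hit length ratio might look like this:

```python
import csv
import io

# Column order is an assumption: it must match the -outfmt string used for the run.
COLUMNS = ["qseqid", "sseqid", "qlen", "slen", "length", "pident", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row}, keeping the highest-bitscore hit per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        row["qlen"] = int(row["qlen"])
        row["slen"] = int(row["slen"])
        row["bitscore"] = float(row["bitscore"])
        prev = best.get(row["qseqid"])
        if prev is None or row["bitscore"] > prev["bitscore"]:
            best[row["qseqid"]] = row
    return best


def length_ratios(handle):
    """Return {query_id: qlen / slen} for each query's best hit."""
    return {q: r["qlen"] / r["slen"] for q, r in best_hits(handle).items()}


if __name__ == "__main__":
    # Made-up hit lines standing in for a real poor_vs_better.tsv file.
    example = io.StringIO(
        "p1\tb1\t120\t400\t118\t98.0\t1e-50\t230\n"
        "p1\tb2\t120\t130\t60\t50.0\t1e-5\t45.0\n"
        "p2\tb1\t150\t400\t149\t99.0\t1e-70\t300\n"
    )
    for query, ratio in length_ratios(example).items():
        print(query, round(ratio, 3))
```

Ratios well below 1 in the poor-vs-better direction would be the fragments worth a closer look; for a real run, pass `open("poor_vs_better.tsv")` instead of the in-memory example.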

5.6 years ago

Mick Watson came up with this simple test for bacterial genomes: http://www.opiniomics.org/a-simple-test-for-uncorrected-insertions-and-deletions-indels-in-bacterial-genomes/

With an N50 of 4,000,000 I don't expect you to have a bacterial genome :) However, I don't see why this simple test wouldn't work if you blastp your 'poor' genome's proteins against a database of your 'better' genome's proteins.
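To go from length ratios to actual split candidates, one heuristic (my own sketch, not something taken from the linked post) is to look for proteins in the better genome that are the best hit of two or more poor-genome proteins aligning to different parts of it. This assumes the blastp run also reported `sstart` and `send`, which are valid BLAST+ tabular specifiers; the function name, tuple layout, and overlap threshold below are hypothetical:

```python
from collections import defaultdict


def candidate_splits(rows, max_overlap=10):
    """Flag better-genome proteins covered by several poor-genome fragments.

    rows: iterable of (qseqid, sseqid, sstart, send), one best hit per query,
          from the poor -> better blastp run.
    Returns {sseqid: [qseqid, ...]} for subjects hit by at least two queries
    whose aligned subject intervals overlap by at most max_overlap residues.
    """
    by_subject = defaultdict(list)
    for q, s, start, end in rows:
        lo, hi = sorted((int(start), int(end)))
        by_subject[s].append((lo, hi, q))

    result = {}
    for s, hits in by_subject.items():
        hits.sort()  # order fragments by where they start on the subject
        chain = [hits[0]]
        for lo, hi, q in hits[1:]:
            # Greedy chaining: accept a fragment only if it starts (almost)
            # after the previously accepted fragment ends.
            if lo > chain[-1][1] - max_overlap:
                chain.append((lo, hi, q))
        if len(chain) >= 2:
            result[s] = [q for _, _, q in chain]
    return result


if __name__ == "__main__":
    # Made-up best hits: fragments a and b tile protein B1, d overlaps a.
    rows = [
        ("a", "B1", 1, 150),
        ("b", "B1", 160, 400),
        ("c", "B2", 1, 300),
        ("d", "B1", 100, 200),
    ]
    print(candidate_splits(rows))
```

The greedy chaining is deliberately simple and can miss some tilings, so treat the output as a shortlist to inspect by hand rather than a definitive call.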


Yes, I suppose I can adjust the Snakefile?
