Question

finishing a genome from assembly of contigs

0

7.5 years ago

silvia.caprari84 ▴ 60

Hi all, I am new to analysing NGS results. I sent some clinical bacterial isolates (which are likely to carry plasmids) to be sequenced with Illumina. I should say up front that I am completely new to the terminology, the methodology and everything else. I got files named "reads" and files named "contigs" from the company. If I understood correctly, the contig files are the assembled reads, right? And I shouldn't need to assemble them myself if they have already been assembled by Illumina, right? Please correct me if I am wrong. What if I wanted the "finished version" of a genome (I mean the chromosome and the plasmids as separate sequences, ready to be deposited)? Should I assemble all the contigs together? And how would I do that?

Also, can several contigs contain the same sequence? Could that be a result of the overlapping approach used during sequencing?

I also noticed that if I run BLAST with the sequence of a known protein as the query against a file containing all the contigs, the query matches several contigs. Most often the matching region differs by a few nucleotides from contig to contig, which gives a different percent identity to the known sequence. Why does this happen? If the same sequence is present in several contigs, shouldn't it be identical in all of them? Is this due to the sequencing methodology?

Sorry for all the questions. I am completely new to the terminology, methodology, etc., and I have no one to ask at the moment.

Thank you so much again.

Silvia

next-gen sequencing genome blast Assembly • 2.7k views

ADD COMMENT • link updated 7.5 years ago by sidrairshad29 • 0 • written 7.5 years ago by silvia.caprari84 ▴ 60

1

If you want help here, I think it is better to ask one question at a time rather than so many. Anyway, you can take a look at these: "Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data" and "Completing bacterial genome assemblies: strategy and performance comparisons".

ADD REPLY • link 7.5 years ago by dago ★ 2.8k

score 0 · Answer 1 · 2016-10-12

0

7.5 years ago

silvia.caprari84 ▴ 60

Yes, really sorry about that. Then can I just ask your opinion about this: if I run BLAST with the sequence of a known protein as the query against a file containing all the contigs, the query matches several contigs, and most often the matching region differs by a few nucleotides from contig to contig, which gives a different percent identity to the known sequence. Why does this happen? I would expect no nucleotide differences. Thanks

ADD COMMENT • link 7.5 years ago by silvia.caprari84 ▴ 60

1

Please use ADD REPLY to respond to earlier comments, so that this thread remains logically structured and easy to follow.

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

score 0 · Answer 2 · 2016-10-12

It can be that the gene you are using as a query is similar to multiple genes in the genome, either because the gene is repeated or because there are multiple versions of it. On the other hand, you should check how clean your genomes are. This is usually done by looking at marker genes that are expected to occur once in the specific phylogenetic group your bacteria belong to. I personally use CheckM. The point is that you could simply have contamination, meaning sequences other than your genome of interest.
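One way to tell these cases apart is to look at the hits contig by contig. Below is a minimal Python sketch, assuming the BLAST search was run with the standard 12-column tabular output (`-outfmt 6`). The function names, the `min_length` cutoff and the example rows are made up for illustration; they are not from the original search.

```python
from collections import defaultdict

# Standard column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines):
    """Turn tabular BLAST lines into dicts, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        row["pident"] = float(row["pident"])
        for key in ("length", "sstart", "send"):
            row[key] = int(row[key])
        hits.append(row)
    return hits


def hits_per_contig(hits, min_length=0):
    """Group hits by contig, keeping alignments of at least min_length columns."""
    grouped = defaultdict(list)
    for hit in hits:
        if hit["length"] >= min_length:
            grouped[hit["sseqid"]].append(hit)
    return dict(grouped)


# Hypothetical rows, for illustration only (not real results).
example = [
    "myGene\tcontig_12\t100.000\t861\t0\t0\t1\t861\t5021\t5881\t0.0\t1591",
    "myGene\tcontig_87\t98.490\t861\t13\t0\t1\t861\t310\t1170\t0.0\t1519",
    "myGene\tcontig_140\t91.200\t125\t11\t0\t1\t125\t1\t125\t2e-40\t171",
]

summary = hits_per_contig(parse_hits(example), min_length=200)
for contig, contig_hits in summary.items():
    for hit in contig_hits:
        print(contig, hit["pident"], hit["length"], hit["sstart"], hit["send"])
```

Full-length hits with slightly different identity on different contigs would be consistent with several copies or variants of the gene (or with contamination), while short partial hits are often just a shared domain or a contig edge. This only organises the evidence; it does not replace a contamination check such as CheckM.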

score 0 · Answer 3 · 2016-10-12

0

7.5 years ago

silvia.caprari84 ▴ 60

Thank you very much dago

ADD COMMENT • link 7.5 years ago by silvia.caprari84 ▴ 60

0

Please use ADD REPLY to respond to earlier comments, so that this thread remains logically structured and easy to follow.

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

score 0 · Answer 4 · 2016-10-15

0

7.5 years ago

sidrairshad29 • 0

Dear all, I have queries somewhat similar to Silvia's. I sequenced my bacterial strain on an Illumina HiSeq and was sent a file of reads. I then generated contigs using Velvet. Now I have 149 unordered contigs. Could you please guide me on how I could get a complete genome out of them? Also, my draft genome is annotated. Is a complete genome needed for phylogenetic and comparative genome analysis?

ADD COMMENT • link 7.5 years ago by sidrairshad29 • 0

0

You should open a new post rather than adding questions to old ones. You generally cannot get a complete genome from short-read WGS data alone. You have contigs, which you may or may not be able to order using a reference genome. In any case, getting a complete genome requires some PCR work to close the gaps, but I would say that in most cases this is not necessary for comparative genomics studies.
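To give an idea of what ordering against a reference means in practice, here is a rough Python sketch. It assumes you have already aligned the contigs to a reference genome and extracted, for each hit, the contig name, the start and end on the reference, and a score. The tuple layout, the function names and the example numbers are hypothetical, and taking only the best hit per contig is a simplification that dedicated tools handle far more carefully.

```python
def order_contigs(hits):
    """Order contigs by where their best-scoring hit lands on the reference.

    hits: iterable of (contig_id, ref_start, ref_end, bitscore) tuples.
    Returns (contig_id, orientation) pairs in reference order; orientation is
    "+" when ref_start <= ref_end and "-" when the contig aligns reversed.
    """
    best = {}
    for contig, ref_start, ref_end, bitscore in hits:
        # Keep only the highest-scoring hit for each contig.
        if contig not in best or bitscore > best[contig][2]:
            best[contig] = (ref_start, ref_end, bitscore)
    # Sort by the leftmost reference coordinate of the best hit.
    ordered = sorted(best.items(), key=lambda item: min(item[1][0], item[1][1]))
    return [(contig, "+" if start <= end else "-")
            for contig, (start, end, _score) in ordered]


def unplaced(all_contigs, hits):
    """Contigs with no hit at all, e.g. plasmids or strain-specific regions."""
    placed = {hit[0] for hit in hits}
    return [contig for contig in all_contigs if contig not in placed]


# Hypothetical hits, for illustration only.
hits = [
    ("contig_3", 250000, 310000, 9000.0),
    ("contig_1", 120000, 40000, 15000.0),   # aligns in reverse orientation
    ("contig_3", 900000, 901000, 300.0),    # weaker secondary hit, ignored
    ("contig_2", 500000, 650000, 20000.0),
]

print(order_contigs(hits))
print(unplaced(["contig_1", "contig_2", "contig_3", "contig_4"], hits))
```

The result is only a proposed order and orientation relative to that particular reference. Rearrangements or plasmids in your strain will not be placed correctly, and the gaps between contigs remain open.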

ADD REPLY • link 7.5 years ago by dago ★ 2.8k

0

So that means I can use my contigs as they are for phylogenetic analysis and comparative genome analysis?

ADD REPLY • link 7.5 years ago by sidrairshad29 • 0