How can an assembled genome have Ns in it?
7.3 years ago
predeus ★ 1.9k

Hello all,

maybe it's a silly question, but I have very little experience with genome assembly, so I was hoping somebody could help me out. A colleague of mine has pointed out a certain number of N nucleotides in contiguous parts of some genome assemblies (as in the middle of a chromosome). They are not present in human or mouse assemblies, but are seen quite often in other genomes.

What confuses me is that these are not the hard-masked versions of the genomes, or at least so I was told. They are the unmasked versions.

Could you have a certain number of Ns in an assembled scaffold? How could you know the number of Ns for sure if you never got the sequence?

Thank you for any input

ngs wgs genome assembly • 2.9k views

The human genome has quite a few regions of "N" nucleotides, mainly in repetitive content such as telomeres and centromeres.


Yeah, I know that - I know that they are usually masked. That I can easily understand.

What I don't understand is how you can have undefined sequences of known length in the assembly.

If you have stretches that are very repetitive and cannot be assembled, you have a hole in the assembly there, don't you? You can't just put the two scaffolds together and put some N's in between, since you won't know the length of the linker.

That's what I don't get.


If you have paired-end or mate-pair reads that align on either side of the gap, and you know the expected distance between the pairs, you can use that to infer the gap size and insert the corresponding number of Ns. Otherwise the gap size is unknown; see https://www.ncbi.nlm.nih.gov/genbank/wgs_gapped/
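As a rough illustration of that idea, here is a minimal Python sketch, not what any particular scaffolder actually does. The function names `estimate_gap` and `join_with_gap`, and the input format (for each spanning pair, the number of bases the fragment covers on the left contig and on the right contig), are assumptions made up for this example. Each pair gives one estimate, the expected insert size minus the bases accounted for on both contigs, and the median is taken to dampen outliers.

```python
from statistics import median


def estimate_gap(pairs, insert_size):
    """Estimate the gap between two contigs from spanning read pairs.

    pairs: list of (left_bases, right_bases) tuples (hypothetical format),
           where left_bases is the distance from the start of the first
           read to the end of the left contig, and right_bases is the
           distance from the start of the right contig to the end of
           the second read.
    insert_size: expected outer distance between the two reads.
    """
    estimates = [insert_size - left - right for left, right in pairs]
    # A negative estimate would mean the contigs overlap; clamp to zero.
    return max(0, round(median(estimates)))


def join_with_gap(left_contig, right_contig, gap):
    """Join two contigs into a scaffold with `gap` Ns between them."""
    return left_contig + "N" * gap + right_contig


# Three mate pairs from a library with a 3000 bp expected insert size.
pairs = [(1200, 1500), (900, 1850), (1400, 1350)]
gap = estimate_gap(pairs, insert_size=3000)
scaffold = join_with_gap("ACGT", "TTGA", 5)
```

Real libraries have a spread of insert sizes, so the number of Ns is an estimate of the gap length rather than an exact measurement.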


Thank you for the link, that answered it for me!


It depends on the stage of your assembly. At the contig stage, there should be no Ns as far as I am aware. At the scaffold/pseudomolecule stage, you will have blocks of Ns between contigs (scaffold stage) or between scaffolds (pseudomolecule stage), when you know the order of the contigs/scaffolds (e.g. from long mate-pair libraries) but were not able to actually overlap them.
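To see this in an assembly you already have, one option is to list the N blocks in each scaffold and split the scaffold back into its contigs. The following is a small sketch using only Python's standard `re` module; the helper names `n_runs` and `split_contigs` are invented for this example, and it assumes the sequence is already loaded as a plain string.

```python
import re


def n_runs(seq):
    """Return (start, length) for every run of N/n in the sequence (0-based)."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", seq)]


def split_contigs(seq):
    """Split a scaffold into its contigs by cutting at every run of Ns."""
    return [piece for piece in re.split(r"[Nn]+", seq) if piece]


# Toy scaffold: three contigs separated by a 10 bp and a 100 bp gap.
scaffold = "ACGT" + "N" * 10 + "GGCC" + "N" * 100 + "TTAA"
gaps = n_runs(scaffold)
contigs = split_contigs(scaffold)
```

Many blocks of exactly the same length in a real assembly usually point to a placeholder for gaps of unknown size rather than to measured gap lengths.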
