Question

Which should I use, the single-read or the multi-read mappability bigWig file, to create a mouse mappability file?



6.2 years ago

bioinforesearchquestions ▴ 370

Hi,

I am using HMMcopy to detect copy number alterations in my samples.

I am trying to create an mm9 or mm10 mappability wiggle file from a mouse bigWig file.

I found the mappability files below in the article "Mappability of the mouse and human genomes and methylomes with Umap and Bismap".

I am also generating the bigWig file myself using HMMcopy (generateMap.pl), but in the meantime I want to use the available mappability files.

What factors should be considered when creating a mappability wig file?

Which should I use: the single-read or the multi-read mappability bigWig file?

Which k-mer length should I use: 24, 36, 50, or 100?

Umap: Unique mappability of the genome

Description: These tracks indicate regions with uniquely mappable reads of particular lengths.

Umap single-read mappability

Umap S24: Uniquely mappable regions with a read length of 24 nucleotides

Umap S36: Uniquely mappable regions with a read length of 36 nucleotides

Umap S50: Uniquely mappable regions with a read length of 50 nucleotides

Umap S100: Uniquely mappable regions with a read length of 100 nucleotides

Umap multi-read mappability

Umap M24: Multi-read mappability with a read length of 24 nucleotides

Umap M36: Multi-read mappability with a read length of 36 nucleotides

Umap M50: Multi-read mappability with a read length of 50 nucleotides

Umap M100: Multi-read mappability with a read length of 100 nucleotides

You can use these tracks for many purposes including filtering unreliable signal from next generation sequencing assays.

Umap single-read mappability track marks any region of the genome that is uniquely mappable by at least 1 k-mer. To calculate the single-read mappability, you must find the overlap of a given region with this track. Umap multi-read mappability track represents the probability that a randomly selected k-mer which overlaps with a given position is uniquely mappable.
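The "find the overlap of a given region with this track" step can be sketched in Python as below. This is only an illustration under assumptions: the intervals are half-open (start, end) pairs for one chromosome that have already been read out of an S-track bigBed, and `single_read_mappability` is a hypothetical helper, not part of Umap or HMMcopy.

```python
def single_read_mappability(region, unique_intervals):
    """Fraction of bases in region [start, end) covered by uniquely
    mappable intervals (half-open, BED-style coordinates)."""
    start, end = region
    # Merge overlapping intervals so no base is counted twice.
    merged = []
    for s, e in sorted(unique_intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    # Sum the part of each merged interval that falls inside the region.
    covered = sum(max(0, min(e, end) - max(s, start)) for s, e in merged)
    return covered / (end - start)


# Hypothetical S100 intervals on one chromosome
intervals = [(100, 250), (200, 400), (900, 1000)]
print(single_read_mappability((0, 1000), intervals))  # 0.4
```

Merging first matters because Umap intervals are not guaranteed to be disjoint in whatever subset you load; without it, overlapping entries would inflate the covered fraction.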

For greater detail and explanatory diagrams, see the preprint, the Umap and Bismap project website, or the Umap and Bismap software documentation.

Track format:

Single-read mappability tracks: bigBed 6-column format

Multi-read mappability tracks: bigWig
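Since HMMcopy works on binned data, one way to turn a multi-read bigWig into a mappability wig is to average the per-base scores into fixed-size bins. The sketch below assumes HMMcopy expects a fixedStep wig with one mean-mappability value per bin (as its own mapCounter produces) and that the bin size matches the window you used with readCounter (e.g. 1000 bp). The per-base list could come from something like pyBigWig's `bw.values(chrom, start, end)`, which returns NaN where the bigWig has no data; here a plain list stands in for it, and `to_fixedstep_wig` is a hypothetical name.

```python
import math


def to_fixedstep_wig(chrom, per_base, bin_size):
    """Average per-base mappability into bins and render fixedStep wig text.

    per_base: per-base scores for one chromosome, 0-based, NaN where the
    bigWig has no data. NaN is treated as 0.0 (assumption: positions absent
    from the Umap bigWig have no uniquely mappable k-mers).
    """
    lines = [f"fixedStep chrom={chrom} start=1 step={bin_size} span={bin_size}"]
    for i in range(0, len(per_base), bin_size):
        chunk = [0.0 if math.isnan(v) else v for v in per_base[i:i + bin_size]]
        # The final bin may be shorter than bin_size; average over what exists.
        lines.append(f"{sum(chunk) / len(chunk):.6f}")
    return "\n".join(lines)


scores = [1.0] * 4 + [0.5] * 4 + [float("nan")] * 2
print(to_fixedstep_wig("chr1", scores, 4))
```

Before relying on this, compare the header and bin boundaries against a wig produced by HMMcopy's own tools for the same genome build.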

Mappability mouse mm10 mm9 HMMcopy

Updated 5.7 years ago by mehran.karimzade ▴ 220

score 4 · Answer 1 · 2018-08-24

I hope you found the answer to your questions by now, but since this post remains unanswered:

Mappability of the genome depends on how you sequenced it: whether the reads are paired or unpaired, and what read length you used.

For example, if you used unpaired 100 bp reads to sequence your genome, you can use the mappability files for 100-mers. If you used paired-end 100 bp reads, however, the problem is more complicated. Because the sequenced fragments range from 100 bp to approximately 300 bp, you'd need to consider 100-mer mappability (worst-case scenario) and 300-mer mappability (best-case scenario) for different regions.
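The rule of thumb above can be made concrete with a small Python sketch. Assumptions: the available k values are the four Umap lengths listed in the question, and when the read length falls between them the helper picks the longest precomputed k that does not exceed the read length. That is a conservative choice of mine (it can only underestimate mappability), not something Umap or HMMcopy prescribes. Both function names are hypothetical.

```python
UMAP_KMERS = (24, 36, 50, 100)  # k-mer lengths of the Umap tracks listed above


def choose_umap_kmer(read_length, available=UMAP_KMERS):
    """Longest precomputed k that does not exceed the read length (conservative)."""
    candidates = [k for k in available if k <= read_length]
    if not candidates:
        raise ValueError(f"no precomputed k-mer is <= {read_length} bp")
    return max(candidates)


def kmer_bounds(read_length, fragment_length=None):
    """(worst_case_k, best_case_k) following the reasoning above.

    Unpaired reads: both bounds are the read length.
    Paired-end: worst case is one mate on its own (read length);
    best case is the whole fragment (fragment length).
    """
    best = read_length if fragment_length is None else fragment_length
    return read_length, best


print(choose_umap_kmer(100))   # unpaired 100 bp reads -> 100
print(choose_umap_kmer(75))    # between tracks, falls back to 50
print(kmer_bounds(100, 300))   # paired-end 100 bp reads, ~300 bp fragments
```

Note that the best-case bound for paired-end data (here 300) is longer than any of the precomputed tracks listed above, so that end of the range would need a mappability track you generate yourself.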

Now on to the difference between single-read and multi-read mappability files. The single-read mappability track for 100-mers is a BED (bigBed) file that annotates every region of the genome that is uniquely mappable by at least one 100-mer.

The problem is that some regions might be uniquely mappable by all of the 100-mers that overlap them, while others are uniquely mappable by just a few. To distinguish these, you can use the multi-read mappability files, which quantify, for each nucleotide, the fraction of overlapping 100-mers that are uniquely mappable.
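To see how the two track types relate, the sketch below derives both from the same toy input: a hypothetical boolean list `unique_start`, where entry i says whether the k-mer starting at position i is unique in the genome. Producing that list for a real genome is the expensive part Umap performs and is not shown here. Single-read mappability at a position means "at least one overlapping unique k-mer"; multi-read mappability is the fraction of overlapping k-mers that are unique. Dividing by the number of k-mers that actually overlap a position (fewer near chromosome ends) is an assumption; Umap's own edge handling may differ.

```python
def umap_like_tracks(unique_start, k):
    """Return (single_read, multi_read) per-position lists.

    unique_start[i] is True if the k-mer starting at position i is unique.
    The genome length is len(unique_start) + k - 1.
    """
    if not unique_start:
        raise ValueError("need at least one k-mer")
    genome_length = len(unique_start) + k - 1
    last_start = len(unique_start) - 1
    single, multi = [], []
    for p in range(genome_length):
        # k-mers overlapping position p start in [p - k + 1, p], clipped to valid starts
        lo = max(0, p - k + 1)
        hi = min(p, last_start)
        overlapping = unique_start[lo:hi + 1]
        n_unique = sum(overlapping)
        single.append(n_unique > 0)
        multi.append(n_unique / len(overlapping))
    return single, multi


single, multi = umap_like_tracks([True, False, False, False], k=3)
print(single)                         # [True, True, True, False, False, False]
print([round(m, 2) for m in multi])   # [1.0, 0.5, 0.33, 0.0, 0.0, 0.0]
```

Positions 0 to 2 all count as mappable in the single-read sense, yet the multi-read values show that position 0 is covered far more reliably than position 2, which is the distinction described above.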

This preprint describes everything in more detail: https://www.biorxiv.org/content/early/2017/06/04/095463

You can also post your questions to Umap's own mailing list: https://groups.google.com/forum/#!forum/ubismap