Appropriate k-mer size for mappability mask
dthorbur, 5.0 years ago

Running MSMC requires a mappability mask. For non-model organisms, however, you will likely have to generate the mask yourself. I am working on three-spined sticklebacks, and the sequencing for each individual comprises three different libraries: 100 bp reads with 140 bp and 300 bp insert sizes, and 50 bp reads with a 3 kb insert size.

SNPable was designed with single-end reads in mind, so deciding which k-mer size to use is difficult. One guide I read used 250-mers for a single paired-end library, but it did not state the read length or the insert size.

My question is simple: what do I need to consider when choosing the k-mer size? The mate pair library makes this particularly difficult, or so I have been led to believe. Any help would be greatly appreciated.
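
For context, SNPable's first step (`splitfa` in its documentation) chops the reference into every overlapping k-mer and treats each one as an error-free single-end read, which is then mapped back to the reference. Below is a minimal Python sketch of that idea, not SNPable's own code: the function names and the FASTA header scheme are hypothetical. Note that there is no mate or insert-size information anywhere in it, so k simply plays the role of a single-end read length.

```python
from typing import Iterator, Tuple


def split_into_kmers(seq: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, k-mer) for every overlapping k-mer in seq.

    Each k-mer behaves like an error-free single-end read of length k
    tiled at every position of the reference.
    """
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        yield start, seq[start:start + k]


def kmers_as_fasta(name: str, seq: str, k: int) -> str:
    """Format the k-mers as FASTA records named <name>_<1-based start>.

    The header scheme is a hypothetical choice for this sketch, so that
    each simulated read can be traced back to where it came from.
    """
    lines = []
    for start, kmer in split_into_kmers(seq, k):
        lines.append(f">{name}_{start + 1}")
        lines.append(kmer)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy sequence; a real run would stream a whole reference FASTA.
    print(kmers_as_fasta("chrToy", "ACGTACGTTA", 4))
```

In this toy sequence the 4-mer ACGT starts at positions 1 and 5, so neither of those two simulated reads would map uniquely when aligned back.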

Tags: SNPable, Mappability

Hi! Did you find a solution to this problem, and would you mind sharing it?

Hey, first I tested a few different k-mer sizes to see the effect of changing k. But after looking through other papers with data I thought was similar to mine, I settled on k=100. You can check the preprint here.

Thanks a lot for sharing! Was the effect of different k-mer sizes huge?

Hey, I cannot remember, and all those files are compressed on a backup server. If you have access to an HPC, it would be quite simple to test a series of sizes and look for yourself. The result will differ between species, depending on factors such as repeat content. Apologies that I cannot be more helpful, but it was 3 years ago.
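
To get a feel for how k interacts with repeat content before spending HPC time, a toy experiment can help. The sketch below is a simplification, not SNPable: it uses exact matching on both strands instead of BWA alignment, and it calls a base mappable when more than half of the k-mers overlapping it occur exactly once. That majority rule is loosely modelled on SNPable's stringency setting, but the 0.5 threshold, the function names and the toy genome with a hypothetical 300 bp repeat are all assumptions for illustration.

```python
import random
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_kmer_starts(seq: str, k: int) -> list:
    """For each k-mer start, True if that exact k-mer occurs only once
    across both strands (no mismatches considered, unlike a real aligner)."""
    n = len(seq) - k + 1
    rc = revcomp(seq)
    counts = Counter(seq[i:i + k] for i in range(n))
    counts.update(rc[i:i + k] for i in range(n))
    return [counts[seq[i:i + k]] == 1 for i in range(n)]


def mappable_fraction(seq: str, k: int, stringency: float = 0.5) -> float:
    """Fraction of bases where more than `stringency` of the k-mers
    overlapping that base are unique (a simplified majority rule)."""
    uniq = unique_kmer_starts(seq, k)
    n_starts = len(uniq)
    # Prefix sums give the number of unique k-mers in any window in O(1).
    prefix = [0]
    for u in uniq:
        prefix.append(prefix[-1] + int(u))
    mappable = 0
    for pos in range(len(seq)):
        first = max(0, pos - k + 1)    # leftmost k-mer covering pos
        last = min(pos, n_starts - 1)  # rightmost k-mer covering pos
        n_cover = last - first + 1
        if n_cover > 0 and (prefix[last + 1] - prefix[first]) / n_cover > stringency:
            mappable += 1
    return mappable / len(seq) if seq else 0.0


def toy_genome(rng: random.Random, chunk: int, repeat: str, copies: int) -> str:
    """Random background with `copies` identical copies of `repeat`
    placed between random chunks of length `chunk` (hypothetical data)."""
    def background() -> str:
        return "".join(rng.choice("ACGT") for _ in range(chunk))

    parts = [background()]
    for _ in range(copies):
        parts.append(repeat)
        parts.append(background())
    return "".join(parts)


if __name__ == "__main__":
    rng = random.Random(42)
    repeat = "".join(rng.choice("ACGT") for _ in range(300))  # hypothetical 300 bp repeat
    genome = toy_genome(rng, chunk=2000, repeat=repeat, copies=5)
    for k in (35, 50, 100, 250, 400):
        print(f"k={k:>3}  mappable fraction={mappable_fraction(genome, k):.3f}")
```

In this setup, any k shorter than the repeat leaves the repeat's interior masked, while k=400 exceeds the repeat length, so every 400-mer includes some unique flanking sequence. Real genomes contain repeats of many lengths and divergences, so the curve has to be measured for each species, as suggested above.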

Hi, don't worry! Thanks a lot for your response, it helped me for sure.
