Question

Increase in SNP frequency towards the 3' end of reads


8.6 years ago

c.gubili • 0

Hi all,

I am new to Stacks and would like to ask a general question. I have been trying to assess an increase in SNP frequency towards the 3' end of my reads and would like to hear your suggestions.

First, reads were demultiplexed by their individual barcodes (8 bp) and quality filtered using process_radtags. I then ran preliminary analyses (denovo, default parameters) with no further trimming or filtering and found an increase in SNP frequency towards the 3' end of the reads. I read that this increase probably reflects spurious polymorphisms caused by sequencing errors, which become more frequent towards the 3' end. I decided to trim 9 bp from the 3' end, which left reads of 83 bp.

I reran denovo, checked the number of SNPs and produced a second graph for the new 83 bp dataset. However, instead of an even number of SNPs across positions, there is still an increase in SNPs at the 3' end (about 400 SNPs in the last 5 bp). I then trimmed 5 more bp, down to 78 bp, but I see the same trend. Any suggestions?

Thank you in advance,
Chrysa

next-gen SNP • 1.5k views

updated 19 months ago by Ram 43k • written 8.6 years ago by c.gubili • 0


What was the initial quality threshold?

6.7 years ago by deepti1rao ▴ 50

Ram · Answer 1 · 2015-09-28


8.6 years ago

Brian Bushnell 20k

Variants near the ends of reads are not very reliable, even if the read is error-free. Indels will come out looking like SNPs, for example. It's best to give all variants within X bp of a read end (where X is ~10) a lower confidence, or just ignore them.
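As a rough illustration of that rule, here is a minimal Python sketch that separates SNPs by their distance from the nearest read end. It is not part of Stacks: the function name `split_by_end_distance`, its parameters, and the assumption that SNP positions are available as 0-based columns within a read of known length are all hypothetical, and the default margin of 10 simply reflects the "X is ~10" suggestion above.

```python
def split_by_end_distance(snp_columns, read_length, margin=10):
    """Split SNP columns into (interior, near_end) lists.

    snp_columns: 0-based positions of SNPs within the read (assumed input).
    read_length: length of the (trimmed) read in bp, e.g. 83.
    margin: SNPs closer than this many bp to either read end are
            treated as low confidence (hypothetical default of 10).
    """
    interior, near_end = [], []
    for col in snp_columns:
        # Distance to the nearest end for columns 0..read_length-1.
        distance = min(col, read_length - 1 - col)
        if distance < margin:
            near_end.append(col)
        else:
            interior.append(col)
    return interior, near_end


# Example with made-up SNP columns on an 83 bp read.
interior, near_end = split_by_end_distance([0, 9, 10, 40, 72, 73, 82], 83)
```

The `near_end` list can then be dropped or simply down-weighted, depending on how conservative the downstream analysis needs to be.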

updated 19 months ago by Ram 43k • written 8.6 years ago by Brian Bushnell 20k