Hi all,
I am new to Stacks and would like to ask a general question. I have been trying to assess the increase in SNP frequency towards the 3' end of my reads, and I would like to hear your suggestions.
First, reads were demultiplexed by their individual 8 bp barcodes and quality-filtered with process_radtags. I then ran preliminary analyses (denovo, default parameters) without any further trimming or filtering, and found that SNP frequency increases towards the 3' end of the reads. I have read that this pattern probably reflects spurious polymorphisms caused by the rising rate of sequencing errors along the read, so I trimmed 9 bp from the 3' end, leaving 83 bp reads.
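For anyone reproducing the trimming step outside Stacks, here is a minimal Python sketch that removes a fixed number of bases from the 3' end of every read. It assumes plain, uncompressed FASTQ with exactly four lines per record; the function name `trim_fastq` is hypothetical. process_radtags also has a truncation option (`-t`) that cuts reads to a fixed final length, which may be simpler in practice.

```python
import io
from typing import Iterator, TextIO


def trim_fastq(handle: TextIO, n_3prime: int) -> Iterator[str]:
    """Yield FASTQ records with n_3prime bases removed from the 3' end.

    Hypothetical helper; assumes 4-line records (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        keep = max(len(seq) - n_3prime, 0)
        # Trim sequence and quality together so they stay the same length.
        yield f"{header.rstrip()}\n{seq[:keep]}\n+\n{qual[:keep]}\n"


# Usage: trim 3 bp from a single 10 bp read.
example = io.StringIO("@r1\nACGTACGTAC\n+\nIIIIIIIIII\n")
print("".join(trim_fastq(example, 3)), end="")
```

Applied to a real file, the same generator can be fed an open file handle and its output written to a new FASTQ before rerunning denovo.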
I reran denovo, counted the SNPs again and produced a second plot for the new 83 bp dataset. However, instead of the SNPs being evenly distributed across positions, there is still an increase at the very 3' end (about 400 SNPs in the last 5 bp). I trimmed another 5 bp, down to 78 bp, but I get the same trend. Any suggestions?
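To check how strong the 3' excess is after each round of trimming, it can help to count SNPs per read position and compare the last few positions with what a uniform distribution would predict. The sketch below assumes the 0-based position of each SNP within its locus has already been extracted from the Stacks output (how to do that depends on the Stacks version and output files, so it is not shown). The helper names and the example positions are hypothetical, not real data.

```python
from collections import Counter
from typing import Iterable, List


def snp_counts_by_position(positions: Iterable[int], read_len: int) -> List[int]:
    """Count SNPs at each 0-based position of a read of length read_len.

    Hypothetical helper; `positions` is one entry per SNP.
    """
    counts = Counter(positions)
    bad = [p for p in counts if not 0 <= p < read_len]
    if bad:
        raise ValueError(f"positions outside the read: {sorted(bad)}")
    return [counts.get(i, 0) for i in range(read_len)]


def tail_excess(counts: List[int], tail: int) -> float:
    """Observed SNPs in the last `tail` positions divided by the number
    expected if SNPs were spread uniformly along the read."""
    total = sum(counts)
    if total == 0 or not 0 < tail <= len(counts):
        raise ValueError("need at least one SNP and 0 < tail <= read length")
    observed = sum(counts[-tail:])
    expected = total * tail / len(counts)
    return observed / expected


# Usage with made-up positions on a 78 bp read.
counts = snp_counts_by_position([0, 10, 20, 74, 75, 76, 77, 77], 78)
print(tail_excess(counts, 5))  # 9.75
```

A ratio near 1 means the last positions carry roughly their share of SNPs; values well above 1 flag the kind of 3' excess described above, and tracking the ratio across trimming rounds shows whether trimming actually changes it.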
Thank you in advance,
Chrysa
What was the initial quality threshold?