Can anyone give advice or opinions on my data quality? I have 64 samples from a single plant species with a genome size of 270 Mb, paired-end RAD-sequenced with TaqI. The files I've been given are demultiplexed and range in size from 3.6 Gb down to 0.06 Gb of 150 bp reads. Most worryingly to my eyes, 60% of the files are at least an order of magnitude smaller than the largest file. TotalReads (M) ranges from 9.69 to 0.01.
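To make the "order of magnitude smaller" comparison concrete, here is a minimal sketch of how such a skew could be tallied from a table of per-sample read counts. The function name, the sample names and all read counts except the two extremes quoted above are hypothetical placeholders, not my actual data.

```python
def flag_low_samples(total_reads_m, factor=10.0):
    """Return names of samples whose read count (in millions) is at least
    `factor` times smaller than the largest sample's read count."""
    largest = max(total_reads_m.values())
    return sorted(
        name for name, reads in total_reads_m.items()
        if reads * factor <= largest
    )


# Hypothetical per-sample totals in millions of reads; only 9.69 and 0.01
# come from the post, the rest are invented for illustration.
reads = {"s01": 9.69, "s02": 4.2, "s03": 0.8, "s04": 0.3, "s05": 0.01}

low = flag_low_samples(reads)
fraction_low = len(low) / len(reads)
print(low, fraction_low)
```

Run against the real TotalReads (M) column, the same check would show which samples sit more than tenfold below the best one.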
I have been using Stacks. After processing with process_radtags (-e taqI -r -c -q), file sizes drop to between 0.35 Gb and 0.007 Gb. After further processing with denovo_map.pl (several parameter settings: -m 3 and -n/-M of 2/2, 4/4, 5/5, 5/3 and 8/8) I get 15,000 to 20,000 loci, but after running populations (-r 0.7; I don't even set --max-obs-het) I am left with only 100 to 200 loci (far too few!).
Can anyone suggest a way to increase the number of loci generated, or offer any thoughts on whether the sequencing protocol may have caused a problem?
Many thanks
Clive