8.0 years ago
agata88
Hi all!
I am working on multiplex NGS results. I have a VCF file that contains a lot of false positive variants. I know they are false positives because they occur only at the last nucleotide of a read (they are absent from the overlapping reads). I would like to label those variants. Do you know why I get them? I trimmed the primers with cutadapt.
Those variants appear only in the multiplex results. I don't see them in the probe-based results for the same sample.
Any idea?
PS. This could be caused by not marking duplicates, but that is not possible with multiplex data.
Best,
Agata
Depending on the variant caller you used, there may be an indicator already there of the bias due to read position. If so, you can filter/mark by that.
I am using samtools mpileup to generate BCF files and then bcftools to create VCF files. Can those tools do that?
I know GATK at least used to determine this (the "ReadPosRankSumTest"), but I don't recall seeing samtools do it.
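If the VCF does carry a read-position annotation, labeling the suspect variants can be sketched roughly like this. It is a minimal example in plain Python, not a definitive implementation: it assumes the annotation is in the INFO column under the key `ReadPosRankSum` (the name GATK writes), the cutoff of -8.0 is only an assumed starting point borrowed from GATK's commonly cited hard-filter value for SNPs, and the function name and the filter label `ReadPosBias` are made up for illustration. If your caller writes a different INFO key, pass it as `key`.

```python
def label_read_position_bias(vcf_lines, key="ReadPosRankSum",
                             threshold=-8.0, label="ReadPosBias"):
    """Return VCF lines with `label` added to FILTER when INFO[key] < threshold.

    `key`, `threshold`, and `label` are assumptions; tune them for your data.
    """
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            out.append(line)
            continue
        if line.startswith("#CHROM"):
            # Declare the new filter just before the column header line.
            out.append(f'##FILTER=<ID={label},Description="{key} < {threshold}">')
            out.append(line)
            continue
        fields = line.split("\t")
        # INFO is the 8th column: semicolon-separated key=value pairs and flags.
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        try:
            value = float(info[key])
        except (KeyError, ValueError):
            value = None  # annotation missing or not numeric: leave record as is
        if value is not None and value < threshold:
            current = fields[6]
            fields[6] = label if current in (".", "PASS") else f"{current};{label}"
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\t.\tDP=30;ReadPosRankSum=-9.5",
        "chr1\t200\t.\tC\tT\t60\tPASS\tDP=40;ReadPosRankSum=0.3",
    ]
    print("\n".join(label_read_position_bias(example)))
```

Records without the annotation are left untouched, so the variants are only labeled, not removed. You can still decide afterwards whether to drop them.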
Thank you very much, I'll try this :)