Detecting Variants near repeats
9.9 years ago
Adrian Pelin ★ 2.6k

Hello,

I am using FreeBayes to quantify variation in a repetitive eukaryote. Normally such an analysis is restricted to single-copy regions, to avoid quantifying variation in repetitive areas.

In my experience, even after filtering out variants above a certain coverage, variants near repeats are almost always false positives. Is there a way to exclude areas near repeats from the analysis?

Adrian

freebayes vcf bam repeats • 3.5k views

I suppose I would have to create a repeat file myself, since I am working on a non-model fungus that I just sequenced. Is there a way to intersect a VCF file with a repeat annotation using bedtools, so that SNPs close to annotated repeats are removed?

Adrian


Yes. Bedtools supports multiple formats, including VCF. Check this: http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html
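For anyone who wants to see the logic spelled out, here is a minimal pure-Python sketch of the same idea: drop VCF records whose position falls inside an annotated repeat or within a chosen distance of one. The function names (`load_repeats`, `near_repeat`, `filter_vcf`) and the `flank` parameter are made up for illustration and are not part of bedtools. It assumes a 3-column BED of repeats and only checks the POS column, so a long deletion starting just outside the window would still be kept.

```python
from bisect import bisect_right
from collections import defaultdict


def load_repeats(bed_lines, flank=0):
    """Parse BED lines into merged intervals per chromosome, padded by `flank` bp."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((max(0, int(start) - flank), int(end) + flank))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching: extend the previous interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def near_repeat(repeats, chrom, pos):
    """True if 1-based VCF position `pos` lies in a padded repeat (BED is 0-based, half-open)."""
    if chrom not in repeats:
        return False
    starts, ends = repeats[chrom]
    pos0 = pos - 1
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


def filter_vcf(vcf_lines, repeats):
    """Yield header lines and the records that are not inside a padded repeat."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if not near_repeat(repeats, chrom, int(pos)):
            yield line
```

With real files you would pass open file handles, e.g. `filter_vcf(open("calls.vcf"), load_repeats(open("repeats.bed"), flank=50))`; the file names and the 50 bp window are placeholders. In bedtools itself, `intersect` with the `-v` option reports the entries of the first file that have no overlap with the second.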

9.9 years ago

You can download bed files for various repeats from here.

Select your organism of interest and select "Repeats" in group. You can then combine multiple repeat bed files together using bedtools.

Then you can use bedtools intersect to find the overlapping regions between your SNP file (in any format compatible with bedtools, such as VCF) and the repeats bed file.
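The "combine multiple repeat bed files" step can be sketched in plain Python as follows: pool the records from every file, sort them, and collapse overlapping or adjacent intervals into one. The function name `combine_repeat_beds` is hypothetical, and only the first three BED columns (chromosome, start, end) are kept, so repeat names and scores are dropped.

```python
def combine_repeat_beds(*bed_sources):
    """Pool BED records from several sources and merge overlapping or adjacent intervals."""
    intervals = []
    for lines in bed_sources:
        for line in lines:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            intervals.append((chrom, int(start), int(end)))
    # Sort by chromosome, then start, so neighbours can be merged in one pass.
    intervals.sort()
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [f"{c}\t{s}\t{e}" for c, s, e in merged]
```

Each argument can be an open file handle or a list of lines. The merged output is a single non-redundant repeat track, which keeps the later intersect step from reporting the same SNP once per overlapping repeat annotation.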


I stumbled on this post because I am struggling with SNPs that bcftools calls in some repetitive regions. Since the tool obviously cannot tell a repetitive region from a non-repetitive one unless you tell it, is it common practice to simply remove SNPs in repetitive regions after calling? Or do people usually align the DNA reads to a hard-masked reference sequence, so that nothing maps there in the first place?
