Question

How to detect outliers from either (a) SNP-Fst or (b) Window-Fst distributions?

5.1 years ago

serpalma.v ▴ 80

Hello

I want to find the SNPs that could be responsible for the phenotypic differences observed between three populations. To that end, I computed Fst (Weir and Cockerham) using vcftools.

One population is the founder population (line0), from which the other two populations (line1 and line2) were selected, each for a different trait. The phenotypes of the lines are highly divergent.

Computing per-SNP Fst produces the following representative distributions [figure].

Computing windowed Fst (window = 500 kb; step = 250 kb; minimum of 20 SNPs per window) produces the following representative distributions [figure].
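To make the windowing explicit, here is a minimal sketch of how a sliding-window mean could be computed from per-SNP Fst values. It is not what vcftools does internally: it takes the plain (unweighted) mean of per-SNP values, and the function name `windowed_mean_fst` and the input format (tuples of chromosome, position, Fst) are assumptions made for illustration only.

```python
from collections import defaultdict


def windowed_mean_fst(snps, window=500_000, step=250_000, min_snps=20):
    """Sliding-window mean of per-SNP Fst (hypothetical helper).

    snps: iterable of (chrom, pos, fst) tuples, with 1-based positions.
    Returns a list of (chrom, start, end, n_snps, mean_fst) tuples for
    windows holding at least `min_snps` SNPs.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, fst in snps:
        # NaN is the only value not equal to itself; skip such sites.
        if fst == fst:
            by_chrom[chrom].append((pos, fst))

    results = []
    for chrom, records in by_chrom.items():
        records.sort()
        last_pos = records[-1][0]
        start = 1
        while start <= last_pos:
            end = start + window - 1
            values = [f for p, f in records if start <= p <= end]
            # Drop sparse windows, as with the min #SNPs = 20 filter.
            if len(values) >= min_snps:
                mean_fst = sum(values) / len(values)
                results.append((chrom, start, end, len(values), mean_fst))
            start += step
    return results
```

With the defaults, each window spans 500 kb and overlaps its neighbour by half, so every SNP contributes to up to two windows; that overlap is part of why the windowed distributions look smoother.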

First, line1 vs line2 yields a different Fst distribution compared to (line1 | line2) vs line0.

Second, the windowed Fst calculation (mean per window) yields smoother distributions.

I would like to seek advice on the following:

(1) How should outliers be defined, considering the two types of observed Fst distributions?

(2) Is windowed Fst more suitable to identify outliers?

(3) How should the size and step of the sliding window be defined? (What I chose for this example is based on a similar study, but I guess it might require optimization.)

(4) Do I need to do some type of SNP pruning (these SNPs are derived from WGS variant discovery analysis following GATK best practices)?

Fst vcftools • 2.0k views

updated 5.1 years ago by h.mon 35k • written 5.1 years ago by serpalma.v ▴ 80