Lositan; spiky confidence interval, too many outliers
9.3 years ago
hern.moral • 0

Hello everyone,

I'm using Lositan with 734 SNPs for 72 individuals sampled across a latitudinal cline. The data may be structured in 2 main clusters and may also show isolation by distance (IBD).

With few iterations (50K) the envelope looks good (Fig1), but with more (>500K) it looks terrible (Fig2): too ragged, with too many outliers. I've played with all the parameters (#pops, force Fst, FDR, etc.) and keep getting the same result.

Fig1: http://tinypic.com/r/2637kw4/8

Fig2: http://tinypic.com/r/msktmv/8

Any ideas? Can IBD have such a strong effect?

Thanks!

Hernan

outliers Fst lositan selection • 4.0k views

So, are you splitting this into two populations, or more? What is the sample size reported when you load the data?


Hello Tiago,

Thanks for your quick reply. Yes, my file has two populations, and the sample size reported is 2 populations. I'm basing this on preliminary STRUCTURE and DAPC analyses. However, I have played with the expected total pops quite a bit (2, 10, 20, 50, 100) and keep getting the same behaviour. Should I split the file into more "artificial" populations? I see how that might help, but then I guess the question is whether the result will depend on how populations are assigned.

Cheers,


Just to be sure: on the bottom middle panel you have:

Mean Fst

Expected total pops

Mutation Model

Sample size.

Are you reporting the "Expected total pops" or the "sample size"?


Hello,

Mean Fst= 0.05

Expected total pops = 2, 10, 40 and 100 (I have used different values and get the same results)

Mutation model= Infinite sites

Sample size = 69 individuals, 734 loci

Cheers


There is some improvement when the original file is divided into 9 populations (see fig.). However, there are still some spikes at low He and Fst. This was using expected pops = 20, subsample size = 50 and assumed Fst = 0.06.

Fig: http://tinypic.com/r/2mrsnef/8

However, if I use the default parameters (expected pops = 9 and subsample size = 10), I get many more "spikes".

Cheers
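
For readers unfamiliar with the two axes of the plot discussed here, the following is a minimal sketch of how a per-locus heterozygosity and Fst can be computed for a biallelic SNP from allele frequencies in each population. It uses the simple Nei-style definition Fst = (Ht - Hs) / Ht with equal population weights and no sample-size correction. It is not intended to reproduce Lositan's own estimator, and the function name `he_and_fst` is hypothetical.

```python
import math


def he_and_fst(freqs):
    """Return (Ht, Fst) for one biallelic SNP.

    freqs: frequency of one allele in each population (values in [0, 1]).
    Assumption: populations are weighted equally and no sample-size
    correction is applied, so this is an illustration only.
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Mean within-population expected heterozygosity.
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled (total) population.
    ht = 2 * p_bar * (1 - p_bar)
    # Fst is undefined when the locus is monomorphic overall.
    fst = (ht - hs) / ht if ht > 0 else math.nan
    return ht, fst


if __name__ == "__main__":
    for freqs in ([0.5, 0.5], [0.1, 0.3], [0.0, 1.0]):
        ht, fst = he_and_fst(freqs)
        print(freqs, round(ht, 4), round(fst, 4))
```

Note that Fst is a ratio with Ht in the denominator, so at low heterozygosity small differences in allele frequency translate into comparatively large changes in Fst.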


Dear all,

Greetings. I am trying to use Lositan for my SNP dataset. I have tried several times, but it does not seem to run on my data. My SNP data is in VCF format; I converted it to Genepop format and ran the program, but it did not run!

I assume something went wrong during file conversion, or something else. Could you help me with how to proceed with this issue?

Thanks for your attention,
Amin

9.3 years ago
tiagoantao ▴ 690

Sorry for the late answer.

There is something that is not clear to me at this stage:

What value is Lositan reporting for the sample size (is that 10?). This is different from the total number of simulated populations (which would start as 2 in your case). If I understand correctly, you have 76 samples. Are these approximately equally divided between the two populations? Lositan should be reporting a much bigger number for sample size (surely not 10 if you have 76 samples).

The effect could be explained by low sample size and possibly low number of populations.
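
One way to double-check these numbers independently of the GUI is to count populations and individuals directly in the Genepop file. The following is a minimal sketch, assuming the usual Genepop layout (a title line, locus names either one per line or comma-separated, then `Pop` lines each followed by individuals written as `name , genotype genotype ...`). The function name `summarize_genepop` is hypothetical, and this is not part of Lositan.

```python
def summarize_genepop(text):
    """Return (number of loci, list of individuals per population).

    Assumes the standard Genepop layout described above; raises
    ValueError when an individual line does not match the locus count.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    body = lines[1:]  # the first line is a free-text title
    # Positions of the "Pop" separator lines (case-insensitive).
    pop_idx = [i for i, ln in enumerate(body) if ln.lower() == "pop"]
    if not pop_idx:
        raise ValueError("no 'Pop' line found; is this a Genepop file?")
    # Locus names come before the first Pop line.
    loci = [name.strip()
            for ln in body[:pop_idx[0]]
            for name in ln.split(",") if name.strip()]
    sizes = []
    bounds = pop_idx + [len(body)]
    for start, end in zip(bounds, bounds[1:]):
        individuals = body[start + 1:end]
        for ind in individuals:
            if "," not in ind:
                raise ValueError(f"missing comma in line: {ind!r}")
            # Genotypes follow the last comma on the line.
            genotypes = ind.rsplit(",", 1)[1].split()
            if len(genotypes) != len(loci):
                raise ValueError(
                    f"expected {len(loci)} genotypes, got {len(genotypes)}: {ind!r}")
        sizes.append(len(individuals))
    return len(loci), sizes


if __name__ == "__main__":
    example = """Toy data
loc1
loc2
Pop
ind1 , 0101 0102
ind2 , 0102 0202
Pop
ind3 , 0101 0101
"""
    print(summarize_genepop(example))
```

The same check can help spot a broken conversion from another format, since a mismatch between the number of genotypes and the number of loci raises an error naming the offending line.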


Hello, thanks Tiago!

Yes, 76 samples, with similar numbers of samples across populations. These are screenshots of the GUI when the data is loaded.

When 2 populations (structure result) are defined in the genepop file: http://tinypic.com/r/nq1xk3/8

This is the one that gives the weird result: http://tinypic.com/r/msktmv/8

When 9 populations (according to geographic sampling) are defined in the genepop file: http://tinypic.com/r/nytqb9/8

This gives a much better result: http://tinypic.com/r/2mrsnef/8


This is, I am afraid, due to sampling effects related to the number of populations. There are two alternatives here:

  1. You think that you have 2 populations overall (i.e. you sampled 2 out of 2 in the wild). In this case the CI that Lositan is computing cannot be trusted.
  2. You think that you have more populations overall. In this case the sampling issue disappears.

So Lositan is not appropriate if you have only 2 populations in the wild, and in that case I would recommend not using it. But if you are sampling 2 populations out of many, then you can probably still use it. Of course, how many populations you have might be complex to estimate, but if you are sure you have more than ~10 wild pops, the results should not vary much.


Thanks for the detailed answer, Tiago. I think the key point lies in the difference between populations and genetic clusters. It seems fair enough to use sampling populations in this case, but it would be handy to be able to test a few (3 or fewer) genetic clusters against each other, because it is at that level that most species are structured, and someone may want to look for selection at that level.

Cheers,
Hernan
