Question

Why Are Monomorphic Loci Excluded From Analysis?

5

10.6 years ago

714 ▴ 110

As the title suggests, I was wondering why it is a good idea to exclude monomorphic loci from SNP analysis. How would including them affect a PCA plot, for example?

snp pca • 20k views

ADD COMMENT • link updated 10.6 years ago by blacktomato27 ▴ 70 • written 10.6 years ago by 714 ▴ 110

score 13 · Answer 1 · 2013-08-28

13

10.6 years ago

Fabio Marroni ★ 3.0k

In my understanding, monomorphic means something that appears in just one state (or form), in contrast to polymorphic, which means something that appears in more than one form. SNPs are by definition polymorphic. A monomorphic site is a site at which all individuals have the same form (genotype). It is a good idea to exclude it from analysis because it carries no information about differences among individuals. Please note that you always implicitly exclude from analysis the majority of the 3 billion positions of the human genome, for which you find no variation.

ADD COMMENT • link 10.6 years ago by Fabio Marroni ★ 3.0k

1

Would there be any harm in keeping monomorphic loci in the dataset given that they do not seem to contribute to any of the variation that we might see?

ADD REPLY • link 10.6 years ago by 714 ▴ 110

6

As Josh already said, it does no harm in terms of results (they are uninformative), but it wastes computer time.
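To make the PCA point concrete, here is a minimal sketch in plain Python. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) and that the PCA is run on mean-centered data; the genotype values and the helper names (`center_columns`, `gram_matrix`) are made up for illustration. PCA coordinates come from the individual-by-individual matrix of inner products of the centered genotypes, and a monomorphic locus becomes a column of zeros after centering, so it adds nothing to that matrix.

```python
def center_columns(matrix):
    """Subtract each locus (column) mean from that column."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def gram_matrix(matrix):
    """Individual-by-individual inner products of the centered genotypes."""
    centered = center_columns(matrix)
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]


# Hypothetical genotypes: rows are individuals, columns are loci.
polymorphic_only = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
]
# Same data plus one monomorphic locus (every individual has count 2).
with_monomorphic = [row + [2] for row in polymorphic_only]

# The matrix that PCA decomposes is identical with or without the extra locus.
same = gram_matrix(polymorphic_only) == gram_matrix(with_monomorphic)
print(same)
```

One caveat, stated as an assumption about your workflow: if each locus is also scaled to unit variance before PCA, a monomorphic locus has zero variance and the scaling step divides by zero, so in that setting it has to be removed first.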

ADD REPLY • link 10.6 years ago by Fabio Marroni ★ 3.0k

score 5 · Answer 2 · 2013-08-28

5

10.6 years ago

Josh Herr 5.8k

You would inflate your SNP numbers and misrepresent your data.

How would you differentiate between a sequencing error, one-off single mutation or transcription error, and a bona-fide SNP? SNPs are found across individuals in a population -- monomorphic loci represent one individual's nucleotide state and may be the result of errors across numerous levels. When you see a SNP in multiple individuals you can infer it is not from sequencing error or a mutation found in a single individual.

ADD COMMENT • link 10.6 years ago by Josh Herr 5.8k

0

That makes some sense, however I'm afraid I don't quite understand all of it. For example, if 100 individuals were genotyped at loci A-D and all were homozygous C/C at locus A, then why would one exclude locus A from the dataset and, subsequently, from the analysis?

ADD REPLY • link 10.6 years ago by 714 ▴ 110

1

In your example, locus A would not be informative and it would be pointless to leave that nucleotide alignment position in the analysis -- it would provide you with no information and would also waste compute time (meh, probably negligible). You would want to remove uninformative characters -- this would include non-variable sites as well as monomorphic sites (one "mutation" and not a SNP) or highly variable sites.
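As a rough sketch of that filtering step for the 100-individual example: the code below treats a locus as monomorphic when only one allele is observed across all individuals. Only locus A (all C/C) comes from the question; the genotypes for loci B-D and the helper name `is_monomorphic` are invented for illustration.

```python
def is_monomorphic(genotypes):
    """True when only one allele is observed across all genotype calls."""
    alleles = set()
    for genotype in genotypes:
        alleles.update(genotype.split("/"))
    return len(alleles) == 1


n = 100
calls = {
    "A": ["C/C"] * n,                                   # from the question
    "B": ["A/A"] * 60 + ["A/G"] * 30 + ["G/G"] * 10,    # hypothetical
    "C": ["T/T"] * 99 + ["T/C"],                        # hypothetical
    "D": ["G/G"] * 50 + ["A/A"] * 50,                   # hypothetical
}

# Keep only loci where more than one allele is observed.
kept = [locus for locus, genotypes in calls.items()
        if not is_monomorphic(genotypes)]
print(kept)  # ['B', 'C', 'D']
```

Note that locus C survives this filter even though its second allele is seen in a single individual; removing such near-invariant sites would need a separate threshold (for example on minor allele frequency), which is not shown here.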

ADD REPLY • link 10.6 years ago by Josh Herr 5.8k

score 0 · Answer 3 · 2013-08-29

0

10.6 years ago

blacktomato27 ▴ 70

Hi,

How should the heterozygous allelic state of a parent be handled in polymorphism analysis? For example:

      SNP1  SNP2  SNP3
p1    AA    AT    AA
p2    AA    AA    TT

Here I want to see polymorphism between p1 and p2. This is my expected result:

      SNP1  SNP2  SNP3
p1    mono  ?     poly
p2

Thanks in advance
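For reference, the table above can be reproduced with a small sketch like the following. It does not settle how a heterozygous parent should be treated: any SNP where either parent is heterozygous is simply labelled "?", mirroring the question. The function name `classify` is hypothetical.

```python
def classify(p1, p2):
    """Label one SNP as 'mono', 'poly', or '?' from two parental genotypes."""
    if p1[0] != p1[1] or p2[0] != p2[1]:
        return "?"  # at least one parent is heterozygous: left undecided here
    return "mono" if p1 == p2 else "poly"


# Parental genotypes (p1, p2) taken from the example in the question.
parents = {
    "SNP1": ("AA", "AA"),
    "SNP2": ("AT", "AA"),
    "SNP3": ("AA", "TT"),
}

labels = {snp: classify(p1, p2) for snp, (p1, p2) in parents.items()}
print(labels)  # {'SNP1': 'mono', 'SNP2': '?', 'SNP3': 'poly'}
```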

ADD COMMENT • link 10.6 years ago by blacktomato27 ▴ 70

0

I don't quite understand your question. (This isn't an answer, by the way, so it should be posted as a new question in its own thread.) Are you asking how to differentiate between heterozygosity and SNP polymorphisms?

ADD REPLY • link 10.6 years ago by Josh Herr 5.8k