Analysing two SNPs and their association with phenotypes, and co-occurrence
5.4 years ago
abdi_g96 ▴ 10

Hi guys, I have a few questions. I am a med student doing research and have little to no experience with this kind of analysis, so I would love some help.

I have genotyped ~1200 subjects for two SNPs. The two genes are involved in the skin barrier, and I am interested in their association with food allergy, as well as whether the association differs between transient and persistent food allergy (i.e. whether children grow out of it or not). I have assigned each subject a value of 0 or 1 for each SNP, describing whether they have the mutation or not. I have then gathered the phenotype data for the same subjects and coded their allergy status as 0 or 1. In another column I have transient and persistent status coded as 2 or 4 respectively.

One of my supervisors ran an association test using a program called PLINK. He checked the association of one SNP with FA status and found no association. However, he is not willing to help me check the co-occurrence or cumulative effect of the two SNPs, so I am at a loss as to how to do this. What software do I use, what statistical test do I run, and what data do I input?

Also, does anyone here have any experience using PLINK? How do I display the results of association tests in summary tables?

Many thanks.

SNP co-occurence cumulative association PLINK • 1.1k views
5.4 years ago

Can you elaborate on what exactly you want to do? Do you want to test every possible pair of SNPs, or do you have specific pairs of SNPs (e.g. statistically significant SNPs from the first analysis) that you want to test? There are easy ways to do this, but I just need to be sure about what you want to do.

On a very simple level, you could just build your own regression model as:

glm(AllergyStatus ~ SNP1 + SNP2, data = data, family = binomial(link = 'logit'))

Then, there are ways to extract a single value from this model that can indicate its 'predictive' strength.
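A minimal sketch of how the model above could be extended to look at co-occurrence and a combined effect. The data here are simulated stand-ins (the sample size, carrier frequencies and effect sizes are assumptions, not real results), and the column names simply follow the formula above. The interaction model is one possible way to ask whether carrying both mutations differs from the sum of the two individual effects; it is not the only approach.

```r
# Simulated stand-in data: frequencies and effect sizes are assumptions
set.seed(1)
n <- 1200
data <- data.frame(
  SNP1 = rbinom(n, 1, 0.10),  # 1 = carries the mutation, 0 = does not
  SNP2 = rbinom(n, 1, 0.15)
)
data$AllergyStatus <- rbinom(n, 1, plogis(-2 + 0.5 * data$SNP1 + 0.4 * data$SNP2))

# How often the two mutations co-occur in the same subject
cooccurrence <- table(SNP1 = data$SNP1, SNP2 = data$SNP2)
print(cooccurrence)

# Additive model: each SNP contributes its own effect
fit_add <- glm(AllergyStatus ~ SNP1 + SNP2, data = data,
               family = binomial(link = 'logit'))

# Interaction model: carrying both may differ from the sum of the two effects
fit_int <- glm(AllergyStatus ~ SNP1 * SNP2, data = data,
               family = binomial(link = 'logit'))

# Odds ratios with Wald 95% confidence intervals from the additive model
or_table <- exp(cbind(OR = coef(fit_add), confint.default(fit_add)))
print(round(or_table, 2))

# Likelihood-ratio test of the interaction term
lrt <- anova(fit_add, fit_int, test = "Chisq")
print(lrt)
```

With few double carriers in the co-occurrence table, the interaction estimate will be imprecise, so it is worth checking those counts before interpreting it.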


So these SNPs have been highlighted by other studies as possibly being implicated in food allergy (FA). I want to check the association of each of these SNPs with FA and with transient/persistent FA (which I have phenotype data for) individually. Then I want to test them as a pair to see their combined association with FA and transient/persistent FA. Does that make sense?

Another question I have: is there any use in specifically using PLINK as my analysis tool if I am only looking at two SNPs? A program like SPSS is much more intuitive to use, but my supervisor has gone with PLINK and I don't understand why. Is there an advantage to using it?

Many thanks


I see. So, how many independent tests are we talking about: a few, tens, hundreds, thousands ... millions?

Yes, SPSS and R can do most / all of what PLINK does; however, PLINK has been the leader in GWAS analysis for many years, and is still under further development as PLINK2. The most basic association test just derives a Chi-squared p-value, as in this answer: A: SNP dataset and Z Score
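As a rough illustration of that basic test in R, here is a sketch on simulated data (the carrier frequency and effect size are assumptions). Note one difference: PLINK's basic association test works on allele counts, whereas the data in this thread are coded as carrier / non-carrier, so this sketch tests carrier status against allergy status instead.

```r
# Simulated stand-in data: frequency and effect size are assumptions
set.seed(2)
n <- 1200
snp <- rbinom(n, 1, 0.10)                         # 1 = carrier, 0 = non-carrier
allergy <- rbinom(n, 1, plogis(-2 + 0.5 * snp))   # 1 = food allergy, 0 = none

# 2x2 contingency table of carrier status against allergy status
tab <- table(Carrier = snp, FoodAllergy = allergy)
print(tab)

# Chi-squared test (1 degree of freedom, no continuity correction)
chi <- chisq.test(tab, correct = FALSE)
print(chi$p.value)

# Fisher's exact test, often preferred when some cell counts are small;
# its estimate is the (conditional) odds ratio
fisher <- fisher.test(tab)
print(fisher$estimate)
```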

However, PLINK does many other analyses like family tests, linear/logistic regression, dosage analyses, LD, etc.

So, you can indeed use SPSS for this if that is more comfortable for you. The code that I provide above is obviously R. From the model, you can obtain a Wald p-value for the 2 SNPs combined, and can also obtain r-squared / r-squared shrinkage via cross validation, and derive AUC from ROC analysis using the model.
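A minimal sketch of those two quantities, using only base R and simulated stand-in data (frequencies and effect sizes are assumptions). The joint Wald test asks whether both SNP coefficients are zero at once; the AUC is computed here with the rank (Mann-Whitney) formula rather than a dedicated ROC package, which is a swapped-in shortcut and not necessarily how it would be done in the linked answers.

```r
# Simulated stand-in data: frequencies and effect sizes are assumptions
set.seed(3)
n <- 1200
data <- data.frame(SNP1 = rbinom(n, 1, 0.10), SNP2 = rbinom(n, 1, 0.15))
data$AllergyStatus <- rbinom(n, 1, plogis(-2 + 0.5 * data$SNP1 + 0.4 * data$SNP2))

fit <- glm(AllergyStatus ~ SNP1 + SNP2, data = data,
           family = binomial(link = 'logit'))

# Joint Wald test: null hypothesis is that both SNP coefficients are zero
snp_terms <- c("SNP1", "SNP2")
b <- coef(fit)[snp_terms]
V <- vcov(fit)[snp_terms, snp_terms]
wald_stat <- as.numeric(t(b) %*% solve(V) %*% b)
wald_p <- pchisq(wald_stat, df = length(snp_terms), lower.tail = FALSE)
print(c(statistic = wald_stat, p = wald_p))

# AUC from the model's predicted probabilities (rank / Mann-Whitney formula)
pred <- fitted(fit)
y <- data$AllergyStatus
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(rank(pred)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(auc)
```

An AUC of 0.5 means the model separates allergic from non-allergic subjects no better than chance, and 1 means perfect separation. An AUC computed on the same data used to fit the model tends to be optimistic.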

I think that your main task may be in simply defining all of the model formulae (?). I had to do something similar, by the way, while I was working in the USA, i.e., perform millions of tests using GWAS data with complex formulae, and 'parallelise' it so that it didn't take weeks to finish. It resulted in the creation of this R package (on Bioconductor): https://github.com/kevinblighe/RegParallel

As you are using SPSS, perhaps that is not of much use.


What do you mean by how many independent tests? The sample size is ~1500 if I've understood you correctly.

"The code that I provide above is obviously R. From the model, you can obtain a Wald p-value for the 2 SNPs combined, and can also obtain r-squared / r-squared shrinkage via cross validation, and derive AUC from ROC analysis using the model."

Can you explain this passage a bit more? I am what you might call statistically illiterate, hahah. I understand it up until 'obtaining a Wald p-value', which would tell me if the two SNPs combined have a statistically significant effect, but can you explain the rest about r-squared, AUC, ROC, etc.?

And that sounds impressive. The reason I asked about using SPSS over PLINK is that this is not a full-blown GWAS but rather a look at two specific candidate genes in a sample population, which have previously been highlighted as significant in other GWAS. I think my supervisor is used to working with PLINK in his other GWAS, but as I am not an expert I find it more confusing and may resort to SPSS instead if it has the same functionality.


To learn about the different ways of performing extra tests with regression models, take a quick look here:

The main one is cross-validation. For linear models, this will give you a 'shrunk' r-squared value. For logistic and other models, this will give you a 'delta' value (delta should be small if the model assumptions are sound).
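One way the 'delta' value could be obtained for a logistic model is with cv.glm() from the boot package, sketched here on simulated stand-in data (frequencies and effect sizes are assumptions). With the default cost function, delta is the cross-validated average squared difference between the observed 0/1 outcome and the predicted probability: the first element is the raw estimate and the second is a bias-adjusted version.

```r
library(boot)  # ships with standard R installations as a recommended package

# Simulated stand-in data: frequencies and effect sizes are assumptions
set.seed(4)
n <- 1200
data <- data.frame(SNP1 = rbinom(n, 1, 0.10), SNP2 = rbinom(n, 1, 0.15))
data$AllergyStatus <- rbinom(n, 1, plogis(-2 + 0.5 * data$SNP1 + 0.4 * data$SNP2))

fit <- glm(AllergyStatus ~ SNP1 + SNP2, data = data,
           family = binomial(link = 'logit'))

# 10-fold cross-validation of the fitted model
cv <- cv.glm(data, fit, K = 10)

# delta[1] = raw cross-validated prediction error, delta[2] = adjusted estimate
print(cv$delta)
```

What counts as a 'small' delta depends on the outcome; a useful baseline is the same quantity for an intercept-only model (AllergyStatus ~ 1).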

You may have to search more about all of these terms.

For practical usage of the Wald test for obtaining a p-value for 2 or more model terms, take a look here: Differential Expression analysis using Wald-Test

I am not a statistician; however, as you go through a career in bioinformatics and engage in learning, you invariably pick up an understanding of which tests to use in which situations.


Thank you so much for this help! I will make an attempt and reply here if I get stuck.
