Hey guys,
I am running MatrixEQTL on the TCGA BRCA exome sequencing dataset, which contains 83,070 somatic SNPs extracted from the MAF files of 976 primary tumour cases. I am testing these SNPs for association with 22 phenotypes calculated from the gene expression matrix for the same 976 TCGA cases. My SNP input to MatrixEQTL is an 83,070 x 976 binary matrix, with 1 if the sample carries the SNP at a given position and 0 if it does not. My gene expression input is a 22 x 976 matrix of standardised phenotype values. I also include 6 covariates for each case. I am running the analysis in R.

The analysis returns a thousand or so associations, and several of them are extremely significant (p-values around 10^-308). The issue is that a single phenotype will have, say, 30 SNPs associated with it, all with exactly the same p-value. I looked into this and realised it happens because only one sample carries the SNPs in question, so their rows in the SNP matrix are identical and the p-values have become discretised.

Is such an association false, given that only one sample carries the set of SNPs associated with a given phenotype? My SNP matrix is very sparse, and the majority of SNPs are carried by at most one sample.
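
To make the pattern concrete, here is a toy sketch in plain Python (made-up data and hypothetical names, not my actual R/MatrixEQTL run). It counts carriers per SNP, groups SNPs whose carrier pattern is identical, and applies a minimum-carrier filter; the threshold of 3 is arbitrary and only there for illustration. SNPs that land in the same group are the same predictor, so any per-SNP test against the same phenotype and covariates would give all of them the same statistic.

```python
from collections import defaultdict

# Toy binary SNP matrix (rows = SNPs, columns = samples). All values are made up.
snps = {
    "snp_a": [0, 0, 1, 0, 0, 0],
    "snp_b": [0, 0, 1, 0, 0, 0],  # same single carrier as snp_a
    "snp_c": [0, 0, 1, 0, 0, 0],  # same single carrier again
    "snp_d": [1, 0, 0, 1, 0, 1],
    "snp_e": [0, 1, 0, 0, 0, 0],
}


def carrier_counts(matrix):
    """Return the number of samples carrying each SNP."""
    return {name: sum(row) for name, row in matrix.items()}


def group_identical(matrix):
    """Group SNP names whose carrier pattern across samples is identical."""
    groups = defaultdict(list)
    for name, row in matrix.items():
        groups[tuple(row)].append(name)
    return list(groups.values())


def filter_min_carriers(matrix, min_carriers):
    """Keep only SNPs carried by at least `min_carriers` samples."""
    return {name: row for name, row in matrix.items() if sum(row) >= min_carriers}


counts = carrier_counts(snps)
groups = group_identical(snps)
kept = filter_min_carriers(snps, min_carriers=3)  # arbitrary illustrative threshold

print(counts)
print(groups)
print(sorted(kept))
```

In this toy case snp_a, snp_b and snp_c collapse into one group because they share the same single carrier, which mirrors the repeated p-values I am seeing.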
Thanks for your help.