Hi all,
It's been a while since I've used plink, and either I've lost the knack in my old(er) age or somethings changed (or a combo of both). I'm using plink version 1.90 (what's installed on high performance computer here) and I'm just banging my head against the wall, so hoping someone can point out what I've overlooked.
I have 218 isolates, 88 cases, 130 controls (case=2, control=1 in column six of .ped file). Here's an example of the .ped file:
Ind1 Ind1 0 0 0 2 A A 0 0 0 0 0 T T G....
Ind2 Ind2 0 0 0 2 A A 0 0 0 0 0 G G C....
Ind3 Ind3 0 0 0 1 0 0 0 0 0 0 0 T T G.....
And so on.
I ran the following command:
plink --file GWAS --allow-extra-chr --assoc --allow-no-sex
And the resulting .assoc file looks like this:
CHR SNP BP A1 F_A F_U A2 CHISQ P OR
26 . 6996 0 0 0 1 NA NA NA
26 . 7316 0 NA 0 1 NA NA NA
26 . 7321 0 NA 0 1 NA NA NA
26 . 7392 0 NA 0 1 NA NA NA
As you can see, it's not calculating any p-values.
Can anyone provide me with clues as to what I'm missing? I tried converting all my data to binary (i.e. 0 for ref, 1 for a SNP) to eliminate the possibility of anything other than biallelic sites, but got the same result. Any help is appreciated.
Check you have a phenotype - case control status, and check frequency of SNPs
--freq
?Hiya,
Phenotypes seem ok:
I think the clue may be in the how the SNP are in the .ped file. Output was 'Total genotyping rate is 0.188953' but the resulting .frq file was perhaps a bit weird? It looks a little like this:
CHR SNP A1 A2 MAF NCHROBS
26 . 0 A 0 186
26 . 0 C 0 2
26 . 0 T 0 8
26 . 0 A 0 4
26 . 0 G 0 6
26 . 0 T 0 8
26 . 0 G 0 8
26 . 0 T 0 8
26 . 0 G 0 4
26 . 0 T 0 2
26 . 0 A 0 2
26 . 0 T 0 2
26 . 0 T 0 28
I would expect A1 to be a SNP, and there to be something in MAF, right?
I'd filter by missingness, usually at 95%, then you wouldn't have any data left.
And the MAF is 0 for all (in this example subset). These would have to be filtered out in QC step: monomorphic SNPs
Ok I think I know what the problem is - when converting from vcf to .ped, the SNPs are kept as nucleotides (A,C,G,T) but the reference is left as '0'. So I have a mixture of 0 and nucleotides in the .ped file.
Any idea on how to convert from vcf to bed and retain the reference allele, instead of getting '0'?
plink —vcf should work.
It doesn't retain the reference - it just puts that as '0' and keeps the SNPs as nucleotides. This is how I've been making my .ped files
Can you post the .log file from your "plink --vcf" run, and the first few nonheader lines from your VCF?
Sure! Here's the
.log
output:And here's the first non-header line from the VCF (sorry, there's 218 isolates in here so its a lot):