I want to convert my pedmap (or bed) files into the format required by snp.plotter (R package to create plots of p-values using single SNP and/or haplotype data). The documentation is terrible, so here is what I got.
The SNP file
From the doc:
"SNP.FILE includes four necessary columns ASSOC, SNP.NAME, LOC, and SS.PVAL corresponding to positive or negative association (indicating protective or susceptibility alleles, a SNP label, the location, and a p-value for each SNP"
Columns 2 and 3 could be taken directly from the .MAP file. The SS.PVAL could be obtained after running PLINK with --hardy.
But what about ASSOC (+ or -)? Where do I get the positive or negative association for each SNP?
The HAP file
From the doc:
HAP.FILE: HAP.FILE includes three necessary columns ASSOC, G.PVAL, and I.PVAL corresponding to positive or negative association (indicating protective or susceptibility alleles, a global p-value and an individual p-value for each haplotype followed by a set of columnns of SNPs with corresponding haplotypes. Haplotypes are presented in a step-wise fashion with the major allele given as 1 and the minor allele as 2; haplotype variants for a set of SNPs should be grouped. SNP labels in HAP.FILE must be the same as in SNP.FILE, and only SNPs with corresponding haplotypes need to be included. In the figure, unfilled symbols connected by solid lines are used to indicate global haplotype p-values, (a circle is used if no symbol is specified for the dataset). Unfilled and filled symbols are used to indicate alleles 1 and 2, respectively connected by solid lines and dashed lines for positive and negative association (indicating susceptibility or protective haplotypes) when using indivudal haplotype p-values.
How do they get the global and individual p-values (G.PVAL and I.PVAL) for the haplotypes?
Also, in each SNP column they put:
- Major allele = 1
- Minor allele = 2
- Nothing otherwise
If this is supposed to match the .hwe output, I don't see how they recode the A1 and A2 columns.
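The 1/2 coding quoted from the doc (major allele = 1, minor allele = 2) can be reproduced by counting alleles per SNP. Here is a minimal sketch in Python, not snp.plotter's or PLINK's own code. The function name `recode_major_minor` is made up for illustration, and treating "0" as the missing-allele code is an assumption borrowed from the usual PED convention.

```python
from collections import Counter


def recode_major_minor(alleles, missing="0"):
    """Recode one SNP's alleles: most frequent -> "1", least frequent -> "2".

    `alleles` is a flat list of allele calls for a single SNP
    (two per individual). Missing calls are passed through unchanged.
    """
    counts = Counter(a for a in alleles if a != missing)
    if len(counts) > 2:
        raise ValueError("more than two alleles; SNP is not biallelic")
    # Sort by descending count; ties are broken alphabetically so the
    # result is deterministic.
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    mapping = {allele: str(i + 1) for i, allele in enumerate(ranked)}
    mapping[missing] = missing
    return [mapping[a] for a in alleles]


# Made-up example: A is seen 3 times (major), G twice (minor), one missing.
print(recode_major_minor(["A", "A", "G", "A", "0", "G"]))
```

Whether the result lines up with the A1/A2 columns of a given PLINK output depends on how that output orders the alleles, so it is worth checking a few SNPs by hand.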
The GENOTYPE file
From the doc:
GENOTYPE.FILE: GENOTYPE.FILE is a modified Linkage PED file. Each row should have the following information: family ID, individual ID, father ID, mother ID, sex, and affection status followed by marker loci coded as binary factors
I guess this could be obtained by running --recode12 on the PED file.
You are free to call anyone stupid, but my question is about how to interpret the data they ask for in the input files (my questions are pretty clear IMHO).
The documentation says the PALETTE file is optional; it doesn't say anything about the GENOTYPE file being optional.
AFAIK the web version of LocusZoom only lets you select human genome builds, and I am not working with a human dataset. Anyway, I am interested in the HaploView-like plot.
I wasn't calling you anything, just telling you that you could have looked more carefully into the documentation. In R, run ?snp.plotter and you'll see that for a "standard" Manhattan plot you only need the SNP.FILE. I can help you convert your PLINK output file into the input file for snp.plotter if you tell me what type of association you're conducting. Is it for a quantitative trait, for instance with --linear? In that case you're doing a regression analysis and PLINK generates an output file named *.assoc.linear. Since you want to plot the result as a Manhattan plot, you need the P-values from this analysis, not from a HW test as you proposed. The ASSOC column refers to the sign of the beta coefficient in this regression; in the *.assoc.linear file this is the BETA column. All the other information that you will need is there. Hope this helps!
Now that's a better answer :).
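For anyone who does have a trait, the conversion described in the answer (P-values plus the sign of BETA from a PLINK --linear run) could look roughly like this in Python. It is only a sketch under some assumptions: the input has the usual CHR, SNP, BP, A1, TEST, NMISS, BETA, STAT, P columns that PLINK 1.x writes to its .assoc.linear file, only the additive (ADD) rows are wanted, and snp.plotter accepts a tab-delimited file with "+" and "-" in ASSOC. The function name and the sample values are made up.

```python
def plink_linear_to_snp_file(linear_text):
    """Turn the text of a PLINK --linear results table into SNP.FILE text."""
    rows = [ln.split() for ln in linear_text.splitlines() if ln.strip()]
    header, records = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}

    out = ["\t".join(["ASSOC", "SNP.NAME", "LOC", "SS.PVAL"])]
    for rec in records:
        # Keep only the additive SNP effect; covariate rows are skipped.
        if rec[col["TEST"]] != "ADD":
            continue
        beta, pval = rec[col["BETA"]], rec[col["P"]]
        # PLINK writes NA when the model could not be fitted.
        if beta == "NA" or pval == "NA":
            continue
        # Sign of the regression coefficient; a beta of exactly 0 is
        # (arbitrarily) reported as "+".
        assoc = "+" if float(beta) >= 0 else "-"
        out.append("\t".join([assoc, rec[col["SNP"]], rec[col["BP"]], pval]))
    return "\n".join(out) + "\n"


# Made-up rows, only to show the shape of the input.
SAMPLE = """\
 CHR   SNP    BP  A1  TEST  NMISS   BETA   STAT      P
   1  snp1  1000   A   ADD     90   0.52   2.10  0.038
   1  snp2  2500   G   ADD     90  -0.31  -1.20   0.23
   1  snp3  4000   T   ADD     88     NA     NA     NA
"""

print(plink_linear_to_snp_file(SAMPLE), end="")
```

The string returned by the function can be written to a file and passed to snp.plotter as the SNP.FILE. If the package turns out to expect a different delimiter or ASSOC coding, only the two join calls and the "+"/"-" line need to change.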
I am not doing any association analysis since I don't have traits (at least not yet; I don't do the experiment design). I am exploring whether snp.plotter visualizations could be a nice alternative to HaploView and other haplotype visualization packages.
As far as I can see, the mandatory SNP.FILE values are obtained by association analysis, so if you don't do association you cannot use this package (if that's true, then that's the reason why I call the documentation terrible: they should have made that clear so as not to waste people's time).
If you're just interested in plotting an LD map, then I would recommend LDheatmap: https://cran.r-project.org/web/packages/LDheatmap/LDheatmap.pdf