TCGA SNP analysis using Manhattan plots and QQ plots
5.9 years ago
DanielC ▴ 170

Dear Friends,

I am new to TCGA data analysis. I would really appreciate your suggestions on these questions:

a) From TCGA VCF files, I would like to generate Manhattan plots and QQ plots to detect the association of SNPs with traits. I know that a Manhattan plot needs the following information:

CHR: chromosome (aliases chr, chromosome)
BP: nucleotide location (aliases bp, pos, position)
SNP: SNP identifier (aliases snp, rs, rsid, rsnum, id, marker, markername)
P: p-value for the association (aliases p, pval, p-value, pvalue, p.value)

"CHR", "BP", and "SNP" are in the VCF files, so where do I get the "P" value from?
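The P column is not something a VCF contains; it has to come from an association test run on the genotypes. Once a table with CHR, BP, SNP and P exists, a Manhattan plot is just -log10(P) plotted against a genome-wide position. A minimal stdlib Python sketch of that transformation, assuming integer chromosome codes and p-values in (0, 1]; the drawing itself is left to a plotting tool such as matplotlib or the R package qqman, and the example rows are made up:

```python
import math
from collections import defaultdict


def manhattan_points(rows):
    """Turn (CHR, BP, SNP, P) rows into (x, y, SNP) points for a Manhattan plot.

    x is a cumulative position so chromosomes are laid end to end;
    y is -log10(P). Assumes CHR is an integer code and 0 < P <= 1.
    """
    # Use the largest BP seen on each chromosome as that chromosome's length.
    max_bp = defaultdict(int)
    for chrom, bp, _snp, _p in rows:
        max_bp[chrom] = max(max_bp[chrom], bp)

    # Each chromosome starts where the previous ones (in numeric order) end.
    offsets, running = {}, 0
    for chrom in sorted(max_bp):
        offsets[chrom] = running
        running += max_bp[chrom]

    return [(offsets[chrom] + bp, -math.log10(p), snp) for chrom, bp, snp, p in rows]


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the input.
    rows = [(1, 100, "rs1", 0.01), (1, 300, "rs2", 1e-8), (2, 50, "rs3", 0.5)]
    for x, y, snp in manhattan_points(rows):
        print(snp, x, round(y, 2))
```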

And for QQ plots, where do I get the observed and expected p-values?
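For a QQ plot, the observed p-values are the same association-test p-values used for the Manhattan plot. The expected ones are not read from anywhere: under the null hypothesis p-values are Uniform(0, 1), so the i-th smallest of n is expected near (i - 0.5)/n (one common convention; i/(n + 1) is another). A minimal sketch, assuming two-sided 1-df test p-values in (0, 1]; it also computes the genomic inflation factor (lambda GC), which is often reported next to a QQ plot:

```python
import math
from statistics import NormalDist, median


def qq_points(pvalues):
    """Return (expected, observed) pairs on the -log10 scale, smallest p first.

    Under the null, the i-th smallest of n p-values is expected near (i - 0.5) / n.
    """
    observed = sorted(pvalues)
    n = len(observed)
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def genomic_inflation(pvalues):
    """Lambda GC: median 1-df chi-square statistic divided by its null median."""
    z = NormalDist()
    chi2 = [z.inv_cdf(p / 2) ** 2 for p in pvalues]  # two-sided p -> chi-square(1)
    null_median = z.inv_cdf(0.25) ** 2  # median of chi-square(1), about 0.455
    return median(chi2) / null_median


if __name__ == "__main__":
    pvals = [0.9, 0.5, 0.04, 0.001, 0.2]  # made-up p-values
    for e, o in qq_points(pvals):
        print(round(e, 3), round(o, 3))
    print("lambda:", round(genomic_inflation(pvals), 3))
```

A lambda close to 1 suggests the bulk of the p-values follow the null; clearly larger values usually point to population structure or technical artifacts.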

b) What type of plot would best present the number of variants per tumor in each cancer type from the VCF files? Also, where can I find the tumor and cancer type information for the SNPs in the VCF files?
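On part (b): as far as I can tell, the cancer type in TCGA is a property of the project/case (e.g. BRCA, LUAD) recorded in the sample metadata, not something stored per SNP in the VCF, so per-sample counts have to be joined to that metadata. A box or strip plot of per-tumor counts grouped by cancer type is one common way to show this. A minimal sketch, assuming GT is the first FORMAT key and that "any non-reference allele in GT" is a fair definition of carrying a variant (somatic callers may encode calls differently); `cancer_type_of` is a hypothetical mapping you would build from the metadata:

```python
from collections import defaultdict


def variants_per_sample(vcf_lines):
    """Count, per sample column, the records where it carries a non-reference allele.

    Assumes GT is the first FORMAT key (the VCF spec puts it first when present).
    """
    samples, counts = [], {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            counts = {s: 0 for s in samples}  # keep samples with zero variants
            continue
        for name, call in zip(samples, fields[9:]):
            alleles = call.split(":")[0].replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                counts[name] += 1
    return counts


def group_by_cancer_type(counts, cancer_type_of):
    """Collect counts into {cancer_type: [count, ...]} for a box or strip plot.

    cancer_type_of is a hypothetical sample -> cancer type mapping taken from
    sample metadata, not from the VCF itself.
    """
    groups = defaultdict(list)
    for sample, n in counts.items():
        groups[cancer_type_of.get(sample, "unknown")].append(n)
    return dict(groups)
```

Each list in the returned dict becomes one box (or one column of points) in the plot.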

Thank you very much! DK

Tags: SNP, TCGA, plots

As far as I know, TCGA does not calculate association p-values, although there may be independent resources where that information is available.


Thanks! Can you let me know where the p-values for each SNP can be obtained?


Igor was just saying that such data may exist... somewhere. I have never seen such data, but it could exist. One resource that may have something similar is cBioPortal.

May I ask what you are trying to do? Manhattan plots have mainly been used for GWAS, not cancer data. Of course, they can be used to plot anything. I believe that we have already identified the mutational landscape of tumours (?)


Thanks! Yes, I am doing a GWAS study, and I have VCF files to perform the analyses mentioned above. Now I am trying to make a Manhattan plot and a QQ plot to detect the association of SNPs with the traits. Since I lack p-values for the SNPs, I am not able to plot them. Please let me know if I am being clear and whether you know how to proceed with these plots.


So, you need to know how to perform an association test starting from the VCF? What I would do is convert the data into plink format and then do the association testing there. I have done this many times in the past, in fact.
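To make concrete what such a test produces: for a case/control trait, a basic per-SNP test (plink's --assoc runs an allelic test of this kind) compares ALT and REF allele counts between cases and controls in a 2x2 table and turns the 1-df chi-square statistic into the P value a Manhattan plot needs. A minimal stdlib sketch for one biallelic diploid SNP, assuming genotypes are already coded as ALT dosages 0/1/2; plink itself also handles missing calls, covariates, other test models, and so on, so this is only an illustration of the idea:

```python
import math


def allele_counts(dosages):
    """ALT and REF allele counts from per-sample ALT dosages (0, 1 or 2)."""
    alt = sum(dosages)
    return alt, 2 * len(dosages) - alt


def allelic_chi2_test(case_dosages, control_dosages):
    """1-df allelic chi-square test (no continuity correction) on a 2x2 allele table.

    Raises ZeroDivisionError if a row or column total is zero (e.g. a monomorphic SNP).
    """
    table = [allele_counts(case_dosages), allele_counts(control_dosages)]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Made-up dosages for 5 cases and 5 controls at one SNP.
    print(allelic_chi2_test([2, 1, 1, 2, 0], [0, 0, 1, 0, 1]))
```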

Another program, SnpSift CaseControl, can perform the testing and encode the p-values within your VCF, which may be easier for you.
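If the p-values end up inside the VCF's INFO column (as with SnpSift), they still need to be pulled out into the CHR/BP/SNP/P table for plotting. A minimal sketch; the INFO key name is an assumption you should check against the ##INFO header lines written into your output VCF (SnpSift's documentation lists keys such as CC_ALL, but verify this for your version), and chromosome names are kept as the strings found in the file:

```python
def pvalue_rows(vcf_lines, p_key):
    """Collect (CHR, BP, SNP, P) rows from a p-value stored in the INFO column.

    p_key is the INFO key holding the p-value; its exact name depends on the
    tool (check the ##INFO header lines). CHR is kept as the string in the VCF.
    """
    rows = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        chrom, pos, snp_id, _ref, _alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        # INFO is "KEY=VALUE;KEY=VALUE;FLAG"; flags get an empty value.
        entries = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in info.split(";")
        )
        if p_key in entries:
            # If the key holds one value per ALT allele, keep only the first.
            rows.append((chrom, int(pos), snp_id, float(entries[p_key].split(",")[0])))
    return rows
```

Records without the key (for example, sites the tool skipped) are simply left out of the table.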


Thanks! So I need to use plink to get the p-values of the SNPs from the VCF files, right? Could you please point me to a resource where I can learn the plink steps for this? I am new to this. Thanks for understanding.


Sure, you just need the --vcf flag. See the thread: How unphased VCF is converted into ped file?

However, when doing this, plink apparently changes the order of the samples in your VCF. So you should 'fix' the ordering of your samples from the very first step and then supply a custom FAM file for all analyses. I cannot stress enough how important this is: otherwise, you will be comparing sample groups that do not reflect the actual groupings that you want.
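One way to keep control of the ordering is to read the sample order straight from the VCF's #CHROM header line and write the FAM lines yourself in exactly that order. A FAM file has six whitespace-separated columns: FID, IID, father, mother, sex (1 = male, 2 = female, 0 = unknown) and phenotype (1 = control, 2 = case, -9 = missing). A minimal sketch, assuming FID = IID, unknown parents, and a hypothetical `phenotype_of` mapping (sample -> 1 or 2) built from your own sample metadata; sample IDs must not contain whitespace:

```python
def vcf_sample_order(vcf_lines):
    """Return the sample names, in column order, from the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def fam_lines(samples, phenotype_of, sex_of=None):
    """Build FAM lines in exactly the given sample order.

    Columns: FID IID father mother sex phenotype. FID is set equal to IID,
    parents are 0 (unknown), sex defaults to 0 (unknown), and samples missing
    from phenotype_of get -9 (missing phenotype).
    """
    sex_of = sex_of or {}
    out = []
    for s in samples:
        out.append(f"{s} {s} 0 0 {sex_of.get(s, 0)} {phenotype_of.get(s, -9)}")
    return out
```

Writing these lines to a file gives a FAM whose order is tied to the VCF columns rather than to whatever a conversion step produced.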

What I said may not make much sense right now, but just go step by step and be 100% certain at each step that what you believe is happening is actually happening. It is easy to convert any VCF to plink format, but not easy to maintain the sample groupings.
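One cheap way to stay certain between steps is to compare the FAM file that will actually be used against the sample order and case/control labels you expect. A minimal sketch, assuming a plain six-column FAM and a hypothetical `expected_pheno` mapping in plink's 1 = control, 2 = case coding; it returns a list of problems, empty when everything matches:

```python
def check_fam_matches(fam_text_lines, expected_samples, expected_pheno):
    """List mismatches between a FAM file and the intended sample order and groups."""
    problems, iids = [], []
    for n, line in enumerate(fam_text_lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        _fid, iid, _father, _mother, _sex, pheno = line.split()
        iids.append(iid)
        want = str(expected_pheno.get(iid, -9))
        if pheno != want:
            problems.append(f"line {n}: {iid} has phenotype {pheno}, expected {want}")
    if iids != list(expected_samples):
        problems.append("sample order differs from the expected order")
    return problems
```

Running a check like this after every conversion or filtering step catches silently swapped groups before any p-values are computed.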

See here: linkage disequilibrium analysis
