I have gone through some publications and done a literature survey, but I still have a basic doubt and would like guidance from people working in computational genetics or molecular biology.
What are REF and ALT? Are all variations considered mutations? A variant may be pathogenic, likely pathogenic, or neutral, but is it wise to call all of them mutations? After annotating with Annovar, some records have identical REF and ALT (G > G), i.e. they are not variations, yet they are reported as mutations by PolyPhen, SIFT, and other databases.
Could anyone help me understand this concept?
Thanks Rk
I used Illumina GenomeStudio microarray output (v1) as input. I took CHR, POS, RSID, and GENOTYPE from the input file and converted them to VCF. During conversion, I mapped the positions against hg19.fa (the reference genome), and the generated VCF was used in Annovar to annotate the variants (~6 lakh, i.e. about 600,000) of one sample.
Can you please tell me whether this analysis is correct? Here I have REF and ALT as G>G, and after annotation I am getting a PolyPhen, SIFT, and MutationTaster score of "D" (deleterious) for it.
What was the actual input that gave that result? Just the one line of your VCF.
And was it that VCF you put into SIFT etc, or did you use something else as input?
No, I gave all the variants (~6 lakh, i.e. about 600,000) to ANNOVAR as a VCF. The case I shared is a single variant, with the SNP ID, chr, pos, ref, alt, and the annotation scores produced by Annovar.
What @Emily is asking for is a real line from your VCF file (that shows an actual SNP). Please post that. You could even post 2 or more.
sofie_carolina : People on this forum help others freely but it does not mean you can take advantage of their generosity. It is your responsibility to provide real/accurate information so you can get usable answers.
Please don't post fake examples/incorrect data. You stand to lose more here.
Sorry for the inconvenience. I kept these details confidential as per the policies of my institute. Please understand. I apologize.
That's your output. Can you send the corresponding input, please?
It's also a fake example: rs123456 is not a real variant and APOH is on chr17. Could you also send us a real example, not a fake one? I can't check what you've put in against the databases unless it's actually real data.
It's OK, Ma'am. Thank you for your honest help. I respect your generosity. As a researcher, I also help many of my colleagues and juniors. I asked this question just to clarify how bioinformaticians describe the difference between mutation and variation. Please don't misunderstand me. Thanks a lot for your help.
Without a real sample of input I can only guess at what's causing the ref/ref problem. I suspect that your input file has the reference allele listed as the alternative. Annovar takes whatever you call the alt and calculates consequences for that, which means it's comparing its reference to the allele you've inputted, which is also the reference.
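To illustrate the point, here is a minimal sketch of how the ALT column could be derived during an array-to-VCF conversion so that the reference allele never ends up there. The function names are hypothetical, and it assumes the genotype is a two-letter call (e.g. "AG") already on the same strand as the reference base taken from hg19.fa. It is not how your converter or Annovar works, just one way to think about it.

```python
def alt_alleles(ref_base, genotype):
    """Return the genotype alleles that differ from the reference base."""
    ref = ref_base.upper()
    alts = []
    for allele in genotype.upper():
        # Skip no-call symbols such as "-" and anything equal to the reference.
        if allele in "ACGT" and allele != ref and allele not in alts:
            alts.append(allele)
    return alts


def to_vcf_fields(ref_base, genotype):
    """Return (REF, ALT) for a variant site, or None if there is nothing to report."""
    alts = alt_alleles(ref_base, genotype)
    if not alts:
        # Homozygous reference (or a no-call): not a variant, so no record is written.
        return None
    return ref_base.upper(), ",".join(alts)
```

With this logic a "GG" call at a site where the reference is G produces no record at all, rather than a G>G line that the annotation tools will then score.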
What I was going to do was look up the variant ID in a public database to see what alleles were listed. It seems likely that the variants where this has occurred are those where the reference allele is actually the minor allele, and particularly those where the alleles have flipped between GRCh38 and GRCh37. Without seeing a real example, I cannot confirm this. Perhaps you can do this analysis yourself and confirm it.
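If it helps, a rough sketch of how you could pull out the affected records yourself is below. It assumes a plain, tab-separated VCF with the standard first five columns (CHROM, POS, ID, REF, ALT); the function name is made up for illustration. The IDs it returns are the ones to look up in a public database.

```python
def ref_equals_alt(vcf_lines):
    """Collect records whose ALT column contains the REF allele."""
    flagged = []
    for line in vcf_lines:
        # Ignore header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed line, skipped in this sketch
        chrom, pos, var_id, ref, alt = fields[:5]
        # ALT may hold several comma-separated alleles.
        if ref.upper() in [a.upper() for a in alt.split(",")]:
            flagged.append((chrom, pos, var_id, ref, alt))
    return flagged
```

You would call it as `ref_equals_alt(open("sample.vcf"))`, where `sample.vcf` is a placeholder for your own file, and then check whether the flagged sites are the ones where the reference allele is the minor allele.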
It would have made this a lot easier for everyone if you had told us from the start that you weren't allowed to share an example.
Thanks for your kind consideration. I will perform the analysis as you have suggested. My query was just to confirm what the difference between variation and mutation is. I also have some more doubts :). If you are kind enough, I have one more to ask.
Do we need to use the mutation thresholds of PolyPhen, SIFT, MutationTaster, and CADD for microarray data, or are these only for WES/WGS/TES?
New questions should go in new posts.