Hi, I am new to this type of analysis, but I am really interested in learning.
I had my DNA genotyped by 23andMe and I want to play a little bit with my data. My final goal is to compare my personal data to a population dataset, so I am taking it step by step.
So my questions are:
1) Do I need to phase my data to do what I want afterwards?
2) If so, how can I do this? Using SHAPEIT or something else?
Thanks in advance to those taking the time to answer me.
Louis
Thanks, I will take a look at that !
Hi again,
How do you convert your raw 23andMe data to VCF? Did you remove duplicates?
What are the differences between the vcf.gz file you use in your scripts (ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.vcf) and the files used in Alicia Martin's code (the same ones used in Kevin Blighe's tutorial)?
Thanks
plink can accept the 23andMe file format (and there are probably other ways to perform the conversion). However, I essentially wrote some custom code, and excluded indels because I didn't know their REF and VAR sequences.
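To make the filtering step concrete, here is a minimal Python sketch. It is not the custom code mentioned above. It assumes the usual 23andMe raw export layout (tab-separated rsid, chromosome, position, genotype, with `#` comment lines), that indels are coded with `I`/`D` and no-calls with `-`, and that a "duplicate" means a second record at the same chromosome and position. The function name `parse_23andme` is made up for this example.

```python
import io


def parse_23andme(handle):
    """Yield (rsid, chrom, pos, genotype) for plain SNP calls only.

    Assumed layout: rsid <TAB> chromosome <TAB> position <TAB> genotype.
    Indels (I/D), no-calls (-) and repeated positions are skipped.
    """
    seen = set()
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # header / comment lines
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # not a genotype record
        rsid, chrom, pos, genotype = fields
        # Keep only calls made entirely of A, C, G, T.
        if not genotype or any(base not in "ACGT" for base in genotype):
            continue
        key = (chrom, pos)
        if key in seen:
            continue  # keep the first record seen at each position
        seen.add(key)
        yield rsid, chrom, int(pos), genotype


# Tiny made-up input, for illustration only.
example = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t100\tAG\n"
    "rs2\t1\t200\tII\n"
    "rs3\t1\t300\t--\n"
    "i1\t1\t100\tAA\n"
    "rs4\tX\t400\tC\n"
)
records = list(parse_23andme(io.StringIO(example)))
print(records)
```

On a real export you would pass an open file instead of the `io.StringIO` object. Whether keeping the first record at a duplicated position is the right choice depends on your data, so check the duplicates by hand before trusting it.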
So, I am not actually saying you should use my code directly. However, you should take the time to understand all the steps of the conversion, and I am OK with you getting some pointers from the code (although I would appreciate an acknowledgement, if you do that). From my end, I also realize that there is a limit to the support I can provide, and therefore to how much credit I can/should receive.
As for your question about which 1000 Genomes sample was used, I selected the Omni array since the uncompressed version was much smaller than an Illumina-sequencing-based multi-sample .vcf.
I don't think Alicia added any new samples, so she didn't have to worry about combining files in different formats (which is unfortunately directly related to your question - however, the combination of code may help with creating a combined file that is compatible with downstream analysis).
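For the combining step, one possible approach (a sketch, not what either set of scripts does) is to keep only the positions shared with the reference panel and to recode each personal genotype against the panel's REF/ALT alleles. The helper names `genotype_to_gt` and `shared_sites` are hypothetical. The sketch assumes both datasets use the same genome build and strand, and it only handles biallelic SNPs; anything that does not fit is dropped rather than guessed.

```python
def genotype_to_gt(genotype, ref, alt):
    """Return a VCF-style GT string such as '0/1', or None if it cannot be coded.

    Only single-base REF/ALT alleles are handled (assumption: biallelic SNPs,
    same strand in both datasets).
    """
    if not genotype or len(ref) != 1 or len(alt) != 1:
        return None
    index = {ref: "0", alt: "1"}
    try:
        codes = sorted(index[base] for base in genotype)
    except KeyError:
        return None  # allele matches neither REF nor ALT (e.g. strand flip)
    return "/".join(codes)


def shared_sites(personal, reference_sites):
    """Recode personal genotypes at positions present in the reference panel.

    personal: dict mapping (chrom, pos) -> genotype string, e.g. 'AG'
    reference_sites: iterable of (chrom, pos, ref, alt) taken from the panel VCF
    """
    recoded = {}
    for chrom, pos, ref, alt in reference_sites:
        genotype = personal.get((chrom, pos))
        if genotype is None:
            continue  # site not on the personal array
        gt = genotype_to_gt(genotype, ref, alt)
        if gt is not None:
            recoded[(chrom, pos)] = gt
    return recoded


# Made-up values, for illustration only.
personal = {("1", 100): "AG", ("1", 200): "CC"}
panel = [("1", 100, "A", "G"), ("1", 300, "T", "C")]
print(shared_sites(personal, panel))
```

Sites where the personal alleles match neither REF nor ALT are silently dropped here; counting them is a useful sanity check, since a large number usually points to a build or strand mismatch.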