7.2 years ago
Joe Ashmore
I am trying to search the 1000 Genomes VCF data for genes that are present in more than 2 copies (copy number > 2).
I have downloaded the most recent VCF for my chromosome of interest (chr4) from: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
When I search the VCF with the command on the next line, I get the output shown beneath it:
gunzip -c ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz | grep CN3 | cut -f 1-5
##ALT=<ID=CN3,Description="Copy number allele: 3 copies">
##ALT=<ID=CN30,Description="Copy number allele: 30 copies">
##ALT=<ID=CN31,Description="Copy number allele: 31 copies">
##ALT=<ID=CN32,Description="Copy number allele: 32 copies">
##ALT=<ID=CN33,Description="Copy number allele: 33 copies">
##ALT=<ID=CN34,Description="Copy number allele: 34 copies">
##ALT=<ID=CN35,Description="Copy number allele: 35 copies">
##ALT=<ID=CN36,Description="Copy number allele: 36 copies">
##ALT=<ID=CN37,Description="Copy number allele: 37 copies">
##ALT=<ID=CN38,Description="Copy number allele: 38 copies">
##ALT=<ID=CN39,Description="Copy number allele: 39 copies">
4 3467434 esv3599431;esv3599432 T <CN2>,<CN3>
4 8965379 esv3599554;esv3599555;esv3599556 C <CN0>,<CN2>,<CN3>
4 9104669 esv3599560;esv3599561;esv3599562 G <CN0>,<CN2>,<CN3>
4 9126509 esv3599563;esv3599564;esv3599565 C <CN0>,<CN2>,<CN3>
4 9370866 esv3599568;esv3599569;esv3599570 G <CN0>,<CN2>,<CN3>
4 9418201 esv3599572;esv3599573;esv3599574 G <CN0>,<CN2>,<CN3>
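The output above shows two side effects of a plain `grep CN3`: it is a substring match, so it also returns the `##ALT` header definitions and the `CN30` to `CN39` lines. A stricter search can be sketched with awk, as below. This is only a minimal sketch: `alt_has_allele` is a hypothetical helper name, and it assumes a tab-delimited VCF on standard input with the ALT alleles in column 5.

```shell
# Hypothetical helper (not part of any toolkit): reads an uncompressed VCF on
# stdin and prints columns 1-5 of the records whose ALT column (column 5)
# contains exactly the symbolic allele named in $1, e.g. CN3.
alt_has_allele() {
  awk -F'\t' -v OFS='\t' -v want="<$1>" '
    /^#/ { next }                          # skip meta and header lines
    {
      n = split($5, alts, ",")             # ALT can list several alleles
      for (i = 1; i <= n; i++)
        if (alts[i] == want) {             # exact match, so CN30 != CN3
          print $1, $2, $3, $4, $5
          break
        }
    }'
}
```

It can be used in place of the `grep | cut` step, for example `gunzip -c ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz | alt_has_allele CN3`.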
I have also tried searching using grep CNV, with similar results:
##ALT=<ID=CNV,Description="Copy Number Polymorphism">
4 67914 esv3599345;esv3599346 A <CN0>,<CN2>
4 138870 esv3599353;esv3599354 C <CN0>,<CN2>
How do I find out whether an individual has more than two copies of a gene/region? I assume the best way is to narrow the search to the region of the gene of interest, but I don't want to lose relevant information.
If there is a way to do this in R, it would be even better.
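To make the per-individual question concrete, here is a rough sketch of the check, written in shell/awk to match the command above rather than in R. It rests on assumptions that are not confirmed anywhere in this post: that each `<CNn>` allele means n copies on one haplotype, that the reference allele counts as 1 copy, and that GT is the first FORMAT subfield. `samples_over_two_copies` is a hypothetical helper name.

```shell
# Hypothetical helper: reads an uncompressed VCF on stdin and prints
# chrom, pos, sample name and summed copy number for every sample whose
# haplotype copy numbers add up to more than 2.
# Assumptions: <CNn> = n copies on one haplotype, reference allele = 1 copy,
# GT is the first FORMAT subfield.
samples_over_two_copies() {
  awk -F'\t' -v OFS='\t' '
    /^##/     { next }                                  # meta lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
    $5 !~ /<CN[0-9]+>/ { next }                         # keep CN records only
    {
      n = split($5, alts, ",")
      cn[0] = 1                                         # reference haplotype
      for (a = 1; a <= n; a++) {
        cn[a] = alts[a]
        gsub(/[^0-9]/, "", cn[a])                       # "<CN3>" becomes "3"
      }
      for (i = 10; i <= NF; i++) {
        split($i, fmt, ":")                             # fmt[1] is the GT
        m = split(fmt[1], gt, "[|/]")                   # phased or unphased
        total = 0; ok = 1
        for (h = 1; h <= m; h++) {
          if (gt[h] == ".") { ok = 0; break }           # missing genotype
          total += cn[gt[h]]
        }
        if (ok && total > 2) print $1, $2, name[i], total
      }
    }'
}
```

For example, `gunzip -c ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz | samples_over_two_copies | head` would list the first few sample/site pairs. The stream could be restricted to the coordinates of a gene of interest first, so that only overlapping records are summed.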
Thanks!