I am working with VCF files from the 1000 Genomes Project, which I downloaded from here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
I found that there is an option, --derived, that changes the REF and ALT alleles to the ancestral and derived alleles respectively, provided the VCF file has a field specifying the ancestral allele.
However, sometimes the ancestral allele is not specified (this is the beginning of my file):
22 16050075 . A G 100 PASS AA=.||| GT 0|0 0|0
Given that the ancestral allele here is unknown (AA=.|||), will using the --derived option just keep the REF allele as it was? If not, how will it handle these specific cases?
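To see how many sites are affected before relying on --derived, the AA tag can be inspected directly. The following is a minimal Python sketch, not a description of what --derived does internally. The helper name `ancestral_allele` is hypothetical. It assumes the 1000 Genomes convention in which the AA value is pipe-delimited with the ancestral allele first, and it treats lower-case (low-confidence) calls as known, which is a choice made here only for illustration.

```python
def ancestral_allele(info):
    """Return the ancestral base from a VCF INFO string, or None if unknown.

    Assumes the 1000 Genomes layout 'AA=<allele>|...' where the first
    pipe-delimited item is the ancestral allele. Values such as '.', '-'
    or 'N' are treated as unknown.
    """
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AA":
            # Upper-casing keeps low-confidence (lower-case) calls;
            # this is an illustrative choice, not a rule from any tool.
            base = value.split("|")[0].upper()
            return base if base in {"A", "C", "G", "T"} else None
    return None  # no AA tag at all


# The record from the question, written with tabs as in a real VCF file.
record = "22\t16050075\t.\tA\tG\t100\tPASS\tAA=.|||\tGT\t0|0\t0|0"
info_field = record.split("\t")[7]
print(ancestral_allele(info_field))  # None: the ancestral allele is unknown
```

Counting how often this returns None over a whole chromosome would show how much of the file the question actually concerns.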
Also, I found these lines in the manual, but I am not sure I understand them correctly. If I want to use the --derived option, should I also use --freq2 and --counts2?
OUTPUT ALLELE STATISTICS
--freq
--freq2
Outputs the allele frequency for each site in a file with the suffix ".frq". The second option is used to suppress output of any information about the alleles.
--counts
--counts2
Outputs the raw allele counts for each site in a file with the suffix ".frq.count". The second option is used to suppress output of any information about the alleles.
--derived
For use with the previous four frequency and count options only. Re-orders the output file columns so that the ancestral allele appears first. This option relies on the ancestral allele being specified in the VCF file using the AA tag in the INFO field.
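As I read the description above, the re-ordering amounts to listing the ancestral allele's count before the derived allele's count. A minimal Python sketch of that idea for a biallelic site follows. The function name `ancestral_first` is hypothetical, and returning None when the ancestral allele is unknown or matches neither allele is an assumption made for this sketch; the manual text does not say what the tool does in that case.

```python
def ancestral_first(ref, alt, ref_count, alt_count, ancestral):
    """Order a biallelic site's allele counts as (ancestral, derived).

    Returns None when the ancestral allele is unknown or matches neither
    REF nor ALT. Skipping such sites is a choice made for this sketch only.
    """
    if ancestral == ref:
        return (ref, ref_count), (alt, alt_count)
    if ancestral == alt:
        return (alt, alt_count), (ref, ref_count)
    return None


print(ancestral_first("A", "G", 4, 0, "A"))   # (('A', 4), ('G', 0))
print(ancestral_first("A", "G", 3, 1, "G"))   # (('G', 1), ('A', 3))
print(ancestral_first("A", "G", 4, 0, None))  # None
```

Handling unknown sites explicitly like this makes it obvious which positions drop out of a derived-allele frequency calculation.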