Hi all,
With your help, I would like to find a good way to annotate variants against the buffalo genome using VEP.
As you may know, the chromosome-level buffalo reference genome and its annotation file (GFF3) were released only recently (about four months ago), so there is no VEP cache for buffalo yet. All we have is the genome reference (FASTA), the annotation file (GFF3), and a VCF file to annotate.
I have already run VEP (installed through Anaconda) and it finished without errors, but my output file is incomplete.
Commands I ran:
grep -v "#" GCA_003121395.1_ASM312139v1_genomic.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna
My output:
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
CM009840.1_10757615_A/G CM009840.1:10757615 G 102405271 XM_006078640.2 Transcript missense_variant 639 607 203 T/A Act/Gct - IMPACT=MODERATE;STRAND=1;SOURCE=data.gff.gz
CM009840.1_10757615_A/G CM009840.1:10757615 G 102405271 XM_025278176.1 Transcript missense_variant 639 607 203 T/A Act/Gct - IMPACT=MODERATE;STRAND=1;SOURCE=data.gff.gz
CM009840.1_10757615_A/G CM009840.1:10757615 G 102405271 XM_025278183.1 Transcript missense_variant 639 607 203 T/A Act/Gct - IMPACT=MODERATE;STRAND=1;SOURCE=data.gff.gz
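If the output looks incomplete, one thing worth ruling out is a mismatch between the sequence names used in the VCF (for example CM009840.1) and those in the GFF3 and FASTA: a variant on a sequence name that the GFF does not contain cannot overlap any transcript. The bash sketch below lists such names. The function names are hypothetical, and it assumes bash (for process substitution) plus gzip, which can read the bgzip-compressed data.gff.gz directly.

```shell
# Hypothetical helpers: compare sequence names across VCF, GFF3 and FASTA.

# Sequence names (column 1) used in a VCF, ignoring header lines.
vcf_seq_names() { grep -v '^#' "$1" | cut -f1 | LC_ALL=C sort -u; }

# Sequence names in a GFF3; gzip -dcf reads bgzipped or plain files.
gff_seq_names() { gzip -dcf "$1" | grep -v '^#' | cut -f1 | LC_ALL=C sort -u; }

# Sequence names from FASTA headers (text after '>' up to the first whitespace).
fasta_seq_names() { grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*//' | LC_ALL=C sort -u; }

# Print VCF sequence names missing from the GFF, then those missing from the FASTA.
report_missing_names() {
  local vcf=$1 gff=$2 fasta=$3
  echo "in VCF but not in GFF:"
  LC_ALL=C comm -23 <(vcf_seq_names "$vcf") <(gff_seq_names "$gff")
  echo "in VCF but not in FASTA:"
  LC_ALL=C comm -23 <(vcf_seq_names "$vcf") <(fasta_seq_names "$fasta")
}

# Usage: report_missing_names Final.vcf data.gff.gz genomic.fna
```

Empty lists under both headings mean every sequence name in the VCF is present in both the GFF and the FASTA.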
The problem is that VEP did not calculate SIFT for me. Shouldn't the SIFT value normally appear in the last (Extra) column?
Given the commands above and this output, how can I get a SIFT value for each missense_variant?
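To confirm that none of the missense records carries a SIFT prediction (not only the three lines shown), a quick check over the whole output file could look like the bash sketch below. It assumes VEP's default tab-delimited output, where Consequence is column 7 and Extra is column 14, and that a SIFT prediction, when present, appears in Extra as an entry starting with SIFT=. The function name summarize_vep_missense is hypothetical; VEP writes to variant_effect_output.txt by default when no output file is given.

```shell
# Hypothetical helper: count missense records in a default-format VEP output
# file and how many of them carry a SIFT= entry in the Extra column.
summarize_vep_missense() {
  local vep_out=$1
  # -F'\t': VEP default output is tab-delimited.
  # Skip header lines (starting with '#'); column 7 is Consequence,
  # which may list several terms (e.g. "missense_variant,splice_region_variant").
  awk -F'\t' '
    !/^#/ && $7 ~ /missense_variant/ {
      missense++
      # Extra (column 14) is a ";"-separated list of KEY=VALUE entries.
      if ($14 ~ /^SIFT=/ || $14 ~ /;SIFT=/) with_sift++
    }
    END {
      printf "missense records: %d\n", missense + 0
      printf "with SIFT in Extra: %d\n", with_sift + 0
    }
  ' "$vep_out"
}

# Usage: summarize_vep_missense variant_effect_output.txt
```

If the second count is 0 while the first is not, SIFT was not added to any missense record.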
Best regards,