Hello everyone,
I extracted the variants for all my data using chromosome coordinates and now I have a very BIG text file, with the following format:
SAMPLE CHROM POS ID REF ALT QUAL FILTER GT GQ DP
XXXX chr2 165990524 rs4667859 T C 256 PASS 1/1 66 23
YYYY chr2 165993939 rs139604390 G A 155 PASS 0/1 188 33
I would like to know the fastest way to annotate these variants, especially to get the consequence of each variant on the protein sequence (e.g. stop gained, missense, stop lost, frameshift).
I tried to work with VEP, but I am not sure about the input format in this case. Any thoughts about this?
Thank you.
Your file looks like it was extracted from a VCF. As @nicolas pointed out, VEP will work if the data is in a format it accepts, so you can either convert the table back to VCF or submit the variants to the VEP REST API.
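If you go the conversion route, something along these lines might be enough. It is only a rough sketch and makes several assumptions: that the table is whitespace-delimited with the column order shown in the question, that a sites-only VCF (no per-sample genotype columns) is all you need for consequence annotation, and that the first row seen for a given site can stand in for the others. The function name `table_to_sites_vcf` is made up for this example.

```python
# Minimal sites-only VCF header; INFO is left empty (".") on every record.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def table_to_sites_vcf(rows):
    """Convert rows of the flat table (SAMPLE CHROM POS ID REF ALT QUAL FILTER ...)
    into the lines of a sites-only VCF, keeping each site only once."""
    seen = set()
    out = list(VCF_HEADER)
    for line in rows:
        fields = line.split()
        # Skip blank lines, the header row, and rows too short to parse.
        if len(fields) < 8 or fields[0] == "SAMPLE":
            continue
        _sample, chrom, pos, vid, ref, alt, qual, flt = fields[:8]
        key = (chrom, pos, ref, alt)
        if key in seen:
            # Same site already seen in another sample: keep the first row only.
            continue
        seen.add(key)
        out.append("\t".join([chrom, pos, vid, ref, alt, qual, flt, "."]))
    return out


example = [
    "SAMPLE CHROM POS ID REF ALT QUAL FILTER GT GQ DP",
    "XXXX chr2 165990524 rs4667859 T C 256 PASS 1/1 66 23",
    "YYYY chr2 165993939 rs139604390 G A 155 PASS 0/1 188 33",
]
vcf_lines = table_to_sites_vcf(example)
print("\n".join(vcf_lines))
```

For a very big file you would pass the open file handle instead of a list, so the rows are read one at a time, and write the result to a file that you then give to VEP as VCF input. The SAMPLE, GT, GQ and DP columns are dropped here; you can join the annotations back to your original table afterwards on CHROM, POS, REF and ALT.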