Protein-coding transcripts with VEP
8.1 years ago
onemoreuser ▴ 20

I'm using VEP to annotate variants in VCF files. When analysing the results, I want to keep only the protein-coding transcripts. Should I look at the "Consequence" field added by VEP? I don't have a biological background, and I haven't found an answer online so far.

vep annotation variants • 3.7k views

Question resolved.


In my VEP output text file I don't see a "protein_coding" category. The BIOTYPE column only contains the following values:

table(d$BIOTYPE)

-                           10527
CTCF_binding_site             109
enhancer                       39
open_chromatin_region          42
promoter                      103
promoter_flanking_region      167
TF_binding_site                34

However, if I check which of these variants have a CCDS or APPRIS annotation, I get the following. How should I then filter for the protein-coding variants?

table(d$BIOTYPE, d$APPRIS)

                             -    A1    A2    P1    P2    P3    P4    P5
-                         7903   150   760  1123   156   337    75    23
CTCF_binding_site          109     0     0     0     0     0     0     0
enhancer                    39     0     0     0     0     0     0     0
open_chromatin_region       42     0     0     0     0     0     0     0
promoter                   103     0     0     0     0     0     0     0
promoter_flanking_region   167     0     0     0     0     0     0     0
TF_binding_site             34     0     0     0     0     0     0     0
8.1 years ago
abascalfederico ★ 1.2k

I usually run VEP with the option "--everything". The results then show the consequence for each overlapping transcript. You have to parse the results carefully, which is a bit complicated for multiallelic variants. The transcript biotype is in the field "BIOTYPE", and the transcript itself is in the field "Feature". Just filter for BIOTYPE equal to "protein_coding".
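A minimal Python sketch of that filter, assuming VEP's tab-delimited text output: "##" metadata lines, one "#"-prefixed column header, then one row per allele and overlapping feature. It also assumes BIOTYPE is either its own column (as with --tab) or packed into the "Extra" column as KEY=VALUE pairs separated by ";". Keying results on both the variant and the allele is one way to keep the alleles of multiallelic sites apart. The function names and example rows are made up for illustration.

```python
def parse_vep(lines):
    """Yield one dict per annotation row of VEP tab-delimited output (assumed layout)."""
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and metadata lines
        if line.startswith("#"):
            header = line[1:].split("\t")  # column header without the leading '#'
            continue
        if header is None:
            raise ValueError("data line found before the column header")
        row = dict(zip(header, line.split("\t")))
        # Assumed default-format layout: optional fields live in 'Extra' as KEY=VALUE;KEY=VALUE
        extra = row.get("Extra", "-")
        if extra not in ("", "-"):
            for item in extra.split(";"):
                key, _, value = item.partition("=")
                row.setdefault(key, value)
        yield row


def protein_coding(rows):
    """Keep only rows annotated against a protein_coding transcript."""
    return [r for r in rows if r.get("BIOTYPE") == "protein_coding"]


def variants_hitting_protein_coding(rows):
    """Collect (variant, allele) pairs with at least one protein_coding transcript hit."""
    return {(r["Uploaded_variation"], r["Allele"]) for r in protein_coding(rows)}


if __name__ == "__main__":
    # Hypothetical, shortened VEP output (real files have more columns).
    example = [
        "## made-up metadata line",
        "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\tExtra",
        "var1\t1:1000\tA\tGENE1\tTX1\tTranscript\tmissense_variant\tBIOTYPE=protein_coding",
        "var1\t1:1000\tT\tGENE1\tTX2\tTranscript\tintron_variant\tBIOTYPE=retained_intron",
        "var2\t1:5000\tG\t-\tREG1\tRegulatoryFeature\tregulatory_region_variant\tBIOTYPE=promoter",
    ]
    rows = list(parse_vep(example))
    print([r["Feature"] for r in protein_coding(rows)])  # ['TX1']
    print(variants_hitting_protein_coding(rows))  # {('var1', 'A')}
```

Only the first allele of var1 is kept, because the T allele overlaps a retained_intron transcript only.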

Alternatively, you can preload a list of protein-coding transcripts (you can get them from BioMart) and check whether the transcript in the "Feature" field is in your list. By the way, I would also recommend not using all protein-coding transcripts but only the more reliable ones (e.g. those with CCDS or APPRIS support).
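A rough Python sketch of the whitelist approach, assuming the annotation rows are already parsed into dicts with "Feature", "CCDS" and "APPRIS" keys (the last two are filled when VEP runs with --ccds/--appris, which --everything includes), and that "-" marks an empty field. It also assumes the transcript list is one ID per line; version suffixes (".N") are stripped so versioned and unversioned IDs still match, which is a design choice rather than anything VEP or BioMart requires. All names and IDs are hypothetical.

```python
def strip_version(transcript_id):
    """Drop a trailing '.N' version so 'TX1.2' and 'TX1' compare equal."""
    return transcript_id.split(".")[0]


def load_transcript_ids(lines):
    """Read one transcript ID per line (e.g. a list exported from BioMart)."""
    return {strip_version(line.strip()) for line in lines if line.strip()}


def has_support(row):
    """True if the row carries a CCDS ID or an APPRIS tag ('-' means absent)."""
    return any(row.get(field, "-") not in ("", "-") for field in ("CCDS", "APPRIS"))


def filter_rows(rows, coding_ids, require_support=True):
    """Keep rows whose transcript is in the whitelist, optionally with CCDS/APPRIS support."""
    kept = []
    for row in rows:
        if strip_version(row["Feature"]) not in coding_ids:
            continue
        if require_support and not has_support(row):
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    coding_ids = load_transcript_ids(["TX1", "TX2.3", ""])
    rows = [
        {"Feature": "TX1.2", "CCDS": "CCDS1.1", "APPRIS": "-"},
        {"Feature": "TX2", "CCDS": "-", "APPRIS": "-"},
        {"Feature": "TX3", "CCDS": "-", "APPRIS": "P1"},
    ]
    print([r["Feature"] for r in filter_rows(rows, coding_ids)])  # ['TX1.2']
    print([r["Feature"] for r in filter_rows(rows, coding_ids, require_support=False)])  # ['TX1.2', 'TX2']
```

TX3 is dropped in both calls because it is not in the whitelist, even though it has an APPRIS tag.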

[edited to correct a detail]


Thank you. Being a protein-coding transcript is one of the requirements, but I also filter on other features, as you suggested.

8.1 years ago
Emily 23k

You can filter your output by transcript biotype. Are you using the online tool or the standalone script? In the online tool, there's a small filter box above the results table where you can select "Biotype is protein_coding". If you're using the script, you can run the filter script (filter_vep) with --filter "Biotype is protein_coding".


Thanks Emily, as someone who is using the script this is exactly what I needed.
