How to add protein ID transcript ("NP_")?
8.0 years ago
agata88 ▴ 870

Hi all,

I would like to add the "NP_" (protein ID) number to SNP variants. I am using SnpEff and SnpSift.

I used the command below to annotate variants against hg19:

java -jar snpEff.jar -v -canon hg19 test_2.vcf > test_3.vcf

The output VCF includes the NM number (transcript ID) but no protein number (NP).

Any idea how to add this?

Best,

Agata

next-gen annotation • 2.6k views

Thanks, but I was wondering how to do that using SnpEff or SnpSift ...


Ahh, wait for someone more experienced with that then :)

8.0 years ago

First, check whether SnpEff can print the NP_* accessions itself.

The ugly way: extract the NM_ accessions from your VCF, fetch the matching NP_ accession for each one with NCBI efetch, and build a sed script that appends the NP_ after the NM_:

# Collect the unique NM_ accessions from the ANN field, then write one sed rule per accession.
# Note: the "." in an accession is left unescaped, so in the sed pattern it matches any character.
curl -Ls "https://raw.githubusercontent.com/arraytools/vc-annotation/master/snpeff/tmp/nonsyn_splicing.vcf" | grep -v "^#" |\
cut -f 8 | tr ";" "\n" | grep "^ANN=" | tr "|" "\n" | grep "^NM_" | sort -u | while read -r S;\
do
       curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${S}&rettype=gbc" |\
       xmllint --xpath '//INSDQualifier_name[.="protein_id"]/../INSDQualifier_value/text()' - |\
       awk -v NM="${S}" '{printf("s/|%s|/|%s|%s|/g\n",NM,NM,$1);}'  ;
done > out.sed
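
To make it clearer what ends up in out.sed, here is a minimal offline sketch of a single rule applied to one ANN string. The rule is written by hand in the same form the awk step prints, and the NM_/NP_ pair is the ISG15 one from the example output in this answer. The variable names (rule, ann, result) are only for illustration.

```shell
# One rule in the form the loop writes to out.sed.
# In basic sed regex "|" is a literal pipe, so it can delimit the ANN sub-fields.
rule='s/|NM_005101.3|/|NM_005101.3|NP_005092.1|/g'

# An ANN value as SnpEff writes it, before the NP_ is inserted.
ann='ANN=A|missense_variant|MODERATE|ISG15|ISG15|transcript|NM_005101.3|protein_coding|2/2|c.248G>A|p.Ser83Asn|355/666|248/498|83/165||'

# Apply the rule: the NP_ accession becomes a new sub-field right after the NM_.
result=$(printf '%s\n' "$ann" | sed -e "$rule")
printf '%s\n' "$result"
```

Note that this shifts every later ANN sub-field one position to the right, so tools that parse ANN by position would need to account for the extra field.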

Then use this sed script to insert the NP_ right after each NM_:

$ curl -Ls "https://raw.githubusercontent.com/arraytools/vc-annotation/master/snpeff/tmp/nonsyn_splicing.vcf" | sed -f out.sed | grep NP_ | head


1   1014228 COSM3751464 G   A   11.4963 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=ISG15;ANN=A|missense_variant|MODERATE|ISG15|ISG15|transcript|NM_005101.3|NP_005092.1|protein_coding|2/2|c.248G>A|p.Ser83Asn|355/666|248/498|83/165||   GT:PL   0/1:39,3,0
1   21850165    .   A   T   4.88476 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;ANN=T|missense_variant|MODERATE|HSPG2|HSPG2|transcript|NM_001291860.1|NP_001278789.1|protein_coding|57/97|c.7325T>A|p.Ile2442Asn|7405/14343|7325/13179|2442/4392||  GT:PL   0/1:31,3,0
1   27793570    .   G   A   5.58414 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;ANN=A|missense_variant|MODERATE|STX12|STX12|transcript|NM_177424.2|NP_803173.1|protein_coding|3/9|c.226G>A|p.Glu76Lys|351/3079|226/831|76/276|| GT:PL   0/1:32,3,0
1   114720652   COSM4218186 G   A   13.3811 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=CSDE1;ANN=A|stop_gained|HIGH|CSDE1|CSDE1|transcript|NM_001242891.1|NP_001229820.1|protein_coding|18/21|c.2077C>T|p.Gln693*|2599/4313|2077/2535|693/844||;LOF=(CSDE1|CSDE1|1|1.00);NMD=(CSDE1|CSDE1|1|1.00) GT:PL   0/1:41,3,0
1   153986128   .   A   G   10.9943 PASS    DP=2;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,0,1;MQ=50;ANN=G|missense_variant|MODERATE|RAB13|RAB13|transcript|NM_002870.3|NP_002861.1|protein_coding|1/8|c.109T>C|p.Tyr37His|250/1235|109/612|37/203||   GT:PL   1/1:38,3,0
1   173485410   COSM5378456 C   T   6.32957 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=PRDX6;ANN=T|missense_variant|MODERATE|PRDX6|PRDX6|transcript|NM_004905.2|NP_004896.1|protein_coding|3/5|c.302C>T|p.Pro101Leu|353/1670|302/675|101/224||    GT:PL   0/1:33,3,0
1   183116645   .   A   T   7.93884 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,0,1;MQ=50;ANN=T|missense_variant|MODERATE|LAMC1|LAMC1|transcript|NM_002293.3|NP_002284.3|protein_coding|7/28|c.1397A>T|p.Lys466Ile|1654/7889|1397/4830|466/1609|| GT:PL   0/1:35,3,0
1   228097438   .   T   C   7.93884 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,0,1;MQ=50;ANN=C|missense_variant|MODERATE|ARF1|ARF1|transcript|NM_001024226.1|NP_001019397.1|protein_coding|3/5|c.245T>C|p.Phe82Ser|473/1973|245/546|82/181|| GT:PL   0/1:35,3,0
1   235246483   .   G   C   6.32957 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;ANN=C|missense_variant|MODERATE|ARID4B|ARID4B|transcript|NM_001206794.1|NP_001193723.1|protein_coding|7/24|c.383C>G|p.Pro128Arg|760/5946|383/3939|128/1312||    GT:PL   0/1:33,3,0
10  73913343    COSM4144861 T   C   9.6729  PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=PLAU;ANN=C|missense_variant|MODERATE|PLAU|PLAU|transcript|NM_002658.3|NP_002649.1|protein_coding|6/11|c.422T>C|p.Leu141Pro|568/2377|422/1296|141/431|| GT:PL   0/1:37,3,0
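
As a sanity check, one could list any NM_ accession that did not get an NP_ inserted after it (for example because the efetch call failed for that ID). This is only a sketch: the two ANN fragments are a tiny stand-in for a real VCF, and NM_000000.1 / GENE1 in the second one are made-up placeholders. It assumes a grep that supports -o (GNU and BSD grep do).

```shell
# Two ANN fragments: the first was annotated, the second (hypothetical) was not.
sample='ANN=A|missense_variant|MODERATE|ISG15|ISG15|transcript|NM_005101.3|NP_005092.1|protein_coding|
ANN=T|missense_variant|MODERATE|GENE1|GENE1|transcript|NM_000000.1|protein_coding|'

# Print each NM_ accession together with the sub-field that follows it,
# then keep only the ones where that next sub-field is not an NP_ accession.
missing=$(printf '%s\n' "$sample" | grep -o '|NM_[0-9.]*|[^|]*' | grep -v '|NP_')
printf '%s\n' "$missing"
```

On a real file, the same two grep commands could be fed from the sed output instead of the sample variable.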