SNPs in Protein Domains
3
0
10.1 years ago
Dataminer ★ 2.8k

Hi!

I have a list of genomic coordinates along with the mutations that have occurred. Something like this:

chromosome    13    27470642    C    G
chromosome    13    27890643    C    A
chromosome    13    27490642    -    C
chromosome    13    27360641    C    AG

I would like to know which protein domains these variants fall in.

Does anyone know how this can be queried?

Thank you

snp genomics • 2.8k views
2
10.1 years ago
Emily 23k

Have you looked at the Variant Effect Predictor?
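
As a rough sketch of how that could look: the snippet below rewrites the example list from the question into VEP's default input format and, if `vep` is installed, runs it with `--domains`. The file names, the GRCh37 assembly, the use of a local cache, and the reading of the insertion coordinate as the base before the inserted sequence are all assumptions, not something stated in the question.

```shell
# Hypothetical input file: the four example variants from the question.
cat > variants.txt <<'EOF'
chromosome    13    27470642    C    G
chromosome    13    27890643    C    A
chromosome    13    27490642    -    C
chromosome    13    27360641    C    AG
EOF

# Convert to VEP's default input format: chrom, start, end, ref/alt, strand.
# Assumption: for insertions (ref is "-") the position is the base before the
# inserted sequence, so VEP expects start = pos + 1 and end = pos.
awk 'BEGIN { OFS = "\t" }
     $4 == "-" { print $2, $3 + 1, $3, $4 "/" $5, "+"; next }
               { print $2, $3, $3 + length($4) - 1, $4 "/" $5, "+" }' \
    variants.txt > variants.vep.txt

# Run VEP only if it is available; --domains adds overlapping protein domains.
# Assumption: a GRCh37 cache is installed locally.
if command -v vep >/dev/null 2>&1; then
    vep -i variants.vep.txt -o variants.vep.out --cache --assembly GRCh37 --domains \
        || echo "VEP run failed (is the cache installed?)" >&2
fi
```

If the coordinates are on a different assembly, or the insertion position follows another convention, the `awk` step and `--assembly` would need adjusting.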

0

Cool, I didn't know about the "--domains" option. :-)

2
10.1 years ago

I wrote a tool called MapUniprotFeatures.

It loads the UCSC knownGene table, parses an XML definition of UniProt, and produces a BED file of all the domains.

You can then intersect your VCF-like file with this BED file using bedtools:

$ java -jar dist/mapuniprot.jar \
    REF=/path/to/human_g1k_v37.fasta \
    UNIPROT=/path/uri/uniprot.org/uniprot_sprot.xml.gz \
    kgUri=<(curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" | gunzip -c | awk -F '\t' '{if($2 ~ ".*_.*") next; OFS="\t"; gsub(/chr/,"",$2);print;}' ) |\
    LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k3,3n | uniq | head


1   69090   69144   topological_domain  1000    +   69090   69144   255,0,0 1   54  0
1   69144   69216   transmembrane_region    1000    +   69144   69216   255,0,0 1   72  0
1   69216   69240   topological_domain  1000    +   69216   69240   255,0,0 1   24  0
1   69240   69306   transmembrane_region    1000    +   69240   69306   255,0,0 1   66  0
1   69306   69369   topological_domain  1000    +   69306   69369   255,0,0 1   63  0
1   69357   69636   disulfide_bond  1000    +   69357   69636   255,0,0 1   279 0
1   69369   69429   transmembrane_region    1000    +   69369   69429   255,0,0 1   60  0
1   69429   69486   topological_domain  1000    +   69429   69486   255,0,0 1   57  0
1   69486   69543   transmembrane_region    1000    +   69486   69543   255,0,0 1   57  0
1   69543   69654   topological_domain  1000    +   69543   69654   255,0,0 1   111 0
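
The intersection step could be sketched roughly as follows. This assumes the full output of the command above was saved as `domains.bed` and the variant list from the question as `variants.txt` (both file names are hypothetical), and that the listed positions are 1-based. Insertions are mapped to the single base at the listed position, which is an assumption about how they were reported.

```shell
# Hypothetical input file: the four example variants from the question.
cat > variants.txt <<'EOF'
chromosome    13    27470642    C    G
chromosome    13    27890643    C    A
chromosome    13    27490642    -    C
chromosome    13    27360641    C    AG
EOF

# Convert to BED (0-based start, exclusive end), keeping ref and alt as extra columns.
# Assumption: an insertion (ref is "-") is represented by the 1 bp at its position.
awk 'BEGIN { OFS = "\t" }
     { len = ($4 == "-") ? 1 : length($4)
       print $2, $3 - 1, $3 - 1 + len, $4, $5 }' variants.txt |
    LC_ALL=C sort -k1,1 -k2,2n > variants.bed

# Report each variant next to every domain it overlaps.
# Skipped when bedtools or domains.bed (the saved tool output) is missing.
if command -v bedtools >/dev/null 2>&1 && [ -s domains.bed ]; then
    bedtools intersect -a variants.bed -b domains.bed -wa -wb
fi
```

Chromosome names must match between the two files; the BED above uses names without the "chr" prefix, like the output shown.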
0

Where can I get this uniprot_sprot.xml.gz file? Thank you.

0

Hi!

I am trying to clone the tool from GitHub and it throws this error:

git clone git://github.com/lindenb/jvarkit/wiki/MapUniProtFeatures

Cloning into 'MapUniProtFeatures'...
fatal: remote error:
  lindenb/jvarkit/wiki/MapUniProtFeatures is not a valid repository name
1
10.0 years ago
Pablo ★ 1.9k

You can just use SnpEff's NextProt option:

http://snpeff.sourceforge.net/SnpEff_manual.html#NextProt
