Get Snp Position From A Python Interface.
3
2
Entering edit mode
11.3 years ago
Peixe ▴ 660

Hi,

I have a Python script which takes as input a list of SNPs, then calls tabix to retrieve .VCF file and does some stuff with it. The problem is that i need to manually add (using UCSC genome browser web*) the coordinates of the SNP for tabix to work properly.

Does any one know any simple python API / function / module to easy add this information without having to perform the search manually?

Thanks!

python snp tabix coordinates • 12k views
ADD COMMENT
0
Entering edit mode

Hi Peixe, you should accept one of these excellent answers! :)

ADD REPLY
9
Entering edit mode
11.3 years ago

You could use an external database to find the coordinates of your rs## (not tested, please check the +0/+1 for the coordinates)

for R in rs25 rs26 rs27 
do
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Dhg19 -N -e "select concat(chrom,':',chromStart+1,'-',chromEnd) from snp135 where name='$R';" |\
    xargs tabix my.vcf.gz
done
ADD COMMENT
0
Entering edit mode

Yes, this fits perfect! Coordinates are right. Thanks, @Pierre Lindenbaum!

ADD REPLY
0
Entering edit mode
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
open: No such file or directory
[tabix] was bgzip used to compress this file? my.vcf.gz
ADD REPLY
7
Entering edit mode
11.3 years ago
brentp 24k

You can do this with cruzdb

from cruzdb import Genome
import sys

fname = sys.argv[1]
hg19 = Genome('hg19')
snp135 = hg19.snp135

for rs in (l.split()[0].rstrip() for l in open(fname) if l.startswith("rs")):
    print snp135.filter_by(name=rs).first()

which takes a file with the rs id's in the first columns and prints out a bed file with the chromosome locations from dbsnp 135.

ADD COMMENT
0
Entering edit mode

Wow! This is amazing too! I didn't know about the existence of this module. I think i will really get much profit out of it in the future. Thanks, @brentp!

ADD REPLY
0
Entering edit mode

hi @brentp, I run into your tool and noted that the following line is giving an error.

snp135 = hg19.snp135

I realized that it's because snp135 is no longer the latest database. How do you determine which version of dbsnp to use without knowing what's currently available? PS: I tried snp141 and it worked..

ADD REPLY
0
Entering edit mode

When I try to run it, I got the below error:

Traceback (most recent call last):
  File "hellowworld.py", line 1, in <module>
    import Genome
ImportError: No module named 'Genome'

What shall I do?

ADD REPLY
0
Entering edit mode

Have you installed cruzdb first?

ADD REPLY
0
Entering edit mode

Yes I installed it and now get another error:

Traceback (most recent call last):
  File "hellowworld.py", line 1, in <module>
    from cruzdb import Genome
  File "C:\Python34\lib\site-packages\cruzdb\__init__.py", line 5, in <module>
    from . import soup
  File "C:\Python34\lib\site-packages\cruzdb\soup.py", line 1, in <module>
    from . import sqlsoup
  File "C:\Python34\lib\site-packages\cruzdb\sqlsoup.py", line 458
    except KeyError, ke:
                   ^
SyntaxError: invalid syntax
ADD REPLY
1
Entering edit mode

I guess this synthax error is because you use Python3.4 and the module was written for python2, but that's just my first guess.

ADD REPLY
0
Entering edit mode

When I run it, I got

fname = sys.argv[1]
IndexError: list index out of range

What is the problem?

ADD REPLY
0
Entering edit mode

I got

fname = sys.argv[1]
IndexError: list index out of range
ADD REPLY
0
Entering edit mode

You need to save the code as a script and use the input file as a argument when executing the script using your python interpreter.

ADD REPLY

Login before adding your answer.

Traffic: 2668 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6