SNPs to rs format
2
0
Entering edit mode
6.0 years ago
hosin • 0

I have data of ovine high-density 600K snp array ,How I can change names of SNPs to rs format ? Hi thank you for your attention This is an example of the data format . I should change only first column (names to rs format)

"Name"                      "Chr" "Position" "WDS2.GType" "WDS2.Log R Ratio" "WDS2.B AlleleFreq"
250506CS3900140500001_312.1   23   26298017   AA          0.1586    0.0009
250506CS3900218700001_1294.1  2    148802744  AB          0.1641    0.4478
250506CS3900283200001_442.1   1    188498238  AB          0.0914    0.4470
CL635241_413.1                3    182202867  BB          0.0718    1.0000
CL635750_128.1                3    223741135  AB          0.0265    0.4830
CL635944_160.1                6    114778683  BB          -0.3251   0.9972
Contig35697_5761.1            6    18835475   BB          -0.1425   0.9970
CZ924211_329.1                2    106470676  AB          -0.1050   0.49
OAR3_207847578_X.1            3    193150848  AB          -0.0122   0.4828
OAR3_211362922_X.1            3    196313951  AA          0.1171    0.0039
oar3_OAR1_119450287           1    119450287  AA          -0.1919   0.0176
oar3_OAR1_119458232           1    119458232  AA          -0.1807   0.00
OAR6_114756902_X.1            6    104240372  BB          0.0654    1.0000
OARUn.515_157736.1            15   48287818   BB          -0.0536   0.9887
OARUn.54_510669.1             16   70664506   AA          -0.4320   0.0129
OARUn.992_104641.1            18   12223461   BB          0.1197    0.9962
OARUn.992_53657_X.1           18   12171867   AA          0.0340    0.0066
s00002.1                      0    0          AB          -0.0878   0.4830
s00007.1                      20   32340549   BB          0.0397    0.9956
s00014.1                      6    48965903   BB          -0.0218   1.000
SNP • 1.8k views
ADD COMMENT
1
Entering edit mode

Please add an example of the data format you have.

ADD REPLY
0
Entering edit mode

Hello hosein_salehi6!

I'm closing this until you restructure the post, hosein_salehi6

ADD REPLY
0
Entering edit mode

I've now opened it. Please also use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Formatting bar

ADD REPLY
0
Entering edit mode
6.0 years ago
Russ ▴ 500

Hi hosein_salehi6

It can be tricky dealing with non-reference species. I quickly wrote a script in python that uses the chromosome and position in the text you posted and queries a VCF file with ovine SNPs (sourced from dbSNP 150 apparently). I assumed your results are tab delimited. I accessed the vcf from ensembl:

http://useast.ensembl.org/info/data/ftp/index.html

The script is not quick because it goes through all 6 million variants in the VCF file, but it worked on the sample you provided. Also, some of the variants will not be in the vcf file (like the variant on chromosome 0 and position 0 in the text you posted). For these SNPs the result will print out with "not in vcf file".

Note that you'll want to make sure the genomes used for your snpchip and this VCF file are the same, otherwise your results will be incorrect.

To run the script, copy and paste the code below into a text editor and save it as "sheep_snps.py". Then execute python sheep_snps.py <query_file> <vcf_file> > <output_file>

#!/usr/bin/python

from __future__ import print_function
import sys

snp_dict = {}

with open(sys.argv[1], 'r') as ovine:
    for line in ovine:
        line = line.rstrip()
        if "\"" in line:
            continue
        else:
            query_chrom = line.split("\t")[1]
            query_pos = line.split("\t")[2]
            identifier = query_chrom + "_" + query_pos
            snp_dict[identifier] = line


vcf_dict = {}

with open(sys.argv[2], 'r') as vcf:
    for line in vcf:
        line = line.rstrip()
        if "#" in line:
            continue
        else:
            ref_chrom = line.split("\t")[0]
            ref_pos = line.split("\t")[1]
            ref_identifier = ref_chrom + "_" + ref_pos
            vcf_dict[ref_identifier] = line.split("\t")[2]

for k,v in snp_dict.iteritems():
    try:
        print(vcf_dict[k],v, sep = "\t")
    except:
        print("not_in_vcf_file",v, sep = "\t")

and the results look like:

rs407034953      Contig35697_5761.1            6   18835475   BB  -0.1425  0.9970
rs423741346      OARUn.992_104641.1            18  12223461   BB  0.1197   0.9962
rs55630697       250506CS3900283200001_442.1   1   188498238  AB  0.0914   0.4470
rs420791893      CL635750_128.1                3   223741135  AB  0.0265   0.4830
rs404585538      s00014.1                      6   48965903   BB  -0.0218  1.000
rs428970013      oar3_OAR1_119458232           1   119458232  AA  -0.1807  0.00
rs399786672      CL635944_160.1                6   114778683  BB  -0.3251  0.9972
rs413235350      OAR3_207847578_X.1            3   193150848  AB  -0.0122  0.4828
rs420600628      OAR6_114756902_X.1            6   104240372  BB  0.0654   1.0000
rs400403799      OARUn.515_157736.1            15  48287818   BB  -0.0536  0.9887
rs411856398      OARUn.54_510669.1             16  70664506   AA  -0.4320  0.0129
rs414154811      s00007.1                      20  32340549   BB  0.0397   0.9956
rs429408796      oar3_OAR1_119450287           1   119450287  AA  -0.1919  0.0176
rs413244887      CZ924211_329.1                2   106470676  AB  -0.1050  0.49
rs55630642       250506CS3900140500001_312.1   23  26298017   AA  0.1586   0.0009
rs55630663       250506CS3900218700001_1294.1  2   148802744  AB  0.1641   0.4478
rs409664674      CL635241_413.1                3   182202867  BB  0.0718   1.0000
rs402411363      OARUn.992_53657_X.1           18  12171867   AA  0.0340   0.0066
not_in_vcf_file  s00002.1                      0   0          AB  -0.0878  0.4830
rs401152114      OAR3_211362922_X.1            3   196313951  AA  0.1171   0.0039
ADD COMMENT
1
Entering edit mode

You can use column -t to display text better :-)

ADD REPLY
0
Entering edit mode

Nice tip, thanks Ram.

ADD REPLY
0
Entering edit mode

Thank you for your attention .How I can to receive VCF or rs file for my data ?

ADD REPLY
0
Entering edit mode

In the link I provided above to Ensembl. You will see there is a big table near the bottom of the page. You can filter the results for "sheep" by using the box at the top right of the table. The VCF file is one of the options available there.

ADD REPLY

Login before adding your answer.

Traffic: 2000 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6