Compare two cols of one file to another file of same cols and fetch the matches
0
0
Entering edit mode
5.0 years ago

I have file1 of 6.5lakh rows with two cols, "Chr" and "Pos". I want to compare this file with dbsnp (file2) datadump and match with with Chr and Pos col present in dbSNP dump. Once matched, respective rsid's to be fetched. I tried using Python Panda's but my process is getting killed. When it tried for 50000 rows it worked.

How can I fetch rsid for whole dataset (file1 = 6.5lakh rows) from dbSNP.

#Program to compare Chr and Pos of a sample with dBSNP and fetching RSIDs
import pandas as pd
df1 = pd.read_csv("v2_infi_chr_pos.csv",sep='\t',dtype='unicode')
df2 = pd.read_csv("dbsnp150_header.txt",sep='\t',dtype='unicode')
df3 = pd.merge(df1, df2, on='Chr''Pos', how='inner')
export_csv = df3.to_csv (r'rsids_infiniumv2_hg38.txt', index = None, header=True)
next-gen SNP genome gene python • 771 views
ADD COMMENT

Login before adding your answer.

Traffic: 1490 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6