Converting genotype file into bed format
Asked 7.0 years ago by yasin.delco

Hi,

I have a problem. I have genotyping data for hundreds of thousands of SNPs and thousands of individuals. The coordinates are in hg17 and I need to convert them to hg19. liftOver is fine for this, but it works with BED files. So I need to convert my files to BED format, lift them over to hg19, and then convert the BED output back to the genotype format.

If it were only a list of several thousand SNPs, I could write a Perl script for that. But for a dataset of this size it would take too long: each line contains thousands of fields, so parsing would be slow. Is there a practical tool for this?

The file format is as follows:

identifier hg17_coordinate allele1 allele2 subject1_a1a1 subject1_a1a2 subject1_a2a2 subject2_a1a1 ...
rs1111 2564468 A G 1 0 0 0 1 0 ...
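The parsing worry can largely be sidestepped: only the first four columns (identifier, position, allele1, allele2) describe the SNP, and the genotype columns never need to be split for the coordinate conversion. A minimal sketch, assuming single-space separated columns as in the example above; snp_map is a hypothetical helper name:

```shell
# snp_map FILE: print only the SNP description columns
# (identifier, hg17 position, allele1, allele2) of a wide genotype file.
# cut only counts delimiters on each line; it does not build an array of
# thousands of genotype fields, so it stays fast on very wide files.
# Assumes single spaces between columns (adjust -d otherwise).
snp_map() {
    cut -d ' ' -f 1-4 "$1"
}
```

For example (file names hypothetical):

snp_map genotypes_hg17.txt > snps_hg17.txt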

Thanks

Answer, 7.0 years ago:

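If you have chromosomes in your data set, replace "CHROM" with the proper column (e.g. $5, or "chr"$5 if the column holds bare numbers, since the UCSC chain files use names like chr1) in

awk 'BEGIN{OFS="\t"} NR>1 {print "CHROM", $2-1, $2, $1}' input > output

NR>1 skips the header line; BED is tab-separated with a 0-based start, hence $2-1. The alleles are left out because BED columns 5 and 6 are score and strand; they can be taken back from the original file via the rsid.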
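The same conversion can be wrapped as a small reusable function that takes the chromosome column as a parameter and adds the "chr" prefix only when it is missing. A minimal sketch, assuming a hypothetical chromosome column (the posted format has none) and one header line; genotype_to_bed is a hypothetical name:

```shell
# genotype_to_bed CHR_COL FILE: write a 4-column BED (chrom, start, end, rsid)
# from a space-separated genotype file whose position is in column 2.
# CHR_COL is the (hypothetical) column holding the chromosome.
genotype_to_bed() {
    awk -v col="$1" 'BEGIN { OFS = "\t" }
        NR > 1 {
            chrom = $col
            # UCSC chain files name chromosomes chr1, chr2, ..., chrX
            if (chrom !~ /^chr/) chrom = "chr" chrom
            # BED start is 0-based, end is the 1-based SNP position
            print chrom, $2 - 1, $2, $1
        }' "$2"
}
```

For example, with the chromosome in column 5:

genotype_to_bed 5 genotypes_hg17.txt > hg17.bed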

This should be quick. Then run liftOver on the BED file, and convert the result back to the original format by replacing the coordinate column with the lifted position for each rsid. (But where is the chromosome ID in your data?)
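The convert-back step is a join on the rsid, which awk can do in one streaming pass over the genotype file. A minimal sketch, assuming the lifted BED has the rsid in column 4 and the 1-based position in column 3 (the BED end), and that the genotype file is space-separated with one header line; update_positions is a hypothetical name:

```shell
# update_positions LIFTED_BED GENOTYPES: rewrite column 2 (the position)
# of a genotype file with the hg19 coordinate from a lifted BED file.
# SNPs that liftOver could not map are dropped from the output.
update_positions() {
    awk '
        # first file (lifted BED): remember rsid -> hg19 position
        NR == FNR { pos[$4] = $3; next }
        # second file: keep the genotype header line unchanged
        FNR == 1 { print; next }
        # replace the position and print; unmapped rsids are skipped
        ($1 in pos) { $2 = pos[$1]; print }' "$1" "$2"
}
```

A possible sequence, with hypothetical file names and the hg17-to-hg19 chain file from the UCSC downloads:

liftOver hg17.bed hg17ToHg19.over.chain.gz hg19.bed unmapped.bed
update_positions hg19.bed genotypes_hg17.txt > genotypes_hg19.txt

Because the genotype format has no chromosome column, a SNP that lifts to a different chromosome keeps only its new position; checking unmapped.bed and the lifted chromosomes is worthwhile.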

If there is no chromosome information, you might be able to get BED coordinates by looking up each rsid in the dbSNP BED files at ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/BED/ (I am not sure whether very old rsids will be in there).
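One way to use those files is to take only the chromosome from dbSNP and keep the hg17 position from the genotype file, producing an hg17 BED ready for liftOver. A minimal sketch, assuming the dbSNP BED has chrom, start, end and rsid in its first four columns plus possible track/browser lines, and that each SNP's chromosome is the same in that dbSNP build as in hg17; add_chrom_from_dbsnp is a hypothetical name. The genotype file is read first so that only its rsids are kept in memory while the much larger dbSNP file is streamed:

```shell
# add_chrom_from_dbsnp GENOTYPES DBSNP_BED: build a 4-column hg17 BED for a
# genotype file without a chromosome column. Only the chromosome comes from
# dbSNP; the position is the hg17 one from the genotype file (column 2).
# rsids absent from dbSNP are skipped.
add_chrom_from_dbsnp() {
    awk 'BEGIN { OFS = "\t" }
        # genotype file (skip header): remember rsid -> hg17 position
        NR == FNR { if (FNR > 1) pos[$1] = $2; next }
        # dbSNP BED header lines
        $1 == "track" || $1 == "browser" { next }
        ($4 in pos) {
            c = $1
            # UCSC-style chromosome names for the chain file
            if (c !~ /^chr/) c = "chr" c
            print c, pos[$4] - 1, pos[$4], $4
            # emit each rsid only once, even if dbSNP lists it twice
            delete pos[$4]
        }' "$1" "$2"
}
```

The per-chromosome dbSNP files can be concatenated first (file names hypothetical), e.g. zcat bed_chr_*.bed.gz > dbsnp.bed. If those files are already on GRCh37 (hg19), their own positions might be usable directly; it is worth checking the build first.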
