How to change SNP identifier & position to chr:start:end
16 months ago
OAJn8634 • 50

I have a PLINK file that contains chr, rs ID and position for 30K variants. I would like to convert this information into chr, start and end. Is there an easy way to do this? I would be grateful for any advice.


Hello,

Could you please give some examples of your input? Should the desired output be a BED file? I'm asking because you have to consider the 0-based vs. 1-based coordinate issue.

fin swimmer


Hello. Thank you for your response. My ultimate aim is to create a .txt file that contains chr, start and end for my 30K variants, which I will then use for other analyses. So I really don't mind the output format, as long as I can read it into R. My current .bim file looks like this:

Chr  rs          Genetic pos  Base-pair coordinate  A1  A2
23   rs34557243  24.7104      60425                 C   A
23   rs28419004  24.7103      60692                 T   C
23   rs28705946  230.9480     60882                 T   G

Please let me know if this is helpful. Thank you

ADD REPLYlink
4 months ago
Germany

You can use awk to extract the columns with the chromosome name and the base-pair position to create a valid BED file:

$ awk -v FS="\t" -v OFS="\t" 'NR>1 {print $1, $4-1, $4, $2}' input.bim > output.bed

The coordinates in .bim files are 1-based, whereas BED uses 0-based, half-open intervals: the start is 0-based and the end is exclusive. That's why we subtract 1 ($4-1) from the given position for the start coordinate and use the position itself as the end.

This will create:

23  60424   60425   rs34557243
23  60691   60692   rs28419004
23  60881   60882   rs28705946
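A standard PLINK .bim file has no header line, so NR>1 in the command above would drop the first variant, and FS="\t" only works if the columns are tab-separated. A minimal sketch for that case, assuming a headerless file whose columns are separated by tabs or spaces (example.bim and example.bed are hypothetical file names, and the sample rows are copied from the question):

```shell
# Hypothetical headerless .bim: chr, variant ID, genetic position,
# base-pair position, A1, A2 (tab-separated here).
printf '23\trs34557243\t24.7104\t60425\tC\tA\n23\trs28419004\t24.7103\t60692\tT\tC\n' > example.bim

# awk's default FS splits on runs of spaces or tabs, so this handles both.
# BED is 0-based, half-open: start = bp - 1, end = bp.
awk 'BEGIN { OFS = "\t" } { print $1, $4 - 1, $4, $2 }' example.bim > example.bed

cat example.bed
```

If your file does have a header line, as in the example in the question, put NR > 1 back in front of the action block.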

fin swimmer


This is perfect. Thank you very much

4 months ago
zx8754 7.5k
London

If you are going to read it into R anyway, why create intermediate files? Just do it within R. This keeps PLINK's 1-based coordinates, so start and end are identical for each SNP:

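library(data.table)

# skip = 1 skips the header line; fread then names the columns V1..V6
bim <- fread("myBim.txt", skip = 1)
bim[, .(chr = V1, start = V4, end = V4)]
#    chr start   end
# 1:  23 60425 60425
# 2:  23 60692 60692
# 3:  23 60882 60882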
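Since the title asks for chr:start:end identifiers, the same 1-based, inclusive convention (start equal to end for a single SNP) can also be produced directly in the shell. A minimal sketch, assuming a headerless, whitespace-delimited .bim; example.bim and example_ids.txt are hypothetical file names and the rows are taken from the question:

```shell
# Hypothetical headerless .bim with space-separated columns.
printf '23 rs34557243 24.7104 60425 C A\n23 rs28705946 230.9480 60882 T G\n' > example.bim

# Build chr:start:end from column 1 (chr) and column 4 (bp position),
# using 1-based inclusive coordinates, so start and end are the same.
awk '{ print $1 ":" $4 ":" $4 }' example.bim > example_ids.txt

cat example_ids.txt
```

For a BED-style 0-based start you would print $4 - 1 as the middle field instead.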

Thank you so much! This is so helpful!

