How to convert a Map file to BED using perl?
7.0 years ago
lexauex • 0

Hello.

I am new to bioinformatics and this is an elective class. My lab project is to convert a MAP file to BED. I am using TextWrangler to edit the script, and I have a Mac.

This is what I have so far:

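#!/usr/bin/env perl
use strict;
use warnings;
# Take the MAP file name from the first command-line argument.
my $input1 = shift @ARGV;
# Placeholder: the output just mirrors the input name for now.
my $output = $input1;
print "$input1= $output\n";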

Thank you!

perl • 1.7k views

You are going to need to put in a bit more effort on your own than that to get help with your code. Thank you for clarifying that this is an assignment/project.


If I knew more, trust me, I would have more. As I stated before, I am new to all of this.

Thank you for taking the time to respond.

7.0 years ago

Could you please show an example of the input and of the desired output? plink can convert between map/ped and bed/bim/fam with its recode option, if that is what you are looking for. Is that what you want to reimplement in your project?


Here is the Input:

1   rs2905037-G-A   0   775426
1   rs6701114-C-T   0   1022037
2   SNP1-G-T    0   2
10  rs7918734-C-A   0   192533
10  rs1476129-A-C   0   265520
10  rs1476130-T-A   0   274675
10  rs17221239-C-G  0   285113

Output:

chr1    775425  775426  rs2905037-G-A   20
chr1    1022036 1022037 rs6701114-C-T   20
chr2    1   2   SNP1-G-T    20
chr10   192532  192533  rs7918734-C-A   20
chr10   265519  265520  rs1476129-A-C   20
chr10   274674  274675  rs1476130-T-A   20
chr10   285112  285113  rs17221239-C-G  20
chrX    816303  816304  rs17537524-C-A  20
chrX    2335502 2335503 rs17461767-G-A  20
chrY    47  48  SNP2-T-C    20
23  rs17537524-C-A  0   816304
23  rs17461767-G-A  0   2335503
24  SNP2-T-C    0   48

I am currently using TextWrangler to try to create a script. Our professor told us to convert the MAP file to BED, and I know that I need to use input/output code for that. I had a shift in place and my professor said that is correct. Do I need to grep any information out? I just really need a kick start. I understand this material when given a guided assignment, but this one is open-ended, so not much information was provided.

Thank you for your advice!


Thank you, genomax2, for reformatting lexauex's reply.

This part:

23  rs17537524-C-A  0   816304
23  rs17461767-G-A  0   2335503
24  SNP2-T-C    0   48

does not look like an example of proper desired output. Assuming these 3 lines are just a mistake, you can use this awk one-liner to produce the output from the input:

awk '{print "chr"$1, $4-1, $4, $2, "20"}' input > output

Also, I do not understand what "20" means in the fifth column of your output. Please clarify, especially if it does not always have to be 20 for your inputs.


It is possible that something was lost when @lexauex copied and pasted the sample in. @lexauex: You can edit the post above to correct it. I agree with @Petr that the last part seems to have some error in it. Assuming this part is correct:

chr1    775425  775426  rs2905037-G-A   20
chr1    1022036 1022037 rs6701114-C-T   20
chr2    1   2   SNP1-G-T    20
chr10   192532  192533  rs7918734-C-A   20
chr10   265519  265520  rs1476129-A-C   20
chr10   274674  274675  rs1476130-T-A   20
chr10   285112  285113  rs17221239-C-G  20
chrX    816303  816304  rs17537524-C-A  20
chrX    2335502 2335503 rs17461767-G-A  20
chrY    47  48  SNP2-T-C    20

@Petr's solution will do what you want. It provides a ready-to-use solution that does not use Perl.

If you are required to do this exercise using Perl, then see if this pseudocode helps.

  • Open the data file and grab the name before .map using split.
  • Read the file line by line, using chomp to remove the trailing newline.
  • Split each line into its constituent fields (on tab or whatever separates them).
  • Print the reformatted record (join plus a combination of the fields from the previous step) to filename.bed.
  • Close the files.

The output was created by my professor. When I ran the file in the terminal, it reported an error as well. I brought this issue to him and he assures me that there is no mistake. Can I ask why you believe the end part is an error? I am familiar with split. This has helped me greatly.

Thank you!


I feel that the prefix chr is missing from those rows, and they do not seem to follow the convention of the other rows, where

chrN start stop rsID-N-N 20 seems to be the general format.
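
One way to make that convention concrete is a small check like the one below. It is only an illustrative sketch: looks_like_bed_row is a hypothetical helper, and the rules it applies (five fields, a chr prefix, numeric start and end with end = start + 1) are read off the sample rows rather than taken from any format specification.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a line has five whitespace-separated
# fields shaped like "chrN start end name score" with end = start + 1.
sub looks_like_bed_row {
    my ($line) = @_;
    my @f = split ' ', $line;
    return 0 unless @f == 5;
    return 0 unless $f[0] =~ /^chr(?:\d+|X|Y)$/;
    return 0 unless $f[1] =~ /^\d+$/ && $f[2] =~ /^\d+$/;
    return $f[2] == $f[1] + 1 ? 1 : 0;
}

# One row from each part of the posted sample.
for my $row ("chr10   192532  192533  rs7918734-C-A   20",
             "23  rs17537524-C-A  0   816304") {
    my $verdict = looks_like_bed_row($row) ? "follows the format" : "does not follow the format";
    print "$row: $verdict\n";
}
```

The first row passes this check, while the second does not, since it has only four fields and no chr prefix.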
