How To Convert A Basic Bed File With Only 3 Columns (Chrname, Start, End Site) Into A Bigger Bed With 6 Columns
12.4 years ago
Hamilton ▴ 290

Hi,

My BED file has only 3 columns: chr name, start, and end. But MACS in Galaxy requires a BED file with 6 columns. How can I convert it?

chip-seq macs bed • 8.5k views

The question is not clear. Can you give example input and output, and say which additional columns you want to add?

12.4 years ago
Gjain 5.8k

Well, if you have no extra information, you can add the name, score, and strand columns (columns 4, 5, and 6) by filling in . (dot) for column 4, 0 (zero) for column 5, and + (plus strand) for column 6.

If you can give an example of what kind of data is available to you, I can modify my answer to have the correct name, score and strand.

12.4 years ago
Wolf ▴ 130

MACS uses strand information (which would be in column 6) for the fragment size model it builds. If you want to use MACS and you expect your peaks to be narrow (i.e. you would want to use the model building step), I think you should try to get the strand information from whatever aligner you used into your bed file. Without knowing more, I can't help you with how to do that.

If you don't have the strand information (i.e. if you made them all + strand), you have to use the --nomodel option.
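
One way to see whether a BED file carries real strand information, before choosing between model building and --nomodel, is to tally the values in column 6. This is a minimal sketch, assuming a tab-separated BED6 file; the function name strand_counts and the coordinates are made up for the illustration.

```shell
# Tally the strand values found in column 6 of a tab-separated BED file.
# strand_counts is a hypothetical helper name used only for this illustration.
strand_counts() {
    awk -F'\t' 'NF >= 6 { n[$6]++ } END { for (s in n) print s "\t" n[s] }' "$1" |
        LC_ALL=C sort
}

# Example input: three made-up reads, two on the plus strand and one on the minus strand.
bed=$(mktemp)
printf 'chr1\t100\t136\t.\t0\t+\nchr1\t250\t286\t.\t0\t-\nchr1\t300\t336\t.\t0\t+\n' > "$bed"
strand_counts "$bed"
rm -f "$bed"
```

If only one strand value shows up (for example only +), the column is a placeholder and there is nothing for the model-building step to work with.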


I got this BED file from the authors of the Wang et al. 2011 PNAS paper, as it is publicly available. Originally, it has only 3 columns. What if I add . for column 4, 0 for column 5, and + for column 6, as Gjain suggested, given that I have no extra information beyond the basic coordinates? Could this bias the results?


It means that you can't ask MACS to estimate the ChIP fragment size from the data (i.e. use --nomodel). Usually you would use the fragment size to shift/extend plus-strand reads to the right and minus-strand reads to the left, so that they cover the actual binding site that was pulled down. You won't be able to do this, so you should set the shift size to 0. That will reduce your resolution somewhat, but depending on what you are planning to do, it might still be OK.
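
The strand-dependent extension described above can be sketched with awk. This only illustrates the idea and is not how MACS implements it; the function name extend_reads, the fragment size of 200 bp, and the coordinates are assumptions chosen for the example.

```shell
# Extend each read to a fixed fragment length in the direction of its strand:
# plus-strand reads grow to the right from their start,
# minus-strand reads grow to the left from their end.
# Reads with any other strand value (e.g. ".") are printed unchanged.
# extend_reads is a hypothetical helper; $1 = BED6 file, $2 = fragment size in bp.
# Note: the sketch clamps at position 0 but does not know chromosome lengths.
extend_reads() {
    awk -F'\t' -v OFS='\t' -v d="$2" '
        $6 == "+" { $3 = $2 + d }
        $6 == "-" { $2 = $3 - d; if ($2 < 0) $2 = 0 }
        { print }
    ' "$1"
}

# Example input: one made-up read on each strand, extended to 200 bp.
bed=$(mktemp)
printf 'chr1\t1000\t1036\t.\t0\t+\nchr1\t5000\t5036\t.\t0\t-\n' > "$bed"
extend_reads "$bed" 200
rm -f "$bed"
```

Here the plus-strand read becomes 1000-1200 and the minus-strand read becomes 4836-5036. With a placeholder strand column every read would be extended in the same direction, which is the situation the advice above avoids.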

8.5 years ago
Fidel ★ 2.0k

I think this should work

cat bed3.bed | perl -lane 'print "$F[0]\t$F[1]\t$F[2]\t.\t0\t."' > bed6.bed
8.5 years ago

See if this works:

awk -v OFS='\t' '{print $1,$2,$3,".",".","."}' bed3.txt > bed6.txt

If the last three columns are to be empty instead of ".", this may work:

awk -v OFS='\t' '{print $1,$2,$3,"","",""}' bed3.txt > bed6.txt

In theory, the 5th column should be a score from 0 to 1000 (see https://genome.ucsc.edu/FAQ/FAQformat.html#format1), which is why 0 is better than '.'
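
A quick sanity check of the converted file can catch this kind of problem before it reaches a downstream tool. The following is a rough sketch, assuming a tab-separated file with exactly 6 columns and no track or browser header lines; check_bed6 is a hypothetical helper name, and the rules it applies are a simplified reading of the UCSC description linked above.

```shell
# Report lines that do not look like BED6 and return a non-zero status if any are found.
# check_bed6 is a hypothetical helper; $1 = tab-separated file to check.
check_bed6() {
    awk -F'\t' '
        function bad(msg) { printf "line %d: %s\n", NR, msg; err = 1 }
        NF != 6                               { bad("expected 6 columns"); next }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/   { bad("start/end not integers"); next }
        $5 !~ /^[0-9]+$/ || $5 + 0 > 1000      { bad("score not an integer in 0-1000"); next }
        $6 != "+" && $6 != "-" && $6 != "."    { bad("strand not +, - or ."); next }
        END { exit err + 0 }
    ' "$1"
}

# Example input: line 1 uses 0 as the score, line 2 uses "." and gets flagged.
bed=$(mktemp)
printf 'chr1\t100\t200\t.\t0\t.\nchr1\t300\t400\t.\t.\t.\n' > "$bed"
check_bed6 "$bed" || echo "file needs fixing"
rm -f "$bed"
```

Whether a given tool actually rejects a "." score depends on the tool; the check only reflects the format description.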
