[Resolved] Extract Top score peaks from a .bed
2
0
Entering edit mode
8.5 years ago
giroudpaul ▴ 70

Hello,

I have troubles extracting the best score peaks from my bed file.

I have something like this :

chr1    4483445    4484123    MACS_peak_1    59.19
chr1    4533929    4534670    MACS_peak_2    291.66
chr1    4561472    4562345    MACS_peak_3    492.41
chr1    4661528    4662199    MACS_peak_4    72.40
chr1    4695864    4697136    MACS_peak_5    204.10
chr1    4745953    4746562    MACS_peak_6    420.59

and I would like to extract just the peaks that are above a threshold for the score (ex : 100) (5th column)

I have tried with bedtools, but it can just sort them by chromosome then score, and I don't find how to extract just the lines with a score above 100.

I succeed to extract them with unix using sort then tail, but afterward, when I write the output, and try to work with the file, I have an issue since unix don't write the 5th column as integer but as numerical.

Is there anyhow a better way to do it ?

Thank you

bed • 2.4k views
ADD COMMENT
1
Entering edit mode
8.5 years ago
giroudpaul ▴ 70

Well, this just worked, thank you for the ultra quick answer.

I wanted to restrain my bed file before putting it into GREAT, and I told me it won't accept it because 5th column was not an interger, so that's were I didn't understood.

Anyway, thank you

Edit : actually, it still don't work with GREAT...

Error details:
Line X: The fifth field (score) must be an integer (not 291.66). Input data: 'chr1 4533929 4534670 MACS_peak_2 291.66'

ADD COMMENT
1
Entering edit mode

Then you'll need to round:

awk 'BEGIN{OFS="\t"}{if($5>100) {$5 = int($5); print $0}}' input.bed > output.bed

This always rounds down, but you could do things differently if needed.

ADD REPLY
0
Entering edit mode

This work just fine, thank you !

ADD REPLY
0
Entering edit mode
8.5 years ago
awk '{if($5>100) print $0}' input.bed > output.bed

The fifth column isn't an interger, so I'm not sure why you're surprised that it's not automagically being made into one.

ADD COMMENT
0
Entering edit mode

For brevity, you can also do:

$ awk '$5 > 100' input.bed > output.bed
ADD REPLY

Login before adding your answer.

Traffic: 2024 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6