Edit score column in bed file
2
0
Entering edit mode
5.0 years ago
fr ▴ 210

I have a number of chr regions that I'd like to edit in a bed file. How would you do this in bedtools? My appologies, I feel that I am just lacking the correct term to search for it and get a good answer.

For instance, I currently have a list of chromosome regions which I'd like to set as NA or 0 in a bedgraph file. The list of regions is currently in no specific format (have it in python for now).

Let's say my bedgraph file looks like this:

track name="myfile"  yLineMark="0.0" alwaysZero=on maxHeightPixels=100:75:11 visibility=full viewLimits=-1:1 autoScale=on type=bedGraph
chr1    3000000 3050000 -0.87
chr1    3050000 3100000 -0.87
chr1    3100000 3150000 -0.84
chr1    3150000 3200000 -0.86
chr1    3200000 3250000 -0.89

and for some regions, which I have processed before, I want to convert the scores to NA or 0. For instance, chr1_3050000_31000000 and chr1_3150000_3200000. The end result would be something that you could load as a track but would show those regions as null or blanks (hence my suggestion for NA or 0):

track name="myfile"  yLineMark="0.0" alwaysZero=on maxHeightPixels=100:75:11 visibility=full viewLimits=-1:1 autoScale=on type=bedGraph
chr1    3000000 3050000 -0.87
chr1    3050000 3100000 NA
chr1    3100000 3150000 -0.84
chr1    3150000 3200000 NA
chr1    3200000 3250000 -0.89

Thanks a lot in advance

bed bedtools • 2.6k views
ADD COMMENT
1
Entering edit mode

Hello rf ,

please give us an example of input bed file, the regions file and your desired output. Thus it is much clearer for us what you are trying to do.

Thanks.

fin swimmer

ADD REPLY
0
Entering edit mode

True @finswimmer, I missed that. I've made a few edits, hopefully it will be clearer.

ADD REPLY
0
Entering edit mode

Why set to NA, maybe just exclude, like set difference, something like grep -f file1 file2 ?

ADD REPLY
2
Entering edit mode
5.0 years ago
tshtatland ▴ 190

My preferred choice is to write the regions as a bed file from your code (python, etc). Then use standard bioinformatics tools, here bedtools intersect with option -loj (Left outer join. Report features in A with and without overlaps. See: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html#loj-left-outer-join-report-features-in-a-with-and-without-overlaps. Follow up with a little bit of perl, a language commonly used for simple, quick in-line text processing (or python, if you want). Note that I stripped the header for clarity - feel free to add it if needed:

% cat a.bed
chr1    3000000 3050000 -0.87
chr1    3050000 3100000 -0.87
chr1    3100000 3150000 -0.84
chr1    3150000 3200000 -0.86
chr1    3200000 3250000 -0.89
# Your regions with NAs, dumped as bed:
% cat b.bed
chr1    3050000 3100000 NA
chr1    3150000 3200000 NA
% bedtools intersect -a a.bed -b b.bed -loj | \
    perl -F'\t' -lane '
print join "\t", @F[0,1,2],
                 ( $F[-1] eq q{.} ? $F[3] : $F[-1] ); '
chr1    3000000 3050000 -0.87
chr1    3050000 3100000 NA
chr1    3100000 3150000 -0.84
chr1    3150000 3200000 NA
chr1    3200000 3250000 -0.89
ADD COMMENT
1
Entering edit mode
5.0 years ago

The file all-regions.bg contains all regions and scores in sort-bed-sorted bedgraph format.

The file regions-you-want-to-remove.bed3 contains regions-to-be-removed in sort-bed-sorted BED3 format (just the genomic intervals, no scores). These regions should entirely match those in all-regions.bg, but lack score data.

One-liner pipeline:

$ bedops -n 100% all-regions.bg regions-you-want-to-remove.bed3 | bedops -u - <(awk -vFS="\t" -vOFS="\t" '{ print $0, "NA" }' regions-you-want-to-remove.bed3) > answer.bg

The file answer.bg is a sort-bed-sorted bedgraph file containing all regions and a mix of numerical and NA scores.

ADD COMMENT

Login before adding your answer.

Traffic: 1849 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6