Find the highest value in an interval defined in an other file
2
0
Entering edit mode
7.0 years ago
maude • 0

Hi!

I have these two dataset: Before that contains 5 columns (chromsome, start, end, line number, score)

chrI           1          10    1      0
chrI          11          20    2      0
chrI          21          30    3      0
chrI          31          40    4      0
chrI          41          50    5      0
chrI          51          60    6      0
chrI          61          70    7      0
chrI          71          80    8      0
chrI          81          90    9      0
chrI          91         100    10     0

Peaks that contains 4 columns (chromsome, start, end, value)

"chrI"  880     1091    383
"chrI"  1350    1601    302
"chrI"  1680    1921    241
"chrI"  2220    2561    322
"chrI"  2750    2761    18
"chrI"  3100    3481    420
"chrI"  3660    4211    793
"chrI"  4480    4491    20
"chrI"  4710    4871    195
"chrI"  5010    5261    238

For each lines of Peaks I would like to extract the corresponding lines (e.g all the lines between 880 and 1091 for the first line) in Before, find the highest score value and write it on a new file.

To this end, I've written this function:

summit <- function(x,y,output){
    y<- Before
    chrom <- x[1]
    start <-x[2]
    end <-x[3]
    startLine <- y[which((y$V1 == chrom) & (y$V2==start)),]
    endLine <- y[which((y$V1 == chrom) & (y$V3==end)),]
    Subset <- y[which((y$V2 >= startLine$V2) & (y$V3 <= endLine$V2))]
    maximum <- Subset[which(Subset$V4 == max(Subset$V4))]
    output <- print(maximum)
}

apply(Peaks,1,summit,output = 'peaks_list.bed')

I don't have an error message but It runs during the entire night without giving me results so I guess something is wrong with my code but I don't know what. Does anyone have any idea?

Thank you, Maude

peaks R • 1.8k views
ADD COMMENT
1
Entering edit mode

You may like to check GenomicRanges package, which simplifies the job considerably while working with genomic intervals.

ADD REPLY
0
Entering edit mode
6.9 years ago

If you're not tied to R, or you can use system, you could use BEDOPS bedmap --max-element or bedmap --max, depending on whether you want the maximum-scoring element or just the score, respectively:

$ bedmap --echo --max-element --delim '\t' Before.bed Peaks.bed > answer.bed

You could use write.table to export a BED file from R. Use BEDOPS sort-bed to prepare the BED files before using them with BEDOPS tools. And make sure the chromosome names line up by removing any quotation marks from the names.

ADD COMMENT
0
Entering edit mode
6.9 years ago

For this purpose you can use bedtools groupby and -o max option to get the maximum value.

ADD COMMENT

Login before adding your answer.

Traffic: 2456 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6