Reconstruct bed file from tab
5.8 years ago
dzisis1986 ▴ 70

I have a file like this:

chr pos  reads  
chr1    3004104 0
chr1    3005819 0
chr1    3008315 0
chr1    3008893 45  
chr1    3009812 0   
chr1    3012422 0
chr1    3015794 0   
chr1    3016183 21  
chr1    3024019 0   
chr1    3025279 0

I am trying to create a new .bed file with the start position, end position, and reads, like this:

 chr1   3008893 3009812    45
 chr1   3016183 3024019    21
.....       ............

The problem is that in R, when I use rbind, chr1 etc. are not kept as strings but converted to numbers, so the final result does not contain the correct chromosomes. Here is my R code:

df[1:10,]
newdf <- data.frame(Date=as.Date(character()),
                 File=character(), 
                 User=character(), 
                 stringsAsFactors=FALSE) 
for(i in 1:(nrow(df)-1)){
  #print(dfs[i,2])
  if(df[i,3]>0){
    newdf<-rbind(newdf,c(df[i,1],df[i,2],df[i+1,2],df[i,3]))
    #check the case that there are reads at the end of chr and next position is 0 new chromosome 
    if(df[i+1,1]!=df[i,1]){
      newdf[length(newdf[,1]),3]= newdf[length(newdf[,1]),2]+1
    }
    }
}

I tried a previous answer that uses data.frame, but it doesn't work. Is there another way to get the correct result? Would it be easier in Python?

Thank you in advance

R chr script python • 1.2k views

Based on what criteria? Where is the interval information coming from?

If a row has a nonzero value in reads, we create a fragment whose start is that row's pos and whose end is the pos of the next row. If the reads are at the last position of a chromosome, we create a fragment with pos as the start and pos + 1 as the end. The algorithm works fine; the problem is the chromosome column (chr1, chr2, chr3, etc.), which this script converts to numbers. At the end, instead of chr1, chr2, ..., chrX, chrY, I get 1, 2, ..., 21, 22.
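The rule described above can be sketched without rbind, so the chromosome column is never mixed with numbers in one vector. This is only a minimal sketch under assumptions: the column names follow the example file, chr is kept as character via stringsAsFactors = FALSE, and the chr2 rows are made up to exercise the end-of-chromosome case.

```r
## Example input mirroring the question; the chr2 rows are hypothetical
df <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  pos   = c(3004104, 3008893, 3009812, 3016183, 5000, 6000),
  reads = c(0, 45, 0, 21, 0, 7),
  stringsAsFactors = FALSE
)

## Rows that carry reads
idx <- which(df$reads > 0)

## Position and chromosome of the following row (NA after the last row)
next.pos <- c(df$pos[-1], NA)
next.chr <- c(df$chr[-1], NA)

## Use the next position as the end, unless the next row is missing or
## belongs to another chromosome; in that case fall back to pos + 1
same.chr <- !is.na(next.chr) & next.chr == df$chr
end <- ifelse(same.chr, next.pos, df$pos + 1)

## Build the result from whole columns, so chr stays character
bed <- data.frame(chr = df$chr[idx], start = df$pos[idx],
                  end = end[idx], reads = df$reads[idx],
                  stringsAsFactors = FALSE)
print(bed)
```

Building the data frame column by column avoids c(df[i,1], df[i,2], ...), which forces a string or factor and several numbers into a single vector of one type.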

5.8 years ago
ATpoint 82k

Without any looping:

## Reconstruct bed file from tab 

## Read file (keep the chromosome column as character, not factor):
in.file <- read.table("~/Desktop/test.txt", sep="\t", header = TRUE, stringsAsFactors = FALSE)

## Indices of rows where reads != 0
idx <- which(in.file$reads != 0)

## Chromosome:
chr <- as.character(in.file[idx, 1])

## Start coordinate of rows where reads != 0
start.coord <- in.file[idx, 2]

## End coordinate is the start coordinate of the row directly after each row where reads != 0
## (this is NA if the last row has reads, and it does not check for a chromosome change)
end.coord <- in.file[idx + 1, 2]

## reads
reads <- in.file[idx, 3]

new.df <- data.frame(chr, start.coord, end.coord, reads)


> new.df
  chr start.coord end.coord reads
1 chr1     3008893   3009812    45
2 chr1     3016183   3024019    21

Try to play around with this kind of problem to improve your skills, and keep in mind that in R you can typically solve such tasks without lapply or explicit loops, simply by applying operations to a whole vector, data frame, or matrix.

EDIT: One thing to keep in mind is that BED is 0-based, which means the first base of a chromosome has start coordinate 0, not 1. Depending on which coordinate system your input file uses, you may need to subtract 1 from the start coordinate before you write the file as BED.
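As a rough sketch of that last step, assuming the input positions are 1-based: subtract 1 from the start column only and write the table without header, row names, or quotes. The data frame below is a hypothetical stand-in with the same shape as new.df, and the output path is a temporary placeholder.

```r
## Hypothetical result in the same shape as new.df; positions assumed 1-based
new.df <- data.frame(chr = c("chr1", "chr1"),
                     start.coord = c(3008893L, 3016183L),
                     end.coord = c(3009812L, 3024019L),
                     reads = c(45L, 21L),
                     stringsAsFactors = FALSE)

## Shift only the start to 0-based; BED end coordinates are exclusive,
## so a 1-based inclusive end keeps its value
new.df$start.coord <- new.df$start.coord - 1L

## Write tab-separated without header, row names, or quotes
out.file <- tempfile(fileext = ".bed")   # placeholder path
write.table(new.df, out.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out.file)
```

Storing the coordinates as integers keeps write.table from printing large round positions in scientific notation.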
