Counting number of base pairs and features in each bed file
5.2 years ago
Ron ★ 1.2k

Hello all,

Is there any way to count the number of base pairs in each individual bed file?

I know we can do this for intersecting BED files, but I want to do it for each file separately.

bedtools intersect -a file1.bed -b file2.bed -wo

Below is the output from intersecting the two BED files, from which I can count the overlapping base pairs. However, I also want to count the total number of base pairs in each of those BED files (per interval, which I can then sum up).

chr1  69028   69391   ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547   chr1    69090   70008   301
chr1  69432   69630   ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547   chr1    69090   70008   198
chr1  69677   69961   ref|OR4F5,ref|NM_001005484,ens|ENST00000335137,ccds|CCDS30547   chr1    69090   70008   284
chr1  621055  622013  ref|OR4F3,ref|OR4F29,ref|OR4F16,ref|NM_001005221,ref|NM_001005224,ref|NM_001005277,ens|ENST00000440200,ens|ENST00000332831,ccds|CCDS41221   chr1    621095  622034  918
chr1  861071  861574  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000437963,ccds|CCDS2 chr1    861321  861393  72
chr1  865582  865885  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000341065,ens|ENST00000437963,ccds|CCDS2 chr1    865534  865716  134
chr1  866331  866507  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000598827,ens|ENST00000341065,ens|ENST00000437963,ccds|CCDS2 chr1    866418  866469  51
chr1  871064  871262  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000341065,ccds|CCDS2 chr1    871151  871276  111
chr1  874294  874969  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000455979,ens|ENST00000341065,ccds|CCDS2 chr1    874419  874509  90
chr1  874294  874969  ref|SAMD11,ref|NM_152486,ens|ENST00000420190,ens|ENST00000342066,ens|ENST00000455979,ens|ENST00000341065,ccds|CCDS2 chr1    874654  874840  186
  

Thanks

Ron

RNA-Seq ngs • 3.2k views

Can you give us a few sample lines of input and expected output? I don't understand what you mean by counting base pairs and features in a file that contains just contig names and coordinates.


Hi RamRS, I have updated my question. I just want the number of base pairs in each BED file (not features).


It's a tab-separated file, so you can simply pass the intersect result through awk and have it compute $3 - $2 for the first file and $7 - $6 for the second file, printing each length in a new column. BED coordinates are 0-based and half-open, so no +1 is needed.
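For anyone who prefers Python to awk, here is a minimal sketch of the same per-row calculation. It assumes the column layout shown in the question (file A in columns 1-4, file B in columns 5-7, overlap in column 8) and that the name column contains no whitespace; `interval_lengths` is just an illustrative name. BED intervals are 0-based and half-open, so a length is end minus start. The overlap column in the question agrees with that: 69391 - 69090 = 301.

```python
def interval_lengths(line):
    """Return (len_a, len_b) for one row of `bedtools intersect -wo` output.

    Assumed layout (taken from the question): A = columns 1-4, B = columns 5-7.
    """
    fields = line.split()
    len_a = int(fields[2]) - int(fields[1])  # awk: $3 - $2
    len_b = int(fields[6]) - int(fields[5])  # awk: $7 - $6
    return len_a, len_b


row = "chr1\t69028\t69391\tref|OR4F5\tchr1\t69090\t70008\t301"
print(interval_lengths(row))  # (363, 918)
```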


To calculate the total base pairs in each file, shouldn't I compute the difference in the individual files rather than in the intersect result? I did it both ways and the results are different.
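One likely reason the two results differ: the `-wo` output contains only intervals that overlap something in the other file, and an interval is repeated once per overlap (chr1 874294-874969 appears twice in the sample above). A per-file total is therefore safer to compute from each BED file on its own. A minimal sketch, assuming a plain BED file with start in column 2 and end in column 3; `total_bp` is an illustrative name, not part of bedtools:

```python
def total_bp(bed_lines):
    """Sum interval lengths (end - start) over the lines of one BED file."""
    total = 0
    for line in bed_lines:
        # Skip blank lines and the optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        total += int(fields[2]) - int(fields[1])
    return total


example = ["chr1\t69028\t69391\tgeneA", "chr1\t69432\t69630\tgeneA"]
print(total_bp(example))  # 363 + 198 = 561
```

To run it on a real file, pass an open file handle, e.g. `with open("file1.bed") as fh: print(total_bp(fh))`. If intervals within one file overlap each other, shared bases are counted more than once; running `bedtools merge` on the sorted file first would avoid that.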


If you wish to calculate the total base pairs in each file, declare two variables in the BEGIN block, add $3 - $2 to one and $7 - $6 to the other on each line, then print them in the END block (or however you wish to output them).
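The two-accumulator idea can be sketched in Python as well. This is only an illustration under the same layout assumption as the sample output (A in columns 1-4, B in columns 5-7); `sum_lengths` is a made-up name. Note that it sums over rows of the intersect output, so an interval that overlaps several features is added once per row:

```python
def sum_lengths(rows):
    """Accumulate A and B interval lengths over `intersect -wo` rows."""
    total_a = 0  # running sum of $3 - $2
    total_b = 0  # running sum of $7 - $6
    for line in rows:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        total_a += int(fields[2]) - int(fields[1])
        total_b += int(fields[6]) - int(fields[5])
    return total_a, total_b


rows = [
    "chr1\t874294\t874969\tSAMD11\tchr1\t874419\t874509\t90",
    "chr1\t874294\t874969\tSAMD11\tchr1\t874654\t874840\t186",
]
print(sum_lengths(rows))  # (1350, 276): the 675 bp A interval is counted twice
```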


A follow-up question: I also want to compute the percentage of UTRs and exons in my BED file. Would the way to do that be to download a separate BED file for each (UTRs, exons) and intersect it with the complete BED file of interest?
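If "percentage" here means the share of bases in the file of interest that fall inside exons (or UTRs), one possible approach is to intersect with the annotation file using `-wo` and divide the summed overlap column by the file's total length. A rough sketch under those assumptions; `percent_covered` is a hypothetical helper, and it assumes the annotation intervals do not overlap one another (otherwise merge them first, or bases get counted twice):

```python
def percent_covered(wo_rows, total_bp):
    """Percent of total_bp covered, using the last `-wo` column (overlap in bp)."""
    overlap = sum(int(line.split()[-1]) for line in wo_rows if line.strip())
    return 100.0 * overlap / total_bp


wo_rows = [
    "chr1\t874294\t874969\tSAMD11\tchr1\t874419\t874509\t90",
    "chr1\t874294\t874969\tSAMD11\tchr1\t874654\t874840\t186",
]
# One 675 bp interval of interest with 90 + 186 = 276 bp overlapping the annotation.
print(round(percent_covered(wo_rows, 675), 2))  # 40.89
```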
