Reducing Number of Data Points for Circos Histogram
5.1 years ago

Hello,

I am trying to make a coverage histogram using ClicO (a Circos browser interface), but I have too many lines. My data is a .txt file produced with the bedtools genomecov function. It looks like this:

NC_009636.1 0   5   0
NC_009636.1 5   25  40
NC_009636.1 25  26  30
NC_009636.1 26  35  0
NC_009636.1 35  36  10
NC_009636.1 36  37  230
NC_009636.1 37  39  240
NC_009636.1 39  40  250
NC_009636.1 40  41  260
...

The columns are the chromosome, the start and stop coordinates, and the coverage.

I have over 300,000 lines, which I would like to bring down to below 25,000, preferably by increasing the bin width. At the moment the bins are the start and stop coordinates, and some of them are a single nucleotide long. I am trying to think of a way to do this in bash or R, perhaps with bins of 100 kb or so.

Best,

J

rna-seq alignment circos bash • 708 views

Please use the format bar and especially the code option (10101) to highlight code and data examples. I did it for you this time.
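
As for the binning itself, here is a minimal sketch of one possible approach, written in Python rather than the bash or R mentioned in the question. It assumes the four-column layout shown above (chromosome, start, stop, coverage) with half-open intervals, and it reports the length-weighted mean coverage per fixed-width bin, splitting any interval that crosses a bin boundary. The function name `rebin_bedgraph` and the `bin_width` parameter are made up for illustration; they are not part of bedtools, Circos, or ClicO.

```python
from collections import defaultdict


def rebin_bedgraph(lines, bin_width=100_000):
    """Return (chrom, start, end, mean_coverage) tuples for fixed-width bins.

    Hypothetical helper: the mean is weighted by the number of bases each
    input interval contributes to the bin.
    """
    weighted = defaultdict(float)  # (chrom, bin index) -> sum of coverage * bases
    chrom_end = {}                 # chrom -> largest stop coordinate seen

    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom = fields[0]
        start, end, cov = int(fields[1]), int(fields[2]), float(fields[3])
        chrom_end[chrom] = max(chrom_end.get(chrom, 0), end)

        # Walk along the interval, cutting it at every bin boundary.
        pos = start
        while pos < end:
            idx = pos // bin_width
            seg_end = min(end, (idx + 1) * bin_width)
            weighted[(chrom, idx)] += cov * (seg_end - pos)
            pos = seg_end

    binned = []
    for chrom, last in chrom_end.items():
        n_bins = (last + bin_width - 1) // bin_width
        for idx in range(n_bins):
            b_start = idx * bin_width
            b_end = min(b_start + bin_width, last)  # last bin may be shorter
            mean = weighted[(chrom, idx)] / (b_end - b_start)
            binned.append((chrom, b_start, b_end, round(mean, 2)))
    return binned


# The example rows from the question, binned at 10 bp so the effect is visible.
sample = [
    "NC_009636.1 0   5   0",
    "NC_009636.1 5   25  40",
    "NC_009636.1 25  26  30",
    "NC_009636.1 26  35  0",
    "NC_009636.1 35  36  10",
    "NC_009636.1 36  37  230",
    "NC_009636.1 37  39  240",
    "NC_009636.1 39  40  250",
    "NC_009636.1 40  41  260",
]

for chrom, b_start, b_end, mean in rebin_bedgraph(sample, bin_width=10):
    print(chrom, b_start, b_end, mean, sep="\t")
```

On the real file you would pass an open file handle instead of `sample` and choose `bin_width` so that the genome length divided by the bin width stays under the 25,000-line limit. Whether a mean, a maximum, or a sum per bin is most useful for the histogram is a judgment call; the mean is used here only as an example.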
