Hi,
I am trying to turn my customized per-base counts of a bam file into bigwig format. From a bedgraph seems there is already a tool : bedGraphToBigWig. Is there already a tool which turns a per-base coverage report to bedgraph format?
Thank you,
Hi,
I am trying to turn my customized per-base counts of a bam file into bigwig format. From a bedgraph seems there is already a tool : bedGraphToBigWig. Is there already a tool which turns a per-base coverage report to bedgraph format?
Thank you,
Per base bedgraph should be of the following format:
Chr1 0 1 0.846
Chr1 1 2 0.745
Chr1 2 3 0.433
Chr1 3 4 0.313
Then you can convert it to bigwig easily using bedGraphToBigWig
from UCSCTools.
Having your "custom file" should help us to guide you about how to get the bedgraph format.
EDIT: Following the above comments and what your file looks like, consider this file:
per_base_score.txt
I 1 0.5
I 2 1.0
I 3 1.0
I guess that a simple awk
should give you what you are looking for, as suggested by Pierre in the comments:
awk '{print $1"\t"$2-1"\t"$2"\t"$3}' per_base_score.txt > per_base_score.bedgraph
bedGraphToBigWig per_base_score.bedgraph chrom.sizes per_base_score.bw
Be sure to get chrom.sizes
from (i.e. for yeast, as I guess with you chromosome names): https://hgdownload-test.gi.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes
If you have a custom genome (i.e. non-model organism), you can extract chrom.sizes from a fasta file following the recommendations there: Get chromosome sizes from fasta file
Be sure that your bedgraph / bigwig chromosome names are matching the ones from your other bed files for downstream analyses (chrI vs I).
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
a simple awk ?
awk '{printf("%s\t%d\t%s\t%s\n",$1,int($2)-1,$2,$3);}'
Which looks like...?
Hey,
The format is the regular per-base coverage:
Just that I cannot simply get it from samtools depth nor by bedtools genomecov (which would make my life so easy). Coverage here is calculated by investigating reads and their number of multimaps. Anyways thank you for your help. I guess the awk line would do the work.
Cheers,
See Gautier's comment, which I'm moving to an answer, for the correct format.
Thanks Devon Ryan,
Gautier Richard's sample looks like what I get from "bedtools genomecov -bg". That I can easily imitate in my existing file. I am not sure though if it is fully compliant with the definition of bedgraph files from bedtools site:
"Whereas the -d option reports an output line describing the observed coverage at each and every position in the genome, the -bg option instead produces genome-wide coverage output in BEDGRAPH format. This is a much more concise representation since consecutive positions with the same coverage are reported as a single output line describing the start and end coordinate of the interval having the coverage level, followed by the coverage level itself."
What Gautier posted is fully compliant with the format.
Exactly! My question is that if I construct a file like this:
out of the previous one, would it be correct, or do I need to merge consecutive regions first?