Question

Per-base coverage to bedgraph or bigwig

0

Entering edit mode

5.2 years ago

pyKey ▴ 70

Hi,

I am trying to turn my customized per-base counts of a bam file into bigwig format. From a bedgraph seems there is already a tool : bedGraphToBigWig. Is there already a tool which turns a per-base coverage report to bedgraph format?

Thank you,

bam coverage bedtools bigwig • 3.2k views

ADD COMMENT • link 5.2 years ago by pyKey ▴ 70

2

Entering edit mode

a simple awk ? awk '{printf("%s\t%d\t%s\t%s\n",$1,int($2)-1,$2,$3);}'

ADD REPLY • link 5.2 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

customized per-base counts

Which looks like...?

ADD REPLY • link 5.2 years ago by ATpoint 81k

0

Entering edit mode

Hey,

The format is the regular per-base coverage:

I   1   0.5
I   2   1.0
I   3   1.0
I   11  3.5
I   12  3.5
I   13  3.5
I   14  4.0
I   18  5.5
I   19  5.5
I   20  6.0
I   21  6.0

Just that I cannot simply get it from samtools depth nor by bedtools genomecov (which would make my life so easy). Coverage here is calculated by investigating reads and their number of multimaps. Anyways thank you for your help. I guess the awk line would do the work.

Cheers,

ADD REPLY • link 5.2 years ago by pyKey ▴ 70

0

Entering edit mode

See Gautier's comment, which I'm moving to an answer, for the correct format.

ADD REPLY • link 5.2 years ago by Devon Ryan 104k

0

Entering edit mode

Thanks Devon Ryan,

Gautier Richard's sample looks like what I get from "bedtools genomecov -bg". That I can easily imitate in my existing file. I am not sure though if it is fully compliant with the definition of bedgraph files from bedtools site:

"Whereas the -d option reports an output line describing the observed coverage at each and every position in the genome, the -bg option instead produces genome-wide coverage output in BEDGRAPH format. This is a much more concise representation since consecutive positions with the same coverage are reported as a single output line describing the start and end coordinate of the interval having the coverage level, followed by the coverage level itself."

ADD REPLY • link 5.2 years ago by pyKey ▴ 70

0

Entering edit mode

What Gautier posted is fully compliant with the format.

ADD REPLY • link 5.2 years ago by Devon Ryan 104k

0

Entering edit mode

Exactly! My question is that if I construct a file like this:

I   1   1     0.5
I   2   2     1.0
I   3   3     1.0
I   11  11  3.5
I   12  12  3.5
I   13  13  3.5
I   14  14  4.0
I   18  18  5.5
I   19  18  5.5

out of the previous one, would it be correct, or do I need to merge consecutive regions first?

ADD REPLY • link 5.2 years ago by pyKey ▴ 70

score 3 · Answer 1 · 2019-01-30

Per base bedgraph should be of the following format:

Chr1 0 1 0.846 
Chr1 1 2 0.745 
Chr1 2 3 0.433
Chr1 3 4 0.313

Then you can convert it to bigwig easily using bedGraphToBigWig from UCSCTools.

Having your "custom file" should help us to guide you about how to get the bedgraph format.

EDIT: Following the above comments and what your file looks like, consider this file:

per_base_score.txt

I   1   0.5
I   2   1.0
I   3   1.0

I guess that a simple awk should give you what you are looking for, as suggested by Pierre in the comments:

awk '{print $1"\t"$2-1"\t"$2"\t"$3}' per_base_score.txt > per_base_score.bedgraph
bedGraphToBigWig per_base_score.bedgraph chrom.sizes per_base_score.bw

Be sure to get chrom.sizes from (i.e. for yeast, as I guess with you chromosome names): https://hgdownload-test.gi.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes

If you have a custom genome (i.e. non-model organism), you can extract chrom.sizes from a fasta file following the recommendations there: Get chromosome sizes from fasta file

Be sure that your bedgraph / bigwig chromosome names are matching the ones from your other bed files for downstream analyses (chrI vs I).