Question

Output bedGraph from genomeCoverageBed is missing "chr"

1

Entering edit mode

5.6 years ago

m98 ▴ 420

I am trying to make a UCSC Genome Browser hub using bigWig files. My starting file format are BAM files. Briefly, my script does as follows:

Scale the BAM file according to a specificic pre-determined scaling factor (using samtools).
From scaled BAM, make a bedgraph representing coverage (using genomeCoverageBed)
From beGgraph file, make bigwig file (using bedGraphToBigwig)

My script looks as follows:

samtools view -s 0.75 -b file.bam > file.scaled.bam;
genomeCoverageBed -bg -split -ibam file.scaled.bam; -g hg38.chrom.sizes > file.bedgraph;
sort -k1,1 -k2,2n file.bedgraph > file.sorted.bedgraph;
bedGraphToBigWig file.sorted.bedgraph hg38.chrom.sizes file.bw

I am having an issue displaying my bigwig file and noticed this is most likely due to my bedGraph file not having a "chr" in front of each chromosome name (so I have "1" instead of "chr1"). I am confused as how this happened.

Initially, when first testing genomeCoverageBed, I kept getting the following warning:

*****WARNING: Genome (-g) files are ignored when BAM input is provided.

I therefore removed the -g argument in the genomeCoverageBed. However, I just noticed (I should have noticed before) that the "chr" is missing in my bedgraph file. Presumably, this is what is causing my bigwig file to then be ill-formatted.

I am confused, am I missing something? Why is genomeCoverageBed not outputting a bedGraph file with "chr" in it, like it should be doing?

Thanks

RNA-Seq ucsc genomeCoverageBed bedGraph bigWig • 3.4k views

ADD COMMENT • link updated 5.6 years ago by Ram 43k • written 5.6 years ago by m98 ▴ 420

2

Entering edit mode

Output of samtools view -H file.bam ? Bet your reference genome simply was like 1,2,3 and not chr1,chr2,chr3.

ADD REPLY • link 5.6 years ago by ATpoint 81k

score 1 · Answer 1 · 2018-09-10

1

Entering edit mode

5.6 years ago

jared.andrews07 ★ 16k

Are they there in your SAM file? It's possible whatever reference it was aligned to just didn't have them. You can easily add them back via awk:

awk '{print "chr" $0}' file.bedgraph > newfile.bedgraph

Also your second line has an erroneous semicolon.

ADD COMMENT • link 5.6 years ago by jared.andrews07 ★ 16k

0

Entering edit mode

Yeah it seems to be the case. The BAM headers have the following tags:

@SQ SN:1     
@SQ SN:2     
etc

I tried doing this awk command you suggested but can't get it to work in the context of a shell script at the moment. Not sure how to get awk to recognize $0 as the whole line.

ADD REPLY • link 5.6 years ago by m98 ▴ 420

0

Entering edit mode

What is the error? The command should work fine on the bedGraph.

ADD REPLY • link 5.6 years ago by ATpoint 81k

0

Entering edit mode

I sorted it out I think. I was using this command within a shell script so instead of $0 being the whole line of the bedgraph, it was outputting the name of the script. I used this instead:

 awk -v LINE="$0" '{print "chr" $LINE}' $BEDGRAPH

ADD REPLY • link 5.6 years ago by m98 ▴ 420

0

Entering edit mode

Don't make simple things complicated ;-D

ADD REPLY • link 5.6 years ago by ATpoint 81k

0

Entering edit mode

What do you mean? The awk command suggested by jared.andrews07 will not work within a script. I'm guessing this is maybe a simpler way than my awk command? I'm not very comfortable changing a BedGraph like this to be honest, I wish there was a way out outputting a bedGraph with what I want directly.

ADD REPLY • link 5.6 years ago by m98 ▴ 420

1

Entering edit mode

There is, add the 'chr' to the headers of your reference genome FASTA file if desired.

ADD REPLY • link 5.6 years ago by jared.andrews07 ★ 16k