Biostar Beta. Not for public use.
Picard CollectWgsMetrics for each chromosome? Standard deviation of coverage per chromosome
3
Entering edit mode
6.3 years ago
@QVINTVS_FABIVS_MAXIMVS10070

Hi II'm running Picard on some BAM files and noticed that the default output for CollectWgsMetrics is for the whole genome. I would like to find the standard deviation of the coverage for each chromosome.

Does anyone know how to make this possible? Even using a different program. I have the output from bedtools genomeCov but I do not know how to infer the standard deviation of read depth for each chromosome.

picard standard deviation command • 3.2k views
0
Entering edit mode

This would be a really nice feature for all the picard Collect* commands. In our case, looking at the whole genome together isn't appropriate since some of the contigs are repeated elements (and some may well be contamination or other cruft). A few select contigs would provide a better characterization. Also, the current "all the 'genome'" approach doesn't make sense if you are competitively mapping to multiple genomes at once (doing WGS metagenomics for example).

IMO, the ideal implementation would be a command-line argument (lets call it CHROMS) where you give a list of chroms/contigs to process together as a single unit. The default (null) would process all the chroms/contigs.

To generate stats for each chrom/contig, you could either run picard multiple times, specifying a single chrom/contig each time. Probably not ideal, but it would be trivial to implement.
Alternatively, if the framework allows it, the CHROMS argument could be given multiple times, and the stats for each one would be output separately (and perhaps overall stats too).

Anyway, let me see if I can find the right place to send a feature request to. You might want to do the same.

1
Entering edit mode
6.3 years ago
travcollier • 160
@travcollier9224

I added INTERVAL_RANGE (taking an interval file) and INTERVAL_RANGE_STRING (taking a whitespace separated list of samtools-like chrom[:start-[end]] regions) to CollectWgsMetrics. Hopefully these changes will get merged into the picard distribution soon.

In the meantime, the source is on github:
https://github.com/travc/picard
https://github.com/travc/gittemp

To generate per-chromosome output using my version of CollectWgsMetrics.jar (assuming you have the command parallel avaialbe):

ref.fa is your indexed reference sequence (index file is ref.fa.fai)
in.bam is your bam file... indexed if you want the commands to be fast

parallel --gnu --colsep '\t' -a ref.fa.fai java -Xmx4g -jar CollectWgsMetrics.jar INPUT=in.bam REFERENCE_SEQUENCE=ref.fa OUTPUT="WGS_{1}.out" INTERVAL_LIST_STRING="{1}"

You might need to change -Xmx4g if more or less memory is needed.

0
Entering edit mode

That would be a very useful addition to the tool.

0
Entering edit mode
6.4 years ago
Dan D 6.8k
@Dan D4040

I would split the BAM by chromosome:

bamtools split -in [bamfile] -reference


And then run the Picard CollectWGSMetrics tool on each of the BAM files. The output of the tool is highly regular so it would be easy to then combine the results into a single text file afterwards.

0
Entering edit mode
6.4 years ago
@Pierre Lindenbaum30

I would use GATK DepthOfCoverage with a BED file containing one chromosome per line : chrom/0/leng(chrom)

0
Entering edit mode

The last time I used DepthOfCoverage, it was really slow in comparison to Picard's complement. Also, I don't think it does the extensive filtering and reporting for duplicates, pass-filter reads, etc that CollectWgsMetrics performs. it's been a while since I've used the GATK tool though--perhaps it's been updated.

0
Entering edit mode
6.3 years ago
travcollier • 160
@travcollier9224

Apologies... I cannot seem to get this to actually work myself. In theory it should work, but there is nastiness with the reference and bam header which I just can't seem to sort out.

You can stream individual regions to CollectWgsMetrics (thanks to Nils Homer for reminding me of that).

samtools view -b <in.bam> | java -jar CollectWgsMetrics.jar I=/dev/stdin ...

You could just do that for each chromosome.