GC Bias in Sequence Capture - Analysis & Quantification
0
0
Entering edit mode
5.1 years ago
jnowacki ▴ 100

QUESTION
I have a bed file & alignment file (bam). I'd like to know if the reads contained inside the bam are biased towards GC rich regions.

RESEARCH

Picard CollectGcBiasMetrics does a great job with whole genome sequencing but doesn't take bed files as input.

Picard HsMetrics can tell me GC % & coverage per region but not all regions are the same size.

I thought about taking the original bedfile and creating a new bed file of sliding 100 base pair windows with bedtools makewindows and then analyzing with Picard HsMetrics. The problem with that is if I use a window of 100 bp and step of 10 bp then large capture regions will have more windows than small capture regions.

For example: ORIGINAL BED FILE ----> NEW BED FILE
100 bp region ----> 1 window of 100 base pairs, 1x coverage in new bed file
1000 bp region ----> 90 windows of 100 base pairs. 10x the size but 90x as many windows.

If I pad things out then there will be windows that have 0 depth of coverage by the bam file and these will be clustered around areas with few base pairs (small capture regions).

Gcbias sequencing sequence capture bed picard • 1.5k views
ADD COMMENT
0
Entering edit mode

Hello jnowacki ,

could you please explain what you hope to find out and why the output of HsMetrics isn't suitable for your needs?

Thanks.

fin swimmer

ADD REPLY
0
Entering edit mode

I'm trying to detect GC bias. The problem with the normal HsMetrics output is not all of the bed regions are the same size so any binning/graphing will heavily weight the results towards whatever GC content is contained in average of the smaller regions.

ADD REPLY

Login before adding your answer.

Traffic: 2461 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6