How to calculate read depth for a specific gene list?
0
2
Entering edit mode
5.2 years ago
Lila M ★ 1.2k

Hi everybody, I'm stuck in the middle of something. I have a selected gene list (bed file) like this:

chr    start   end    strand    gene        geneLength
chr1    185217  195411  -   ENSG00000279457 10194
chr1    725885  778626  -   ENSG00000228327 52741
chr1    725885  778626  -   ENSG00000228327 52741
chr1    916865  921016  -   ENSG00000223764 4151
chr1    916865  921016  -   ENSG00000223764 4151
chr1    916865  921016  -   ENSG00000223764 4151
chr1    944204  959309  -   ENSG00000188976 15105

I get them after different downstream analysis, and I need to calculate the read depth for them (only for those). So far, I've tried this:

genomeCoverageBed -ibam bam -g gene.list > coverage

But the output doesn't show the coverage for the specific gene.

and I also tried this:

bedtools coverage -a gene.list -b bam > coverage

but it gave me this error

***** WARNING: bam has inconsistent naming convention for record:
GL000008.2  12857   12916   SRR6497147.21984058 0   -

Killed

Any help? Thanks!!!

coverage readdepth interesct • 2.4k views
ADD COMMENT
1
Entering edit mode

What are the chromosome identifiers in your bam?

ADD REPLY
0
Entering edit mode

Hi, the chromosomes appeared as "chr1"

SRR6497149.3455795  16  chr1    10533   1   65M *   0   0AAGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGT  EGGGGFGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGDGGGGGGGGFGGFGGGEBBBBB   AS:i:-10    XS:i:-10    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:0G26C37    YT:Z:UU

however, there are additional chr that not start with "chr", as the one reported in the error. Is there any way in which I can remove those chr that are not the "typical"? Thanks!

ADD REPLY
1
Entering edit mode

Hey Lila, for the first command, genomeCoverageBed, you don't have the correct input data. The file passed with -g needs to be of this format:

chr1    249250621
chr2    243199373
...
chr18_gl000207_random   4262

This is merely the first 2 columns of the output produced by samtools faidx.

What exactly are you aiming to do? Note that, in my pipeline (here https://github.com/kevinblighe/ClinicalGradeDNAseq ), I output various coverage statistics via:

number of reads off target

bedtools intersect -v -bed -abam "${BAMfile}" -b "${BEDfile}" | wc -l > ReadsOffTarget.txt

output depth of coverage for all regions in the BAM file

[Sequential positions at the same read depth are merged into a single region]

bedtools genomecov -ibam "${BAMfile}" -bga -split > CoverageTotal.bedgraph

output the per base read depth for each region in the BED file

bedtools coverage -a "${BEDfile}" -b "${BAMfile}" -d > PerBaseDepthBED.bedgraph

output the mean depth of coverage for each region in the BED file

bedtools coverage -a "${BEDfile}" -b "${BAMfile}" -mean > MeanCoverageBED.bedgraph
ADD REPLY
2
Entering edit mode

Hi Kevin, Thank you for such an extensive response! What I basically did , was to transform my bam file into a bed one, so then I was able to get the coverage by running

bedtools coverage -a file.bed -b file.bedGraph -counts > coverage_by_counts
ADD REPLY

Login before adding your answer.

Traffic: 2049 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6