Biostar Beta. Not for public use.
Whole exome length of exons from bam file
1
Entering edit mode
18 months ago
United States

Hi,

I am trying to figure out the total length of all the exons covered in whole exome sequencing data which I downloaded from TCGA. In other words I am trying to figure out what is the total length of all the bp covered by the reads on the reference genome. Fastq file would probably not help since the number of reads would vary depending on the depth of sequencing. I tried to use the BAM files and samtools but could not find any tool in samtools that would give me total length of all the alignment coordinates. Is there a tool available to get this information or does someone know of a script that can help with this calculation. Basically I would need to extract the coordinates of all the unique alignments, calculate their length and add them up, unless there is some other subtlety that I am missing. Thanks for your help.

- Pankaj

whole exome length • 1.6k views
ADD COMMENTlink
0
Entering edit mode
17 months ago
Darked89 4.2k
Barcelona, Spain

Maybe figuring out the number of bases with coverage > 0 will be good enough?

http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

Using samtools view you can filter out the zero quality reads, and digging deeper in SAM flags non-primary alignments.

Hope it helps

ADD COMMENTlink
0
Entering edit mode
Do you have a bed file of all the exons?
ADD REPLYlink
0
Entering edit mode

I don't have a bed file of the exons but I have sent a request to CGHub help desk to ask for one. Assuming I can get one, could you please advise how I could use it to get the information I need. Thanks.

ADD REPLYlink
0
Entering edit mode

to size a bed file

awk -F '\t' 'BEGIN{SUM=0}{SUM+=$3-$2}END{print SUM}' foo.bed
ADD REPLYlink
0
Entering edit mode

I used the following command to output [chr pos depth]:

bedtools genomecov -d -ibam bedtools_test_alignment_sorted.bam > bedtools_test_output.txt

but the out file is huge, almost 48 GB, for a BAM file of 8 GB. This file can be processed but will take a while. Is there some way that bedtools will only output positions with depth > 0. That would reduce the output file size a lot. Thanks.

ADD REPLYlink
0
Entering edit mode
15 months ago
bioguy24 • 190
Chicago

Will this awk pipe work?

bedtools genomecov -ibam aln.bam -bga | awk '$4==0' | head -n 2 

chr1 0 554304 0 
chr1 554314 554315 0
ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1