Question

Help with CNV calling using ExomeDepth

3.4 years ago

luffy ▴ 110

Hello Everyone,

I am new to CNV analysis and a beginner in R. I am trying to call germline CNVs from exome data using ExomeDepth. I tried the example given in this, but found it confusing when I tried to apply it to my own data. Can someone please explain or show the steps I need to follow to call CNVs? Also, the official vignette uses hg19 and my data is aligned to hg38; how do I go about that?

What data do I have?
For controls, I downloaded exome data from the 1000 Genomes Project, then cleaned it, marked duplicates, and applied BQSR following GATK4 best practices. I processed my sample BAM files the same way. In total I have 10 control and 20 sample BAM files.

What am I trying to achieve? I want to call good-quality CNVs using a read-depth method; after reading and searching, I settled on ExomeDepth.

Since I am a beginner in R, I am finding it difficult. Can someone guide me through the steps/commands?

Thank you so much for your time.

exomedepth R CNV gCNV exome • 2.8k views

updated 3.0 years ago by Joakim ▴ 40 • written 3.4 years ago by luffy ▴ 110

Hi, the biggest problem will be switching to hg38. I don't think it is a trivial replacement here: too many of the annotations in the package are based on hg19. ExomeDepth is a good tool, but for a beginner who wants to work with hg38 I'd suggest trying another tool.

3.4 years ago by German.M.Demidov ★ 2.9k

@German.M.Demidov, thank you for your input. By "replacement", are you talking about the coordinates? Can I use a tool like UCSC liftOver? Or is it the 0-based vs 1-based problem? Those I can handle with Python/shell. Can you please elaborate on the biggest problem you mentioned, in case I pursue this? If that is the problem, I will do that part with Python/shell and use R only for the package.

Thank you so much for your time.

3.4 years ago by luffy ▴ 110

I used the instructions under the heading "10 How to loop over the multiple samples" in the vignette. That section starts with "data(Conrad.hg19)" (which loads annotation data based on hg19) and keeps using that data. If you can modify the files and execute the first two commands from that section (number 10), then you are almost done and it is usable.

3.4 years ago by German.M.Demidov ★ 2.9k

score 3 · Answer 1 · 2021-04-13

Instead of using the data frames from data(genes.hg19) and data(exons.hg19) in ExomeDepth, I got the equivalent tables for hg38 from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). They only contain the following columns: chromosome, start, end, name.

...and then ran everything as before.

my.bam <- list.files(pattern = "\\.bam$")

I renamed the columns to match, just in case: colnames(exons.hg38) <- c("chromosome", "start", "end", "name")
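In case it helps, here is a minimal base-R sketch of how a four-column Table Browser export could be loaded into that layout. The function name `read_exon_bed`, its `to_one_based` flag, and the file name in the usage comment are hypothetical, not part of ExomeDepth. It assumes a tab-separated file with no `track` header line. BED files are 0-based and half-open, so the flag optionally shifts the start by one; whether you need that depends on how you want boundary bases treated. Dropping alt/random contigs is my own choice here, not something the package requires.

```r
# Sketch: load a BED-like export (chrom, start, end, name) into the
# four-column data frame layout described above. Hypothetical helper.
read_exon_bed <- function(path, to_one_based = FALSE) {
  # Lines starting with '#' are skipped by read.table's default comment.char
  exons <- read.table(path, header = FALSE, sep = "\t",
                      stringsAsFactors = FALSE)[, 1:4]
  names(exons) <- c("chromosome", "start", "end", "name")

  # Keep primary chromosomes only (1-22, X, Y), with or without "chr"
  keep <- grepl("^(chr)?([0-9]+|X|Y)$", exons$chromosome)
  exons <- exons[keep, , drop = FALSE]

  # BED starts are 0-based; optionally convert to 1-based
  if (to_one_based) {
    exons$start <- exons$start + 1L
  }
  exons
}

# Usage (file name is a placeholder):
# exons.hg38 <- read_exon_bed("exons_hg38.bed")
```

Check that the chromosome names in the table match the ones in your BAM headers (hg38 BAMs usually carry the "chr" prefix), otherwise the counts may come back empty.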

Create the counts data frame for all BAMs:

my.counts <- getBamCounts(bed.frame = exons.hg38, bam.files = my.bam, include.chr = FALSE)
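For the calling step itself, here is a rough sketch adapted from the per-sample workflow in the ExomeDepth vignette, not the exact code I ran. The wrapper `call_cnvs_one_sample` and its arguments `test.col` and `control.cols` are hypothetical names. It assumes the counts data frame has the chromosome in column 1, the exon name in column 4, and columns named start and end, followed by one column per BAM; the column names differ between ExomeDepth versions, so inspect yours first.

```r
# Sketch: call CNVs for one test sample against a pool of controls.
# counts.dafr: data frame built from the getBamCounts() output.
# test.col / control.cols: names of the count columns (hypothetical args).
call_cnvs_one_sample <- function(counts.dafr, test.col, control.cols) {
  library(ExomeDepth)

  test.counts <- counts.dafr[[test.col]]
  control.mat <- as.matrix(counts.dafr[, control.cols, drop = FALSE])

  # Pick the subset of controls that best matches the test sample
  choice <- select.reference.set(
    test.counts = test.counts,
    reference.counts = control.mat,
    bin.length = (counts.dafr$end - counts.dafr$start) / 1000,
    n.bins.reduced = 10000
  )

  # Sum the chosen controls into one aggregate reference
  reference <- rowSums(control.mat[, choice$reference.choice, drop = FALSE])

  # Fit the beta-binomial model, then run the CNV caller
  fit <- new("ExomeDepth", test = test.counts, reference = reference,
             formula = "cbind(test, reference) ~ 1")

  CallCNVs(x = fit, transition.probability = 1e-4,
           chromosome = counts.dafr[[1]],
           start = counts.dafr$start, end = counts.dafr$end,
           name = counts.dafr[[4]])
}

# Usage (column names are placeholders; check names(counts.dafr)):
# counts.dafr <- as(my.counts, "data.frame")
# all.exons <- call_cnvs_one_sample(counts.dafr, "sample01.bam",
#                                   c("ctrl01.bam", "ctrl02.bam"))
# head(all.exons@CNV.calls)
```

Looping this over the sample columns gives one set of calls per sample, which is what section 10 of the vignette does.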

Instead of annotating with the Conrad data, I got the BED file for DGV, also from UCSC:

Annotation of CNV calls

DGV

dgv.hg38 <- read.csv("DGVmerged.csv", header=TRUE, sep = ";")

dgv.hg38.GRanges <- GenomicRanges::GRanges(seqnames = dgv.hg38$chromosome, IRanges::IRanges(start=dgv.hg38$start,end=dgv.hg38$end), names = dgv.hg38$dgv_name)

all.exons <- AnnotateExtra(x = all.exons, reference.annotation = dgv.hg38.GRanges, min.overlap = 0.5, column.name = 'dgv.hg38')
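Once the calls are annotated, one way to look at them is to rank by Bayes factor and separate the calls that overlap a DGV entry from those that do not. This is a base-R sketch only. The helper name `split_calls_by_annotation` is hypothetical, and it assumes calls with no overlap are left as NA in the annotation column, which is worth checking on your own output.

```r
# Sketch: rank CNV calls by Bayes factor (BF) and split them by whether
# the annotation column has a hit. Hypothetical helper, base R only.
split_calls_by_annotation <- function(calls, column = "dgv.hg38") {
  stopifnot(column %in% names(calls), "BF" %in% names(calls))

  # Strongest evidence first
  calls <- calls[order(calls$BF, decreasing = TRUE), , drop = FALSE]

  # Assumption: NA means no overlapping reference annotation
  has_hit <- !is.na(calls[[column]])
  list(known = calls[has_hit, , drop = FALSE],
       novel = calls[!has_hit, , drop = FALSE])
}

# Usage:
# res <- split_calls_by_annotation(all.exons@CNV.calls)
# write.csv(res$novel, "novel_calls.csv", row.names = FALSE)
```

Calls that overlap DGV are more likely to be common polymorphisms, so the "novel" table is usually the one to review first.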