Help with CNV calling using ExomeDepth
3.4 years ago
luffy ▴ 110

Hello Everyone,

I am new to CNV analysis and a beginner with R. I am trying to call germline CNVs from exome data using ExomeDepth. I tried to follow the example, but it was confusing when I tried to apply it to my own data. Can someone please help me by explaining or showing the steps I need to follow to call CNVs? Also, the official vignette uses hg19 and my data uses hg38; how do I go about that?

What data do I have?
I downloaded exome data from the 1000 Genomes Project (1000g) to use as controls, then cleaned it, marked duplicates, and applied BQSR following the GATK4 best practices. I processed my sample BAM files the same way. In total I have 10 control and 20 sample BAM files.

What am I trying to achieve? I am trying to call good-quality CNVs using a read-depth method. After reading and searching, I settled on ExomeDepth.

Since I am a beginner in R, I am finding it difficult. Can someone guide me through the steps/commands?

Thank you so much for your time.

exomedepth R CNV gCNV exome • 2.8k views

Hi, the biggest problem will be switching to hg38. I don't think it is a trivial replacement here: too many of the annotations there are based on hg19. ExomeDepth is a good tool, but for a beginner who wants to work with hg38 I'd suggest trying some other tool.


@German.M.Demidov, thank you for your input. By the replacement, are you talking about the coordinates? Can I use a tool like UCSC liftOver? Or is it because of the 0-based vs 1-based problem? Those I can handle with Python/shell. Can you please elaborate on the biggest problem you mentioned, in case I pursue this? If it is that kind of problem, I will do that part with Python/shell and use R only for the package.

Thank you so much for your time


I used the instructions under the heading "10 How to loop over the multiple samples" in the vignette. The code starts with "data(Conrad.hg19)" (which loads a dataset based on hg19) and keeps using that data afterwards. If you are able to modify the files and execute the first 2 commands from that section (number 10), then you are almost done and it is possible to use.

3.0 years ago
Joakim ▴ 40

Instead of using the data frames from data(genes.hg19) and data(exons.hg19) in ExomeDepth, I got the equivalent tables for hg38 from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). They only contain the following columns: chromosome, start, end, name.

I renamed the columns to match, just in case:

colnames(exons.hg38) <- c("chromosome", "start", "end", "name")

...and then ran everything as before. List the BAM files:

my.bam <- list.files(pattern = "\\.bam$")
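
For anyone unsure what the exons.hg38 table should look like, here is a minimal sketch in base R. It assumes the Table Browser export is a plain BED-style table with a 0-based start. The inline rows are made-up placeholder coordinates, the helper name bed_to_exon_frame is hypothetical, and whether the +1 shift is needed depends on the coordinate convention of your export, so treat that flag as an assumption to verify.

```r
# Hypothetical helper: turn a BED-like table into the four-column
# data frame (chromosome, start, end, name) described above.
bed_to_exon_frame <- function(bed, zero_based = TRUE) {
  out <- data.frame(
    chromosome = as.character(bed[[1]]),
    start      = as.integer(bed[[2]]),
    end        = as.integer(bed[[3]]),
    name       = as.character(bed[[4]]),
    stringsAsFactors = FALSE
  )
  # BED starts are 0-based; shift to 1-based if requested (assumption).
  if (zero_based) out$start <- out$start + 1L
  # Drop duplicated intervals, then sort by chromosome and start.
  out <- out[!duplicated(out[, c("chromosome", "start", "end")]), ]
  out <- out[order(out$chromosome, out$start), ]
  rownames(out) <- NULL
  out
}

# Made-up example rows standing in for a UCSC export (tab-separated).
bed.text <- "chr1\t100\t200\tGENE_A_1
chr1\t300\t450\tGENE_A_2
chr1\t100\t200\tGENE_A_1
chr2\t50\t120\tGENE_B_1"
bed <- read.delim(text = bed.text, header = FALSE, stringsAsFactors = FALSE)

exons.hg38 <- bed_to_exon_frame(bed)
print(exons.hg38)
```

With a real export you would replace the inline text with read.delim() on your downloaded file and check that the chromosome names (with or without the "chr" prefix) match those in your BAM headers.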

Create the counts data frame for all BAMs:

my.counts <- getBamCounts(bed.frame = exons.hg38, bam.files = my.bam, include.chr = FALSE)
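
The later steps work on a plain matrix of counts with one column per BAM. Below is a rough base-R sketch of that extraction, using a made-up stand-in for my.counts. The real column layout returned by getBamCounts may differ between ExomeDepth versions, so the column names and numbers here are assumptions for illustration only.

```r
# Made-up stand-in for the counts table: annotation columns plus one
# count column per BAM file (column names are assumptions).
mock.counts <- data.frame(
  chromosome   = c("chr1", "chr1", "chr2"),
  start        = c(101L, 301L, 51L),
  end          = c(200L, 450L, 120L),
  control1.bam = c(120L, 80L, 60L),
  control2.bam = c(110L, 90L, 55L),
  sample1.bam  = c(115L, 40L, 58L),
  stringsAsFactors = FALSE
)

# Keep only the per-BAM count columns as a numeric matrix.
bam.cols  <- grep("\\.bam$", names(mock.counts))
count.mat <- as.matrix(mock.counts[, bam.cols])

# Quick sanity check: total reads per BAM across all exons.
total.per.bam <- colSums(count.mat)
print(total.per.bam)
```

Checking the per-BAM totals before calling anything is a cheap way to spot an empty column, which usually points to a chromosome naming mismatch between the BED table and the BAM files.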

Instead of annotating with Conrad, I also got the BED file for DGV from UCSC.

Annotation of CNV calls with DGV:

dgv.hg38 <- read.csv("DGVmerged.csv", header=TRUE, sep = ";")

dgv.hg38.GRanges <- GenomicRanges::GRanges(seqnames = dgv.hg38$chromosome, IRanges::IRanges(start=dgv.hg38$start,end=dgv.hg38$end), names = dgv.hg38$dgv_name)

all.exons <- AnnotateExtra(x = all.exons, reference.annotation = dgv.hg38.GRanges, min.overlap = 0.5, column.name = 'dgv.hg38')
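
To make the min.overlap = 0.5 setting more concrete, here is a base-R sketch of a fractional overlap between a call and a known variant. It only illustrates the general idea: which interval AnnotateExtra uses as the denominator is not stated in this thread, so the choice below (the width of the call) is an assumption, as are the function name overlap_fraction and the coordinates.

```r
# Hypothetical helper: fraction of a call (1-based, inclusive coordinates)
# that is covered by a reference interval on the same chromosome.
overlap_fraction <- function(call.start, call.end, ref.start, ref.end) {
  shared <- min(call.end, ref.end) - max(call.start, ref.start) + 1
  shared <- max(shared, 0)  # disjoint intervals give 0, not a negative width
  shared / (call.end - call.start + 1)
}

# Made-up coordinates: a 1000 bp call against two reference intervals.
frac.a <- overlap_fraction(1001, 2000, 1501, 3000)  # 500 of 1000 bp shared
frac.b <- overlap_fraction(1001, 2000, 1901, 3000)  # 100 of 1000 bp shared
print(c(frac.a, frac.b))
```

Under this assumed definition, the first pair would just reach a 0.5 threshold and the second would not.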
