Question

Recommended Approach For Copy Number Analysis In Non-Human Organisms

10.1 years ago

Noushin N ▴ 600

Hello everyone,

I was wondering if you could share your experience with somatic copy number analysis in non-human organisms. Since whole-genome sequencing is not feasible, I would appreciate hints on array-based approaches (e.g. for mouse). I would also be glad to hear whether such an analysis is possible using whole-exome sequencing, in which case my follow-up question would be which tools are recommended.

Thank you,

Noushin

copynumber mouse array somatic • 3.2k views

ADD COMMENT • link updated 10.1 years ago by Stefano Berri 4.4k • written 10.1 years ago by Noushin N ▴ 600

score 1 · Answer 1 · 2014-03-06

10.1 years ago

Stefano Berri 4.4k

Hi. You don't need high coverage data for copy number detection at the resolution of arrays.

You could use CNAnorm. It is designed to detect somatic CNA from low-coverage genomic data (2 million reads would be enough, which is very affordable if you multiplex on a run), and it does not assume a particular reference genome (it has extra features if you use hg19, but they are mainly cosmetic).

You could use the development version, which has some nice extra features and a better vignette. It will become the release version in April. I have tested it on exome capture data, and it works pretty well. You can find further information and a link to the paper here

ADD COMMENT • link 10.1 years ago by Stefano Berri 4.4k

Hi Stefano,

Thanks a lot for your suggestion. I looked at the documentation and it looks very promising. I will give it a try and will update this post based on my experience here. Best!

ADD REPLY • link 10.1 years ago by Noushin N ▴ 600

After getting started with CNAnorm, I realized that one needs to specify a window width. Do you know of any considerations one should be aware of when selecting the window size for exome sequencing data?

ADD REPLY • link 10.1 years ago by Noushin N ▴ 600

Hi. Exome data is a bit more tricky because coverage is uneven, but as a rule of thumb, try to have, on average, 50 reads per window. In gene-rich regions you will have more, in gene-poor regions a bit less. How many reads do you have in total? Good luck.

Stefano

ADD REPLY • link 10.1 years ago by Stefano Berri 4.4k
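
Editor's sketch of the rule of thumb above (about 50 reads per window on average). This is plain arithmetic, not part of CNAnorm: the function name and the roughly 2.7 Gbp mouse genome size are assumptions made for illustration, and the calculation treats reads as spread evenly over the genome, which exome capture data does not satisfy.

```python
def window_width_for_target(genome_size_bp, total_reads, target_reads_per_window=50):
    """Smallest fixed window width (bp) whose genome-wide mean read count
    reaches the target, assuming reads are spread evenly (a simplification)."""
    if genome_size_bp <= 0 or total_reads <= 0 or target_reads_per_window <= 0:
        raise ValueError("all arguments must be positive")
    # mean reads per window = total_reads * width / genome_size,
    # so width = target * genome_size / total_reads, rounded up.
    numerator = target_reads_per_window * genome_size_bp
    return (numerator + total_reads - 1) // total_reads


# Assumed approximate mouse genome size; substitute the value for your assembly.
MOUSE_GENOME_BP = 2_700_000_000

if __name__ == "__main__":
    for reads in (2_000_000, 50_000_000):
        width = window_width_for_target(MOUSE_GENOME_BP, reads)
        print(f"{reads:>11,} reads -> window of about {width:,} bp")
```

Under these assumptions, 2 million reads point to windows of several tens of kbp, while 50 million reads would allow windows of a few kbp. For exome data the result is only a lower bound on a sensible width, since most reads pile up in a small fraction of windows.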

Thank you for the prompt response. That is exactly what I had in mind. In the exome scenario, doesn't this requirement favor quite large window sizes on average? My naive sense is that if one wants to brute-force 50 reads per window with a fixed window size, the 99% of the genome outside coding regions will make the optimal window size quite large. I have in excess of 50 million reads. Thanks again!

ADD REPLY • link 10.1 years ago by Noushin N ▴ 600

50M reads is quite a lot, actually. In CNAnorm all windows are equally sized. If you set 10 kbp windows, you would get an average of 170 reads per window, which is plenty. From a quick count, 85% of exons are less than 10 kbp apart, and 93% are less than 25 kbp apart, so most of your windows will have some reads.

ADD REPLY • link 10.1 years ago by Stefano Berri 4.4k
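
Editor's sketch of the two checks mentioned in the last reply: the mean number of reads per fixed-size window, and the fraction of neighbouring exons that lie closer together than a given distance. Both helper names, the genome size of roughly 2.7 Gbp, and the toy exon coordinates are assumptions for illustration only; they are not taken from CNAnorm or from a real annotation.

```python
def mean_reads_per_window(total_reads, genome_size_bp, window_bp):
    """Genome-wide mean read count per window, assuming evenly spread reads."""
    return total_reads * window_bp / genome_size_bp


def fraction_of_gaps_below(exons, max_gap_bp):
    """Fraction of gaps between consecutive exons shorter than max_gap_bp.

    `exons` is a list of (start, end) pairs on a single chromosome.
    Overlapping or touching exons count as a gap of zero.
    """
    ordered = sorted(exons)
    gaps = [
        max(0, next_start - prev_end)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    if not gaps:
        raise ValueError("need at least two exons to measure a gap")
    return sum(gap < max_gap_bp for gap in gaps) / len(gaps)


if __name__ == "__main__":
    # Assumed genome size; the thread's figure of 170 implies a slightly larger one.
    print(round(mean_reads_per_window(50_000_000, 2_700_000_000, 10_000)))

    # Hypothetical exon coordinates, only to show the calculation.
    toy_exons = [(1000, 1200), (3000, 3500), (9000, 9400), (30000, 30300)]
    for limit in (10_000, 25_000):
        print(limit, fraction_of_gaps_below(toy_exons, limit))
```

With a real annotation, the exon list would be built per chromosome from the capture kit's target file or a gene model file, and the gap fractions pooled across chromosomes.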