Dealing with abnormally high coverage regions
7.5 years ago
novice ★ 1.1k

Hello,

I'm seeking methodological advice on a small project I'm working on. I've identified structural variations across an entire genome and found a few spots that are especially rich in variations. I then checked the read coverage over these spots and found a few of them to have abnormally high coverage (~3X the average). I'm not sure how to interpret this correlation. Does this mean the high number of variations in these regions is artificial? How can I test this further? Could I disregard variations in these high-coverage regions in my later analyses?

[Edit] Additional Information:

  • Working with S. cerevisiae
  • Data is WGS, paired-end
  • Used paired-end mapping (PEM) and split-read mapping (SP) to detect variations
  • Verified variations with de-novo assembly
coverage alignment • 2.4k views

What kind of data do you have? WGS? exome? RNAseq?

Whole Genome Sequencing

Try to be complete in your initial posts since something like this is a very important piece of information.

How did you identify those spots?

Paired-end mapping, split-read mapping, and refinement with de-novo assembly

Did you check for low mappability or presence of segmental duplications? Was mapping quality taken into account?

> Did you check for low mappability or presence of segmental duplications?

No.

> Was mapping quality taken into account?

Yes, I used a minimum mapping quality of 20.
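
One possible way to follow up on the unanswered duplication question is to quantify how far each variation-rich region sits above the genome-wide depth. The sketch below is only an illustration, not the method used in this thread: it assumes per-base depth is available as tab-separated `chrom`, `pos`, `depth` lines (the layout produced by `samtools depth`), and the function names `parse_depth` and `region_fold_coverage` are hypothetical.

```python
from statistics import median


def parse_depth(lines):
    """Parse tab-separated chrom/pos/depth lines (assumed samtools-depth-like layout)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        yield chrom, int(pos), int(depth)


def region_fold_coverage(depth_records, regions):
    """Return {region: median depth in region / genome-wide median depth}.

    regions is a list of (chrom, start, end) tuples, 1-based and inclusive.
    Regions with no depth records are skipped.
    """
    records = list(depth_records)
    genome_median = median(d for _, _, d in records)
    folds = {}
    for chrom, start, end in regions:
        in_region = [d for c, p, d in records if c == chrom and start <= p <= end]
        if in_region and genome_median > 0:
            folds[(chrom, start, end)] = median(in_region) / genome_median
    return folds
```

A fold value close to an integer multiple of the baseline (for example ~3 in a haploid strain) would be consistent with a collapsed duplication, although it does not prove one. Medians are used here so that a few extreme positions do not dominate the estimate.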

If you happen to be working with human/mouse: do these sites happen to overlap blacklisted regions?

I'm working with yeast (S. cerevisiae). I don't think there's a blacklist available for this species. Sorry for the lack of clarity in the original post!

Blacklisted regions?

I had never heard of this concept before, but searching for it I found a definition:

https://sites.google.com/site/anshulkundaje/projects/blacklists

It describes them as:

"artifact regions that tend to show artificially high signal"

What seems to be lacking is an explanation of why that would be so. I find it a bit excessive to flat-out remove whole regions based on "blacklists". Do people actually do this? I'm surprised, that's all.

It's standard in ChIP-seq, but pretty much nowhere else (it's normally not useful elsewhere). Some of these regions seem to be rRNAs or other similar "improperly assembled" regions, which I imagine could cause issues in OP's use case.
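
For readers unfamiliar with what blacklist filtering amounts to in practice, here is a minimal sketch. It assumes calls and blacklist entries are `(chrom, start, end)` tuples in half-open, BED-style coordinates; since the thread notes that no blacklist seems to exist for yeast, any interval list passed in would have to be user-defined, and the function names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def split_by_blacklist(calls, blacklist):
    """Split calls into (kept, flagged) by overlap with blacklist intervals.

    Both inputs are lists of tuples whose first three fields are
    (chrom, start, end); extra fields on a call are carried along untouched.
    """
    # Group blacklist intervals by chromosome for quicker lookup.
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept, flagged = [], []
    for call in calls:
        chrom, start, end = call[:3]
        hit = any(
            overlaps(start, end, b_start, b_end)
            for b_start, b_end in by_chrom.get(chrom, [])
        )
        (flagged if hit else kept).append(call)
    return kept, flagged
```

Returning the flagged calls rather than silently dropping them keeps the decision reversible, which may matter when, as here, it is unclear whether the high-coverage regions are artifacts or real duplications.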
