Do structural variations in pseudogenes mean anything?
6.1 years ago
haiying.kong ▴ 360

I called structural variations from WES data. The most frequently affected genes are almost all pseudogenes. Could anyone please give me some advice?

biology • 1.3k views
6.1 years ago
d-cameron ★ 2.9k

These are likely to be false-positive artefacts caused by the sequence similarity between the gene and the pseudogene. Typical features of such false positives are:

  • lack of split-read support across the putative breakpoint
  • presence of the SV in the germline or in a panel of normals
  • long inexact sequence homology in the flanking sequence. A pairwise sequence alignment of the two flanking regions will usually show that your 'SV' is really just an alignment artefact
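To illustrate the last point, here is a minimal sketch of a pairwise local alignment (plain Smith-Waterman with linear gap costs) that could be run on the two flanking sequences of a call. The function name `local_align` and the scoring values are assumptions chosen for illustration, not what any particular SV caller uses; in practice an established aligner would be the better choice.

```python
def local_align(a, b, match=2, mismatch=-3, gap=-4):
    """Smith-Waterman local alignment of two sequences.

    Returns (score, aligned_length, identity) for the best local alignment.
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix, tracking the best-scoring cell.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            s = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best cell to count alignment columns and matches.
    i, j = best_pos
    length = matches = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1

    identity = matches / length if length else 0.0
    return best, length, identity


# Hypothetical flanks: 60 bp that differ at a single position.
left_flank = "ACGTACGTAC" * 6
right_flank = left_flank[:30] + "T" + left_flank[31:]
score, length, identity = local_align(left_flank, right_flank)
```

A long, high-identity local alignment between the two flanks (here 60 aligned bases with one mismatch) is the kind of inexact homology that suggests the call is a mapping artefact rather than a real breakpoint.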

These artefacts are quite common. My approach to removing them is to use a panel of normals and to filter out any calls containing large (50+ bp) imperfect sequence homology. For example, in one of the cancer panels I'm working with, we find an SV between NOTCH2 and a NOTCH2-LIKE pseudogene in every single sample (both normal and tumour).
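As a rough sketch of that kind of filter, assuming each call has already been annotated with split-read support, presence in normals, and the length of inexact homology between its flanks: the `SVCall` record, its field names, and the example calls are all hypothetical, and only the 50 bp threshold comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    # Hypothetical record; real callers report these values in their own formats.
    sv_id: str
    split_reads: int    # reads split across the putative breakpoint
    in_normals: bool    # seen in the matched normal or a panel of normals
    homology_len: int   # bp of inexact homology between the two flanks


def artefact_reasons(call, min_split_reads=1, max_homology=50):
    """Return the reasons a call looks like a gene/pseudogene artefact."""
    reasons = []
    if call.split_reads < min_split_reads:
        reasons.append("no split-read support")
    if call.in_normals:
        reasons.append("present in normals")
    if call.homology_len >= max_homology:
        reasons.append("long inexact homology")
    return reasons


# Made-up calls for illustration only.
calls = [
    SVCall("sv1", split_reads=0, in_normals=True, homology_len=120),
    SVCall("sv2", split_reads=8, in_normals=False, homology_len=3),
    SVCall("sv3", split_reads=5, in_normals=False, homology_len=75),
]

# Keep only calls with no artefact flags.
kept = [c.sv_id for c in calls if not artefact_reasons(c)]
```

Returning the list of reasons rather than a bare yes/no makes it easier to see which filter is removing most of the pseudogene calls.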

6.1 years ago

My first concern is this: if you have found structural variants in pseudogenes, and assuming that you have used short-read NGS, how can you even be sure that the reads originated from the pseudogenes? Even if you're doing a standard PCR experiment, you'll still have potential issues unless you have designed your primers correctly.

In genomics, things are not as simple as we'd like... sequence similarity across the genome is a major confounding factor at virtually all steps in the NGS in vitro and in silico processes.

So, you will have to elaborate on how you identified these structural variants in the first place. All of this being said, because many pseudogenes are not transcribed and are functionless, having lost their promoters and/or TSS, they do not face selective pressure and tend to pick up more variation than their coding counterparts. In other cases, they acquire new functionality through mutations over generations.

Another issue: in many cases, the entire gene is not even duplicated, meaning that only a few introns or exons form the pseudogene. In other cases, the transcribed mRNA (exons only) of the original gene is re-integrated into the genome - these are known as 'processed pseudogenes'. Others that include the whole genomic sequence (or part of it), e.g., introns and exons, are 'unprocessed pseudogenes'.

Kevin


Processed pseudogenes show up as genic intron deletions when the pseudogene is not in the reference. There are around 20-30 of these that are quite common in the population.
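One way to flag this pattern, sketched under the assumption that you have deletion calls and annotated intron coordinates as simple `(chrom, start, end)` tuples: count how many deletions line up with intron boundaries. The function names, the 10 bp tolerance, the two-intron minimum, and the coordinates are all illustrative assumptions.

```python
def matches_intron(deletion, introns, tolerance=10):
    """True if both deletion breakpoints lie within `tolerance` bp of one intron's boundaries."""
    chrom, start, end = deletion
    return any(
        chrom == i_chrom
        and abs(start - i_start) <= tolerance
        and abs(end - i_end) <= tolerance
        for i_chrom, i_start, i_end in introns
    )


def looks_like_processed_pseudogene(deletions, introns, min_introns=2, tolerance=10):
    """True if at least `min_introns` deletions coincide with annotated introns."""
    hits = sum(matches_intron(d, introns, tolerance) for d in deletions)
    return hits >= min_introns


# Made-up coordinates for illustration only.
gene_introns = [("chr1", 1000, 2000), ("chr1", 2200, 3500), ("chr1", 3700, 4100)]
deletion_calls = [("chr1", 1002, 1998), ("chr1", 2199, 3503), ("chr2", 500, 900)]

flagged = looks_like_processed_pseudogene(deletion_calls, gene_introns)
```

Several "deletions" in the same gene that each remove exactly one intron are more consistent with reads from an unannotated processed pseudogene than with independent genomic deletions.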


Thank you very much for your reply.

I have whole-exome sequencing data from matched blood and tumor samples. I used Mirkat to identify somatic structural variations and then annotated them with gene names.
