Forum:What are the pain points in genomic variant interpretation/annotation processes?
2.9 years ago
Ryangguk Kim ▴ 90

Hi, I develop genomic variant annotation/interpretation tools, and I'd like to learn about the pain points and unmet needs people have with such tools. Can you point out where your pain points are in doing variant annotation/interpretation?

Also, if you wouldn't mind, could anyone chat with me for just 15 minutes so that I can hear about what you do with variant analysis?

analysis variant • 2.4k views

I develop genomic variant annotation/interpretation tools

Can you point us to a few tools you've developed?


There was a comment about the VCF format. We were writing a VCF parser and had some headaches due to a couple of variant-caller-specific conventions/modifications that threw the parser off. What problems did you encounter when dealing with the VCF format?


Since this is an open-ended question, I changed the type to forum. Consider editing the title to "What are the pain points in genomic .." to make it clear.


Thanks. I have edited the title.


I'm not in this area, but I get the feeling that the non-ML tools suck at deleterious variants in non-coding regions and the ML tools are all overfit to a particular disease.

2.9 years ago
Zhenyu Zhang ★ 1.2k

I have generated more than 100,000 VCF files. Some pain points are

  1. To annotate multiple variants on the same transcript.
  2. To represent complicated SV and CNV.
  3. To annotate complicated variants, including SV, CNV and some INDELs.
  4. Variant normalization.
  5. HGVS sucks (but there are no better alternatives).
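
For item 4, here is a minimal sketch of what normalization usually means: trimming bases shared by REF and ALT and shifting indels as far left as possible. It assumes the whole contig fits in a Python string; `normalize` and its arguments are illustrative names, not the API of any existing tool.

```python
def normalize(pos, ref, alt, ref_seq):
    """Return (pos, ref, alt) trimmed and left-aligned.

    pos is 1-based, as in VCF; ref_seq is the contig sequence as a string
    (an assumption of this sketch: real tools read from an indexed FASTA).
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")

    # Step 1: drop a shared trailing base; whenever an allele becomes empty,
    # extend both alleles to the left with the preceding reference base.
    while True:
        changed = False
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break

    # Step 2: drop shared leading bases while both alleles keep at least one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    return pos, ref, alt


# A "CA" deletion inside a CA repeat can be written at several positions;
# after normalization they should all collapse to one representation.
contig = "GGGCACACACAGGG"
print(normalize(8, "CAC", "C", contig))
```

Multi-allelic records, symbolic alleles such as `<DEL>`, and variants at the very start of a contig are not handled here, which is part of why this step is painful in practice.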

Btw, if you are looking for buddies in this field, I strongly suggest you join the GA4GH Variant Annotation and Variant Representation working groups.


Thanks Zhenyu. The list makes sense. I participated in some GA4GH calls (of the two working groups) and events.


By the way, I have a follow-up question, if I may - how were those VCF files used downstream in your work?


We (GDC) make MAFs and share all data with the research community.

2.9 years ago

Every program should:

  • output variant consequences over all transcript isoforms
  • output variant annotation in HGVS
  • output the tissue in which each isoform is most expressed
  • output Ensembl gene IDs and HGNC gene symbols
  • indicate orientation of the bases to the reference genome (as we know, many 'variants' are the very reference bases in hg19 and hg38)

Thanks Kevin. I have a couple of follow-up questions.

When you get variant consequences over all transcript isoforms, what do you do with them downstream? I saw a few different approaches: use the most deleterious consequence, use the consequence of a pre-determined "representative" transcript such as MANE, etc. Is it related to the expression level in each tissue you mentioned, such as seeing if the dominant isoform in a tissue had a deleterious consequence?
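
To make the two approaches concrete, here is a rough sketch of both selection strategies over per-transcript annotations. The severity list is a small illustrative subset of Sequence Ontology terms, and the transcript IDs, the `is_mane` flag, and the function names are all hypothetical; none of this is taken from a specific annotator.

```python
# Illustrative ranking only, most severe first (an assumption of this sketch).
SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY)}


def most_severe(annotations):
    """Pick the annotation whose consequence ranks as most severe.

    Unknown consequence terms are treated as less severe than any listed one.
    """
    return min(annotations, key=lambda a: RANK.get(a["consequence"], len(SEVERITY)))


def representative(annotations):
    """Pick the annotation on the pre-determined transcript, if there is one."""
    for a in annotations:
        if a.get("is_mane"):
            return a
    return None


# One variant annotated against three hypothetical isoforms of the same gene.
example = [
    {"transcript": "TX1", "consequence": "intron_variant", "is_mane": True},
    {"transcript": "TX2", "consequence": "missense_variant", "is_mane": False},
    {"transcript": "TX3", "consequence": "stop_gained", "is_mane": False},
]

print(most_severe(example)["transcript"])
print(representative(example)["transcript"])
```

In this example the two strategies disagree: the representative transcript reports an intronic change while another isoform carries a stop-gain, which is exactly the situation I am asking about.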

Regarding the tissue in which each isoform is most expressed, can you point me to some data sources that have such information?


I am no longer directly involved in the variant interpretation part; however, the clinical scientists with whom I worked [in NHS England] checked variant consequences over all known isoforms via a program (I believe Alamut). They would use literature searches to determine whether a consequence over a given isoform was important before signing the report. The final decision then rests with the referring doctor, sister laboratory, Lab Director, or Genetic Counsellor, depending on the exact origin of the sample.

My role, as Lead Bioinformatician, was to simply output the variant listing and ensure that nothing was missed, after which they were content to take care of everything. A simple run of GATK, DeepVariant, SAMtools, etc. will miss quite a few clinically-actionable variants.

I would personally not be interested in just seeing the most deleterious consequence, in part due to my lack of trust in NGS data, and also because I understand just how complex the genome is.

Regarding MANE, there are Ensembl representatives here and I believe MANE is already in use after there was an initial poll ~2 years ago. Cannot find it right now.


Thanks Kevin for your kind explanation.
