How to do further analysis after variant calling and annotation?

7.9 years ago

ShirleyDai ▴ 50

Hi, I just finished calling SNPs and indels with GATK Mutect2 and got some VCF files. I want to annotate them with Oncotator or ANNOVAR, but I don't know how to process them further after annotation, e.g. drawing figures, visualization, and statistics. Are there any packages or software tools that can create publication-quality infographics and illustrations? Does anyone have experience with this?

I have 230 paired WES samples (tumor and adjacent tissue), along with all their covariates.

Thanks

Shirley

sequencing next-gen • 2.5k views

updated 7.1 years ago by Biostar 20 • written 7.9 years ago by ShirleyDai ▴ 50

score 0 · Answer 1 · 2016-05-19

7.9 years ago

jackfrost2199 ▴ 70

I would suggest R. You can generate lots of good visualizations with it. It is free and fairly easy to use (there is a bit of a learning curve, so hang in there). You load the data you want and then tell it what to plot and how. You can customize a lot, and then fix any remaining issues in the exported figure with something like Illustrator or Inkscape (or Photoshop). You can also run pretty much any statistical test you're likely to be interested in.
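
Whatever language you pick, the first step is the same: load the calls and summarize them. As a rough illustration (in Python, only because it needs no extra packages), the sketch below reads a VCF like the ones Mutect2 writes and counts PASS-filtered records as SNVs, MNVs, or indels by comparing REF/ALT lengths. That length rule is a simplification (it does not special-case symbolic or spanning-deletion alleles such as `*`), and the toy records are made up.

```python
import gzip
import io
from collections import Counter


def classify(ref, alt):
    """Label a REF/ALT pair as SNV, MNV or indel (simplified: length-based only)."""
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "indel"


def count_pass_variants(handle):
    """Count variant classes among VCF records whose FILTER column is PASS."""
    counts = Counter()
    for line in handle:
        if line.startswith("#"):  # skip ## meta lines and the #CHROM header
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts, filt = fields[3], fields[4], fields[6]
        if filt != "PASS":
            continue
        for alt in alts.split(","):  # multi-allelic sites list ALT alleles comma-separated
            counts[classify(ref, alt)] += 1
    return counts


def count_file(path):
    """Open a plain or bgzipped VCF as text and count its PASS variants."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_pass_variants(handle)


if __name__ == "__main__":
    # Made-up records for illustration only (not real Mutect2 output).
    toy_vcf = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t100\t.\tA\tG\t.\tPASS\t.\n"
        "chr1\t200\t.\tAT\tA\t.\tPASS\t.\n"
        "chr2\t300\t.\tC\tT\t.\tweak_evidence\t.\n"
    )
    print(count_pass_variants(io.StringIO(toy_vcf)))
```

Running `count_file` over each tumor/normal VCF gives per-sample counts you could then plot or compare against your covariates.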

Are there any packages you would suggest? I'm really a beginner here. I have used edgeR and DESeq, which are only suitable for RNA-seq data.

reply • 7.9 years ago by ShirleyDai ▴ 50

If you want to visualize the VCFs you created, try IGV (http://www.broadinstitute.org/igv/).

You've basically reached the end of the automatic pipeline, where you can run one tool after another and get something meaningful without designing a bioinformatics experiment. To use your analogy, Mutect2 did (in a way) the equivalent of what edgeR does, so now you're left with analyzing the data, usually through statistical tests that depend entirely on your hypothesis. It's true that edgeR generates images whereas I don't think Mutect2 does, but that's because expression data lends itself to some simple and informative visualizations, whereas SNV data doesn't necessarily. The good news is that this is where the blind running of a pipeline ends and the scientific thinking begins!
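
One concrete example of a hypothesis-driven test: is a given gene mutated more often in one patient group than in another (say, split by one of your covariates)? For a 2x2 table of counts, Fisher's exact test is a standard choice; R has `fisher.test` and SciPy has `scipy.stats.fisher_exact`. The stdlib-only sketch below implements the two-sided version (summing the probabilities of all tables with the same margins that are no more likely than the observed one) just so the logic is visible. The counts in the example are hypothetical.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals a, given fixed table margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        px = hypergeom_pmf(x, row1, col1, n)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(p, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: patients with / without a mutation in one gene,
    # split by a binary covariate (e.g. smoker vs non-smoker).
    #              mutated  not mutated
    smokers = (12, 88)
    non_smokers = (3, 127)
    print(fisher_exact_two_sided(*smokers, *non_smokers))
```

If you run this for many genes, the p-values need a multiple-testing correction (e.g. Benjamini-Hochberg) before you interpret them.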

If your experiment was just to generate SNVs for tumor/normal pairs, congratulations, you've completed that. This is the same type of information TCGA provides for a number of different cancers. If you search the literature, you'll see researchers analyzing this data in lots of different ways. Check whether the questions they ask match what you are interested in, and try using their analysis methodology, provided your data meets its assumptions.
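
A common first summary in TCGA-style papers is how recurrently each gene is mutated across patients. A minimal sketch, assuming the annotated exonic calls have already been flattened into (patient, gene, consequence) rows: it counts each patient at most once per gene and skips silent changes. The ANNOVAR-style consequence labels and the toy rows are assumptions for illustration; non-exonic entries should be filtered out beforehand, since anything not in the silent set is counted.

```python
from collections import defaultdict

# Assumption: consequence labels follow ANNOVAR's ExonicFunc.refGene wording;
# adjust this set to whatever your annotation tool calls silent changes.
SILENT = {"synonymous SNV"}


def gene_recurrence(mutations, n_patients):
    """Fraction of patients carrying >= 1 non-silent mutation in each gene.

    mutations: iterable of (patient_id, gene, consequence) tuples.
    Returns a dict ordered from most to least frequently mutated gene.
    """
    patients_by_gene = defaultdict(set)
    for patient, gene, consequence in mutations:
        if consequence not in SILENT:
            patients_by_gene[gene].add(patient)  # a set counts each patient once per gene
    freqs = {gene: len(pts) / n_patients for gene, pts in patients_by_gene.items()}
    # Sort by descending frequency, breaking ties alphabetically by gene name.
    return dict(sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])))


if __name__ == "__main__":
    # Toy data with hypothetical patient IDs; in practice this would be built
    # from the annotated calls for all tumor/normal pairs.
    toy = [
        ("P1", "TP53", "nonsynonymous SNV"),
        ("P1", "TP53", "stopgain"),
        ("P2", "TP53", "frameshift deletion"),
        ("P2", "KRAS", "nonsynonymous SNV"),
        ("P3", "KRAS", "synonymous SNV"),
    ]
    for gene, freq in gene_recurrence(toy, n_patients=3).items():
        print(f"{gene}\t{freq:.1%}")
```

The resulting per-gene frequencies are the kind of numbers you would then put in a bar chart or compare between covariate groups.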

If you absolutely need some further direction, you can see what researchers did in this paper: http://www.cell.com/cell/abstract/S0092-8674(12)01022-7?_returnURL=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867412010227%3Fshowall%3Dtrue (it compares smokers and non-smokers, but the principle is the same as tumor/normal).

You could also look into a gene ontology (GO) analysis of the types of genes mutated (which should be in your annotations): build a table and a pie chart showing what percentage of the mutated genes falls into each ontology group. Your annotations might already include the ontology terms, but you'll need to extract that information (or map your coordinates to a database from http://geneontology.org/). I don't know of any tool that automates this for you (there are some technical reasons why this would be hard with VCF files in particular).
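
The table-plus-pie-chart idea can be sketched like this, under a big simplifying assumption: each mutated gene gets exactly one category. Real GO annotations are many-to-many (one gene usually has many terms), so you would first have to collapse them to one bucket per gene, e.g. a broad GO slim term, before the percentages add up to 100% and a pie chart makes sense. The mapping and the placeholder gene `GENE_X` are hypothetical.

```python
from collections import Counter


def category_table(mutated_genes, gene_to_category):
    """Rows of (category, n_genes, percent) for a table or pie chart.

    Each distinct mutated gene is counted once and assigned a single
    category; genes missing from the mapping go into "unannotated".
    """
    genes = set(mutated_genes)
    if not genes:
        return []
    counts = Counter(gene_to_category.get(g, "unannotated") for g in genes)
    # Largest categories first; ties sorted by name so the output is stable.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cat, n, 100.0 * n / len(genes)) for cat, n in rows]


if __name__ == "__main__":
    # Hypothetical one-category-per-gene mapping; real GO annotations are
    # many-to-many and would have to be collapsed first.
    mapping = {
        "TP53": "apoptotic process",
        "BAX": "apoptotic process",
        "KRAS": "signal transduction",
    }
    mutated = ["TP53", "TP53", "BAX", "KRAS", "GENE_X"]
    for cat, n, pct in category_table(mutated, mapping):
        print(f"{cat:<22}{n:>3}{pct:>7.1f}%")
```

The percent column is what a pie chart would display; the count column is worth keeping in the table so readers can see how small each group is.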

Good luck!

reply • 7.9 years ago by jackfrost2199 ▴ 70

Many thanks, Jack!!!

reply • 7.9 years ago by ShirleyDai ▴ 50

No problem, I'm happy to help if you have any problems with the next steps.

reply • 7.9 years ago by jackfrost2199 ▴ 70