Question

Associating Viral Sequence With Epidemiological Data

3

12.7 years ago

Agapow ▴ 270

The setup: I have a large number of sequences from a viral pathogen, plus the associated epidemiological data, collected during a major disease outbreak. I strongly suspect that the varied epidemiology seen across the outbreak (case severity and outcomes, transmission rate, etc.) is the result of changes in the viral sequence.

The question: So, how do I best correlate these epidemiological data with sequence data? In the crudest sense, how can I point at a SNP and say "this is associated with more severe cases"?

Complications:

I'm concerned about phylogenetic inertia, i.e. false correlations caused by shared evolutionary history. A given sequence change may correlate with increased fatality simply because it was fixed in the lineage that happened to infect a weakened group of hosts.
Some characteristics that are technically non-heritable will behave as if they were heritable, e.g. location.

Solutions I've considered:

Tools from GWAS or similar: apart from the possible overkill of using these on such a short genome, I don't know of any GWAS tools that deal with the inertia problem.
Comparative analysis with independent contrasts: would be the obvious choice if I were dealing solely with character data. I could hack a suitable dataset together, say by treating each SNP locus as a character, but it seems ugly. Also, the state of useful software here is not good.
Selection: will tell me which sites are under selection, but not what might be correlated with that selection.
Compare controls: is something I've done before, but in this case deciding what to control for seems to pre-emptively decide what won't correlate.
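
For what it's worth, the "SNP locus as a character" idea can be sketched in a few lines. This is only an illustration under assumptions that are not in the post: the sequences are already aligned to equal length, the majority base in each column is taken as state 0, and gaps or ambiguous bases are treated as missing. The function name `snp_characters` and the input layout (a dict of sample name to aligned sequence) are made up for the example.

```python
from collections import Counter


def snp_characters(alignment):
    """Map each variable alignment column to a 0/1 character per sample.

    0 = majority base in that column, 1 = any other base,
    None = gap or ambiguous base (treated as missing data).
    `alignment` is a dict of sample name -> aligned sequence (hypothetical layout).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")

    characters = {}
    for pos in range(length):
        column = {name: alignment[name][pos].upper() for name in names}
        counts = Counter(base for base in column.values() if base in "ACGT")
        if len(counts) < 2:
            continue  # invariant (or all-missing) column, so not a SNP
        majority = counts.most_common(1)[0][0]
        characters[pos] = {
            name: (None if base not in "ACGT" else int(base != majority))
            for name, base in column.items()
        }
    return characters


if __name__ == "__main__":
    toy = {"s1": "ACGT", "s2": "ACGA", "s3": "ATGT", "s4": "A-GT"}
    for pos, states in snp_characters(toy).items():
        print(pos, states)
```

The resulting per-site 0/1 vectors could then be fed to whatever comparative or association method is chosen; the coding itself does nothing about the inertia problem.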

evolution snp association • 2.2k views

ADD COMMENT • link updated 12.7 years ago by David Quigley 11k • written 12.7 years ago by Agapow ▴ 270

0

Exactly what kind of epidemiological data do you have, e.g. is it already aggregated by viral sequence, or do you have individual case data at your disposal?

ADD REPLY • link 12.7 years ago by Meredith • 0

0

Individual case data, dates, outcomes, the whole paella.

ADD REPLY • link 12.7 years ago by Agapow ▴ 270

score 1 · Answer 1 · 2011-08-15

You should look at the literature on the evolution of tumors, e.g. Navin Nature 2011, Bozic PNAS 2010, Wood Science 2007. Although tumors are often called clonal populations, a more sophisticated model of tumor evolution starts with a single aberrant cell producing a heterogeneous population of offspring, which then mutate independently within the overall tumor mass. The problem of determining which of many somatic mutations is a strong candidate to be causal (a.k.a. a driver mutation) and which is a bystander or passenger mutation is the same as your inertia problem.

I would guess that building a phylogenetic tree from the sequence alterations would be crucial for establishing causality candidates, but this isn't really my area of expertise. You might structure the question as a set of regressions, asking whether a given alteration is associated with your phenotype and then looking for the progenitor alterations highest up on the tree. If you're using a regression-based statistic, you can control for biases such as location. I would consult a card-carrying molecular epidemiologist to avoid re-inventing the wheel.
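
To make "control for biases such as location" concrete, here is a rough sketch. It is not a regression: it swaps in a Cochran-Mantel-Haenszel stratified test, which is a simpler stand-in that asks the same kind of question (is the SNP associated with severe outcome within strata such as location or clade?). Everything named here is an assumption for illustration: the function `cmh_test`, the `(stratum, has_snp, severe)` tuple layout, and the choice to omit the continuity correction.

```python
import math
from collections import defaultdict


def cmh_test(cases):
    """Cochran-Mantel-Haenszel test of SNP vs. severe outcome across strata.

    `cases` is an iterable of (stratum, has_snp, severe) tuples, where stratum
    is e.g. a location or clade label and the other two values are booleans.
    Returns (pooled_odds_ratio, chi_square, p_value), no continuity correction.
    """
    # One 2x2 table per stratum: [snp&severe, snp&mild, no-snp&severe, no-snp&mild]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, has_snp, severe in cases:
        idx = (0 if has_snp else 2) + (0 if severe else 1)
        tables[stratum][idx] += 1

    diff = var = num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # a single case carries no within-stratum information
        snp_total, severe_total = a + b, a + c
        diff += a - snp_total * severe_total / n  # observed minus expected
        var += snp_total * (c + d) * severe_total * (b + d) / (n * n * (n - 1))
        num += a * d / n  # Mantel-Haenszel pooled odds ratio, numerator
        den += b * c / n  # and denominator
    if var == 0:
        raise ValueError("no informative strata")

    chi2 = diff ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 d.f.
    odds_ratio = num / den if den else math.inf
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Toy data: severity differs by location, not by SNP, within each location.
    toy = (
        [("A", True, True)] * 8 + [("A", True, False)] * 2
        + [("A", False, True)] * 4 + [("A", False, False)] * 1
        + [("B", True, True)] * 1 + [("B", True, False)] * 4
        + [("B", False, True)] * 2 + [("B", False, False)] * 8
    )
    print(cmh_test(toy))
```

In the toy data, pooling the two locations gives a crude odds ratio above 1 even though the SNP has no effect inside either location, which is the false correlation described in the question. Stratifying by clade instead of location would be one crude way to account for position on the tree, but a proper phylogenetically aware model is a different and larger job.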