Biostar Beta. Not for public use.
Variant consensus sequence generator
14 months ago
bharata1803 • 420
Japan

Hello,

I have a question about generating a consensus sequence from variant data. Say I have a region like the one below:

ACATGACGATACTAACGGAACC

In that region, I found 2 SNPs, at the 3rd and 10th nucleotides:

POS -- REF -- ALT

3 -- A -- C

10 -- T -- A

If I want to apply the consensus function, there are 2 possible pairs of sequences:

  1. heterozygous sequences where one sequence carries only the mutation at the 3rd nucleotide and the other carries only the mutation at the 10th nucleotide

  2. heterozygous sequences where one sequence is identical to the reference and the other carries both mutations, at the 3rd and 10th nucleotides.

My question is: how do I decide which is the best representative of the consensus sequence?
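For concreteness, the two candidate pairs can be written out explicitly. This is only a minimal sketch in plain Python; the helper name `apply_snps` and the 1-based position convention are assumptions made for illustration, not part of any existing tool.

```python
REFERENCE = "ACATGACGATACTAACGGAACC"
# 1-based position -> (REF, ALT), taken from the SNP list in the question
SNPS = {3: ("A", "C"), 10: ("T", "A")}


def apply_snps(reference, snps, positions):
    """Return a copy of `reference` with the ALT allele at each chosen position."""
    bases = list(reference)
    for pos in positions:
        ref, alt = snps[pos]
        # Guard against an off-by-one or a wrong reference sequence
        if bases[pos - 1] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


# Option 1: one SNP on each copy
option_1 = (apply_snps(REFERENCE, SNPS, [3]), apply_snps(REFERENCE, SNPS, [10]))
# Option 2: one copy identical to the reference, the other carrying both SNPs
option_2 = (apply_snps(REFERENCE, SNPS, []), apply_snps(REFERENCE, SNPS, [3, 10]))

for name, pair in (("option 1", option_1), ("option 2", option_2)):
    print(name, *pair, sep="\n  ")
```

The variant list alone cannot tell these two options apart; both are consistent with two heterozygous calls.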


"My question is: how do I decide which is the best representative of the consensus sequence?"

The question is: What do you want to do with the consensus sequence? What's the biological question you are trying to answer?


I want to check the protein translated from the variant sequence. A single SNP could change a start or stop codon, and the variants may change some amino acids, so I want to see from the data whether the protein sequence is affected.
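As a rough illustration of that check, the sketch below translates each possible haplotype from its first ATG and prints the peptide. It assumes the standard genetic code and, purely for illustration, that the toy sequence from the question is coding on the forward strand starting at its first ATG; the function name `translate_from_first_atg` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from_first_atg(seq):
    """Translate from the first ATG to the first stop codon or the end of `seq`.

    Returns None when the sequence contains no ATG at all.
    """
    start = seq.find("ATG")
    if start == -1:
        return None
    peptide = []
    # Step through complete codons only; a trailing partial codon is ignored
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


haplotypes = {
    "reference": "ACATGACGATACTAACGGAACC",
    "SNP at 3 only": "ACCTGACGATACTAACGGAACC",
    "SNP at 10 only": "ACATGACGAAACTAACGGAACC",
    "both SNPs": "ACCTGACGAAACTAACGGAACC",
}
for label, seq in haplotypes.items():
    print(f"{label:15s} {translate_from_first_atg(seq)}")
```

Under those assumptions, the SNP at position 3 turns the only ATG into CTG, so no start codon is found on any haplotype carrying it, while the SNP at position 10 changes ATA (I) to AAA (K). Which protein you get therefore depends on which SNPs sit on the same copy.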


So (assuming your organism of interest is diploid), you would need to know whether those two variants are in cis or in trans.


My organism is human, so I think I will need to know that. Basically, I wrote a simple program that maps the variants onto the transcript sequence, and I want to know what the transcript sequence (in FASTA format) looks like once the variants are substituted into the reference transcript.


Essentially, you need to know if both variants are on the same allele/chromosome or not. This is called phasing variants. That's trivial if you have reads spanning from one position to the other.

My organism is human.

Please state that from the beginning when asking questions. Try to be as informative as possible.


Could you please explain phasing variants a bit more? Also, what do you mean by "trivial if you have reads spanning from one position to the other"? I am really new to variant data.

Sorry, I forgot to mention the organism; I will add that.

Anyway, I am currently thinking of generating all possible haplotype combinations, because I think it is not that hard and there probably won't be that many variants in one transcript. What do you think?


Since your organism is diploid, for a given combination of two SNPs there are two possible scenarios. Either the SNPs are from the same chromosome/allele, or from different chromosomes.

Scenario 1:
maternal: ACCTGACGAAACTAACGGAACC
paternal: ACATGACGATACTAACGGAACC

Scenario 2:
maternal: ACCTGACGATACTAACGGAACC
paternal: ACATGACGAAACTAACGGAACC

Obviously, which scenario applies strongly influences how the resulting protein is affected!

Phasing variants means figuring out which of those scenarios you have. When you have reads spanning from SNP1 to SNP2, that's quite trivial, because you can see whether the variants always occur in the same read or never do (and therefore whether they originate from the same molecule/chromosome).
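To make the read-based idea concrete, here is a minimal sketch in plain Python. It assumes reads are given as hypothetical (1-based start, sequence) pairs from ungapped alignments; real data would come from a BAM file and need CIGAR-aware handling. The function name `phase_two_snps` and the toy reads are invented for illustration.

```python
from collections import Counter

# Heterozygous SNPs from the question: 1-based position -> (REF, ALT)
SNPS = {3: ("A", "C"), 10: ("T", "A")}


def phase_two_snps(reads, pos_a, pos_b, snps):
    """Count allele combinations on reads covering both positions (pos_a < pos_b).

    Returns ("cis" | "trans" | "undetermined", counts per allele combination).
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq) - 1
        if not (start <= pos_a and pos_b <= end):
            continue  # read does not span both SNPs, so it is uninformative
        alleles = []
        for pos in (pos_a, pos_b):
            base = seq[pos - start]
            ref, alt = snps[pos]
            alleles.append("ref" if base == ref else "alt" if base == alt else "other")
        counts[tuple(alleles)] += 1
    # Same-read support: both ALT or both REF means the ALT alleles travel together
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis == trans:
        return "undetermined", counts
    return ("cis" if cis > trans else "trans"), counts


# Toy reads (made up): two carry both ALT alleles, one carries both REF alleles,
# and the last starts after the first SNP and is therefore ignored
toy_reads = [
    (1, "ACCTGACGAAACTAAC"),
    (2, "CATGACGATACTAACGG"),
    (1, "ACCTGACGAAAC"),
    (8, "GATACTAACGGAACC"),
]
phase, counts = phase_two_snps(toy_reads, 3, 10, SNPS)
print(phase, dict(counts))
```

A simple majority vote like this ignores base qualities and sequencing errors, so treat it as an explanation of the principle rather than a usable phaser.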


OK. After running some simulations, I see that the number of scenarios increases with the number of heterozygous variants in a region. For 4 heterozygous variants, I will have 8 scenarios (2^(n-1) for n variants). So, do you know how to figure out which scenario is the best? Are there any tools or software you would recommend? Thank you.
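For n heterozygous sites in a diploid sample there are 2^(n-1) distinct haplotype pairs, so brute-force enumeration stays feasible for a handful of sites per transcript. The sketch below lists them; the function name `enumerate_phasings` is hypothetical, and the two extra SNPs at positions 14 and 19 are invented purely to show the four-site case.

```python
from itertools import product

REFERENCE = "ACATGACGATACTAACGGAACC"
# SNPs from the question: 1-based position -> (REF, ALT)
SNPS = {3: ("A", "C"), 10: ("T", "A")}
# Two additional SNPs, made up for this example only
FOUR_SNPS = {**SNPS, 14: ("A", "G"), 19: ("A", "T")}


def enumerate_phasings(reference, snps):
    """Yield every distinct haplotype pair for a set of heterozygous SNPs.

    The first SNP is pinned to haplotype A so that swapping the two
    haplotypes is not counted as a separate scenario.
    """
    positions = sorted(snps)
    if not positions:
        yield reference, reference
        return
    for rest in product((0, 1), repeat=len(positions) - 1):
        assignment = (0,) + rest  # 0: ALT on haplotype A, 1: ALT on haplotype B
        hap_a, hap_b = list(reference), list(reference)
        for pos, side in zip(positions, assignment):
            target = hap_a if side == 0 else hap_b
            target[pos - 1] = snps[pos][1]
        yield "".join(hap_a), "".join(hap_b)


two_site_pairs = list(enumerate_phasings(REFERENCE, SNPS))
four_site_pairs = list(enumerate_phasings(REFERENCE, FOUR_SNPS))
print(len(two_site_pairs), len(four_site_pairs))
for hap_a, hap_b in two_site_pairs:
    print(hap_a, hap_b)
```

Enumeration only lists the candidates. Choosing among them still needs phasing evidence, for example from read-backed phasing tools such as WhatsHap or HapCUT2.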
