Find consensus sequence of several DNA sequences
18 months ago
Bella_p • 50

Hi!

I have a list of around 200 different DNA sequences, each about 150 bp long, and I'd like to find a consensus sequence for all of them. There is probably an existing function for this that I'm not familiar with. Does anyone know which package or function to use? I'd prefer Python, but R is also OK.

Thanks!


Have you tried a multiple sequence alignment?

Any of these tools should give you a consensus sequence:

https://www.ebi.ac.uk/Tools/msa/
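Once one of those tools has produced an aligned FASTA file, the consensus step itself is a column-by-column vote, which is also why the sequences need to be aligned first: unaligned sequences of slightly different lengths can't be compared column by column. Below is a minimal stdlib-only sketch of that vote, assuming FASTA input with "-" as the gap character. The names read_fasta and majority_consensus are hypothetical helpers, and the choices to drop columns that are more than half gaps and to write ties as N are my own assumptions, not what any particular EBI tool does.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def majority_consensus(seqs, gap="-"):
    """Plurality-vote consensus of already-aligned sequences.

    Assumption: columns that are more than half gaps are left out,
    and a tie between the most common bases is written as 'N'.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences differ in length; align them first")
    out = []
    for column in zip(*seqs):
        if column.count(gap) * 2 > len(column):
            continue  # mostly-gap column: omit it from the consensus
        counts = Counter(b for b in column if b != gap).most_common()
        top, top_n = counts[0]
        tied = sum(1 for _, n in counts if n == top_n)
        out.append(top if tied == 1 else "N")
    return "".join(out)


aln = """>s1
ACGT-A
>s2
ACGTTA
>s3
ATGT-A
"""
print(majority_consensus([seq for _, seq in read_fasta(aln)]))  # ACGTA
```

With a real alignment you would pass the file contents instead, e.g. read_fasta(open("aligned.fasta").read()), where aligned.fasta is a placeholder name for the aligner's output.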

20 months ago
st.ph.n ♦ 2.5k
Philadelphia, PA

You can use Biopython to create a consensus sequence.

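#!/usr/bin/env python

import sys
from Bio import AlignIO
from Bio.Align import AlignInfo

# The input must already be aligned (all sequences the same length).
alignment = AlignIO.read(sys.argv[1], 'fasta')
summary_align = AlignInfo.SummaryInfo(alignment)

# Positions that don't reach the threshold get the ambiguity character;
# the default is 'X', so use 'N' for DNA.
consensus = summary_align.dumb_consensus(float(sys.argv[2]), ambiguous='N')
print(consensus)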

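Save this as consensus.py and run it as python consensus.py input.fasta x, where input.fasta is an aligned FASTA file (align the sequences first, e.g. with one of the MSA tools above) and x is the minimum fraction (between 0 and 1) needed to call a position in the consensus sequence. For example, python consensus.py input.fasta 0.5 means that a residue or nucleotide must make up at least 50% of the (non-gap) residues at a position for that position to be called; otherwise it is written as an ambiguity character.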
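To see exactly what the threshold does, or to get a result without installing Biopython, the rule can be sketched in plain Python. This is my reading of it, assuming (as in Biopython's dumb_consensus) that gaps are left out of the count and that a tie for the most common base is treated as ambiguous. threshold_consensus is a hypothetical name, and N is used as the ambiguity character. Note also that recent Biopython releases may warn that dumb_consensus is deprecated.

```python
from collections import Counter


def threshold_consensus(seqs, threshold, ambiguous="N", gap="-"):
    """Call a base only if it is the single most common non-gap symbol
    in its column and makes up at least `threshold` (0-1) of the
    non-gap symbols there; otherwise write `ambiguous`."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold is a fraction between 0 and 1")
    out = []
    for column in zip(*seqs):
        bases = [b for b in column if b != gap]
        if not bases:
            out.append(ambiguous)  # all-gap column
            continue
        ranked = Counter(bases).most_common()
        top, top_n = ranked[0]
        unique_top = len(ranked) == 1 or ranked[1][1] < top_n
        if unique_top and top_n / len(bases) >= threshold:
            out.append(top)
        else:
            out.append(ambiguous)
    return "".join(out)


seqs = ["ACGT", "ACGA", "ACTA", "TCGT"]
print(threshold_consensus(seqs, 0.5))  # ACGN
print(threshold_consensus(seqs, 0.8))  # NCNN
```

In this toy example the first and third columns are 75% A and 75% G, so they are called at 0.5 but not at 0.8, while the last column is a 2:2 tie and stays N at any threshold.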
