Question

Derive Consensus Read from pairwise2 alignment

0

Entering edit mode

7.1 years ago

L. A. Liggett ▴ 120

I am trying to use pairwise2 from Biopython to align two reads and derive a consensus read. But I can't figure out how to do this. pairwise2 works well for me when doing something like this:

alignments = pairwise2.align.globalxx("ACCGT", "ACG")
print(format_alignment(*alignments[0]))

But then outputs the following string:

ACCGT
|||||
A-CG-
  Score=3

So I am looking for direction on how to proceed from here to derive a consensus read.

alignment • 2.2k views

ADD COMMENT • link updated 6.7 years ago by Markus ▴ 320 • written 7.1 years ago by L. A. Liggett ▴ 120

0

Entering edit mode

Based on your short example, what should the consensus look like?

ADD REPLY • link 7.1 years ago by WouterDeCoster 47k

0

Entering edit mode

It could be a lot of different things like ACCGT or ANCGN. I just don't know the best place to turn to now start deriving a consensus sequence.

ADD REPLY • link 7.1 years ago by L. A. Liggett ▴ 120

0

Entering edit mode

How can you define a consensus from just 2 sequences? With only 2 sequences you don't know which one is 'right' , so can't possibly pick a consensus base.

Do you just mean you want to know the positions where the base is ambiguous?

ADD REPLY • link 6.7 years ago by Joe 21k

score 0 · Answer 1 · 2017-08-03

You need to iterate through both aligned sequences (given in alignments[0][0] and alignments[0][1]) and compare both letters.

E.g. like this:

from Bio import pairwise2 as pw

alignments = pw.align.globalxx('ACCGT', 'ACG')

consensus = []

for a, b in zip(alignments[0][0],alignments[0][1]):
    if a == b:
        consensus.append(a)
    elif a == '-':
        consensus.append(b)
    elif b == '-':
        consensus.append(a)
    elif a in ('G', 'C') and b in ('G', 'C'):
        consensus.append('S')  # for strong
    # etc, etc

print('seq1 ' + alignments[0][0])
print('seq2 ' + alignments[0][1])
print('cons ' + ''.join(consensus))

Output:

seq1 ACCGT
seq2 A-CG-
cons ACCGT

Instead of running through numerous elif it may be more elegant to use a dictionary and construct keys from your two letters:

from Bio import pairwise2 as pw

alignments = pw.align.globalxx('ACCGT', 'ACG')

consensus = []

cons_dic = {'CG': 'S', 'AT': 'W', 'AC': 'M', 'AG': 'R'}  # etc, etc
for a, b in zip(alignments[0][0],alignments[0][1]):
    if a == b:
        consensus.append(a)
    elif a == '-':
        consensus.append(b)
    elif b == '-':
        consensus.append(a)
    else:
        key = ''.join(sorted(a + b))
        consensus.append(cons_dic[key])

print('seq1 ' + alignments[0][0])
print('seq2 ' + alignments[0][1])
print('cons ' + ''.join(consensus))