Derive Consensus Read from pairwise2 alignment
1
0
Entering edit mode
7.1 years ago
L. A. Liggett ▴ 120

I am trying to use pairwise2 from Biopython to align two reads and derive a consensus read. But I can't figure out how to do this. pairwise2 works well for me when doing something like this:

alignments = pairwise2.align.globalxx("ACCGT", "ACG")
print(format_alignment(*alignments[0]))

But then outputs the following string:

ACCGT
|||||
A-CG-
  Score=3

So I am looking for direction on how to proceed from here to derive a consensus read.

alignment • 2.2k views
ADD COMMENT
0
Entering edit mode

Based on your short example, what should the consensus look like?

ADD REPLY
0
Entering edit mode

It could be a lot of different things like ACCGT or ANCGN. I just don't know the best place to turn to now start deriving a consensus sequence.

ADD REPLY
0
Entering edit mode

How can you define a consensus from just 2 sequences? With only 2 sequences you don't know which one is 'right' , so can't possibly pick a consensus base.

Do you just mean you want to know the positions where the base is ambiguous?

ADD REPLY
0
Entering edit mode
6.7 years ago
Markus ▴ 320

You need to iterate through both aligned sequences (given in alignments[0][0] and alignments[0][1]) and compare both letters.

E.g. like this:

from Bio import pairwise2 as pw

alignments = pw.align.globalxx('ACCGT', 'ACG')

consensus = []

for a, b in zip(alignments[0][0],alignments[0][1]):
    if a == b:
        consensus.append(a)
    elif a == '-':
        consensus.append(b)
    elif b == '-':
        consensus.append(a)
    elif a in ('G', 'C') and b in ('G', 'C'):
        consensus.append('S')  # for strong
    # etc, etc

print('seq1 ' + alignments[0][0])
print('seq2 ' + alignments[0][1])
print('cons ' + ''.join(consensus))

Output:

seq1 ACCGT
seq2 A-CG-
cons ACCGT

Instead of running through numerous elif it may be more elegant to use a dictionary and construct keys from your two letters:

from Bio import pairwise2 as pw

alignments = pw.align.globalxx('ACCGT', 'ACG')

consensus = []

cons_dic = {'CG': 'S', 'AT': 'W', 'AC': 'M', 'AG': 'R'}  # etc, etc
for a, b in zip(alignments[0][0],alignments[0][1]):
    if a == b:
        consensus.append(a)
    elif a == '-':
        consensus.append(b)
    elif b == '-':
        consensus.append(a)
    else:
        key = ''.join(sorted(a + b))
        consensus.append(cons_dic[key])

print('seq1 ' + alignments[0][0])
print('seq2 ' + alignments[0][1])
print('cons ' + ''.join(consensus))
ADD COMMENT

Login before adding your answer.

Traffic: 2419 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6