Sequence alignment with biopython
5.0 years ago
alec_djinn ▴ 380

I am trying to compute sequence alignments with Biopython, but I am not getting the result I expect.

I know there are countless ways to compute alignments. Can someone suggest a tool (preferably Biopython or another Python library) that would give me the expected result?

Here is an example:

from Bio import pairwise2
from Bio.pairwise2 import format_alignment

r = 'ATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTT'
c = 'ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT'
alignment = pairwise2.align.globalxx(r, c)
print(format_alignment(*alignment[0]))

ATGGAGAAAA-AAATCACTGGATATACCACCGTTGATA----TATCC-CAATGGCATCGTAAAGAACATTT
|||||| ||| |||||||||||||||||||||||||||    ||| | |||||||||||||||||||||||
ATGGAG-AAATAAATCACTGGATATACCACCGTTGATAAAAATAT-CGCAATGGCATCGTAAAGAACATTT
  Score=63

And here is what I would like the result to be:

ATGGAGAAAAAAATCACTGGATATACCACCGTTGATA----TATCCCAATGGCATCGTAAAGAACATTT
|||||||||*|||||||||||||||||||||||||||    ||||*|||||||||||||||||||||||
ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT
  Score=??

What version of Biopython are you using? pairwise2 was rewritten in later versions and now gives much more realistic results, with fewer spurious gaps.


I am using Biopython 1.72, which should be the latest version available in conda.


In that case, Bastien's answer is probably your best bet.

5.0 years ago

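You can increase the gap penalties, both gap-open and gap-extend. globalxx applies no gap penalty at all, so gaps are free; globalms lets you set them:

# globalms arguments: seqA, seqB, match, mismatch, gap open, gap extend
alignment = pairwise2.align.globalms(r, c, 2,-1,-1,-0.5)
print(format_alignment(*alignment[0]))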

ATGGAGAAAAAAATCACTGGATATACCACCGTTGAT----ATATCCCAATGGCATCGTAAAGAACATTT
|||||||||.||||||||||||||||||||||||||    |||||.|||||||||||||||||||||||
ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT
   Score=121.5

With these parameters, identical characters score 2 points, each mismatch costs 1 point, opening a gap costs 1 point, and each extension of an open gap costs 0.5 points.
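To check where the 121.5 comes from, and why globalxx had no reason to prefer the tidier alignment, here is a minimal sketch that re-scores the two alignments shown in this thread column by column. The helper `score_alignment` is hypothetical (it is not part of Biopython), and it assumes the first column of a gap pays only the open penalty while each further column pays the extend penalty; that convention reproduces both scores reported above.

```python
def score_alignment(top, bottom, match, mismatch, gap_open, gap_extend):
    """Score an already-gapped pairwise alignment, column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0.0
    gap_in = None  # row holding the current gap run: "top", "bottom" or None
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # First column of a gap run pays gap_open, later columns gap_extend
            score += gap_extend if row == gap_in else gap_open
            gap_in = row
        else:
            score += match if a == b else mismatch
            gap_in = None
    return score

# Alignment returned by globalxx in the question
gappy = (
    "ATGGAGAAAA-AAATCACTGGATATACCACCGTTGATA----TATCC-CAATGGCATCGTAAAGAACATTT",
    "ATGGAG-AAATAAATCACTGGATATACCACCGTTGATAAAAATAT-CGCAATGGCATCGTAAAGAACATTT",
)
# Alignment returned by globalms(r, c, 2, -1, -1, -0.5) in this answer
compact = (
    "ATGGAGAAAAAAATCACTGGATATACCACCGTTGAT----ATATCCCAATGGCATCGTAAAGAACATTT",
    "ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT",
)

# globalxx: match = 1, everything else free
xx = dict(match=1, mismatch=0, gap_open=0, gap_extend=0)
# globalms parameters used in this answer
ms = dict(match=2, mismatch=-1, gap_open=-1, gap_extend=-0.5)

print(score_alignment(*gappy, **xx))    # 63.0
print(score_alignment(*compact, **xx))  # 63.0
print(score_alignment(*gappy, **ms))    # 119.5
print(score_alignment(*compact, **ms))  # 121.5
```

Under the globalxx scheme both alignments tie at 63 (63 matching columns each), so the gappy one is just as optimal as the compact one. With the globalms penalties the compact alignment scores 63×2 − 2×1 − (1 + 3×0.5) = 121.5, while the gappy one pays for five separate gap openings and drops to 119.5.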

See the docs
