Multiple Sequence Alignment In Biopython.
12.5 years ago
Mkl ▴ 100

Hi all,

1) How can I compare ASTRAL SCOP genetic domain sequences based on PDB SEQRES and PDB ATOM records? My aim is to find the residues that are missing from these records. I know that I have to align these sequences. How can I do the alignment, and how can I find the missing residues, using Biopython?

2) ClustalW takes a group of sequences and performs all pairwise alignments. It then calculates a similarity matrix, which it analyzes to see how distantly related the groups of sequences are. How can I perform these steps (pairwise sequence alignment, distance matrix, hierarchical clustering, dendrogram) in Biopython?

12.5 years ago

Biopython provides I/O and handling for multiple sequence alignments, not the multiple alignment algorithms themselves. Therefore, you have to call an external program, e.g. ClustalW.

A possible workflow would be:

  • use Biopython to read the FASTA sequences from SCOP
  • use Biopython to read the PDB SEQRES/ATOM sequences (described here)
  • align them using ClustalW
  • search for gaps in the Alignment object

All of the steps are described in the Biopython cookbook, which I highly recommend you read.
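
The last step of the workflow, searching the alignment for gaps, might look roughly like the sketch below. It is plain Python and works on two already-aligned rows given as strings (for instance, the sequences of two rows of an alignment converted with `str()`). The function name `missing_residues` and the toy sequences are made up for illustration and are not part of Biopython.

```python
def missing_residues(seqres_aln, atom_aln, gap="-"):
    """Return (position, residue) for SEQRES residues aligned to a gap in ATOM.

    Both arguments are rows of the same alignment, so they must have equal
    length. Positions are 1-based and count ungapped SEQRES residues.
    """
    if len(seqres_aln) != len(atom_aln):
        raise ValueError("aligned rows must have equal length")
    missing = []
    pos = 0  # position in the ungapped SEQRES sequence
    for s, a in zip(seqres_aln, atom_aln):
        if s == gap:
            continue  # gap in SEQRES: no SEQRES residue in this column
        pos += 1
        if a == gap:
            missing.append((pos, s))
    return missing


# Hypothetical aligned rows (SEQRES on top, ATOM below)
result = missing_residues("GNIKANR", "GN----R")
print(result)
```

Note that the result is only as reliable as the alignment it is given.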

12.2 years ago
Jan Kosinski ★ 1.6k

Using ClustalW for this particular task (aligning SEQRES and ATOM sequences in order to find missing residues in a PDB structure) is wrong.

To find a mapping of SEQRES to ATOM sequences you should rely on the information that the PDB provides in mmCIF format. Aligning them with ClustalW or any other program does not guarantee a correct mapping. For example, in this alignment:

GNIKANR
GN----R

the mapping of the asparagine (N) flanking the missing residues is ambiguous: based on the alignment alone, it could correspond to either the second or the sixth residue of the full sequence.

In another example, ClustalX with my default options (Gonnet250, gap open: 10, extend: 0.1) reports this alignment as optimal:

ANIKANR
A---IKR

whereas it should be:

ANIKANR
A-IK--R
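
To make the ambiguity concrete, here is a small sketch that enumerates every way of writing the shorter (ATOM) sequence as a gapped row under the full (SEQRES) sequence using identical residue pairs only. It is not what ClustalX does (no substitution matrix, no gap penalties); the function name `gapped_placements` is hypothetical, and the recursion is only meant for short toy sequences like the ones above.

```python
def gapped_placements(seqres, atom, gap="-"):
    """Yield each gapped version of `atom` that lines up with `seqres`
    so that every non-gap column pairs two identical residues."""

    def place(i, j):
        # i indexes seqres, j indexes atom
        if i == len(seqres):
            if j == len(atom):
                yield ""  # both sequences fully consumed
            return
        # Option 1: residue i is observed and equals the next ATOM residue
        if j < len(atom) and seqres[i] == atom[j]:
            for rest in place(i + 1, j + 1):
                yield atom[j] + rest
        # Option 2: residue i is missing from the structure
        for rest in place(i + 1, j):
            yield gap + rest

    yield from place(0, 0)


first = list(gapped_placements("GNIKANR", "GNR"))
second = list(gapped_placements("ANIKANR", "AIKR"))
print(first)   # two equally valid placements of N
print(second)  # a single identity-consistent placement
```

For the first pair there are two placements that sequence identity cannot distinguish, which is why the alignment alone cannot say which asparagine was observed. For the second pair only one placement uses identical pairs throughout, and it is not the one the scoring scheme preferred.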

The missing residue information is included in the pdbx_poly_seq_scheme category of the mmCIF file, where unobserved residues are marked with question marks. This information is also directly accessible through the SEQATOMS database.
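
As a rough illustration, the sketch below pulls the pdbx_poly_seq_scheme loop out of mmCIF text with plain Python and lists the residues whose `pdb_mon_id` is `?`. The embedded fragment is invented toy data with only a few of the category's columns, and the minimal parser assumes simple whitespace-separated rows, so it is not a general mmCIF reader. Biopython's `Bio.PDB.MMCIF2Dict.MMCIF2Dict` should give access to the same columns for a real file, keyed by the full tag name.

```python
# Invented toy fragment; a real entry has more columns and more rows.
TOY_CIF = """\
loop_
_pdbx_poly_seq_scheme.asym_id
_pdbx_poly_seq_scheme.seq_id
_pdbx_poly_seq_scheme.mon_id
_pdbx_poly_seq_scheme.pdb_mon_id
_pdbx_poly_seq_scheme.auth_seq_num
A 1 GLY GLY 1
A 2 ASN ASN 2
A 3 ILE ?   ?
A 4 LYS ?   ?
A 5 ALA ?   ?
A 6 ASN ?   ?
A 7 ARG ARG 7
#
"""


def read_poly_seq_scheme(cif_text):
    """Return the rows of the pdbx_poly_seq_scheme loop as dicts.

    Minimal sketch: assumes one loop with whitespace-separated values
    and no quoted or multi-line fields.
    """
    prefix = "_pdbx_poly_seq_scheme."
    columns, rows = [], []
    in_loop = False
    for line in cif_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            columns.append(line[len(prefix):])
            in_loop = True
        elif in_loop:
            # A blank line, comment, or new tag/loop/block ends the data rows
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break
            rows.append(dict(zip(columns, line.split())))
    return rows


rows = read_poly_seq_scheme(TOY_CIF)
# Residues present in the sequence but unobserved in the coordinates
missing = [(int(r["seq_id"]), r["mon_id"]) for r in rows if r["pdb_mon_id"] == "?"]
print(missing)
```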


Thanks so much! I have been searching for how to retrieve the alignments shown under the sequence tab on the PDB website for quite some time. Taking some time again today I finally found the answer :) I even knew it was in the mmCIF after some time but couldn't find where exactly!
