Mapping amino acid indices between a PDB file and a sequence
23 months ago
JJP • 0

Hi All,

I am a beginner in Biopython. What I am trying to do is the following:

I have a sequence of amino acids (including gaps) and a corresponding PDB file. The numbering of the amino acids in the PDB file does not match their numbering in the sequence. I want to take each amino acid entry in the PDB file and find its corresponding position in the sequence. For example, if the first entry in the PDB file is alanine, I want to find the index of that alanine in the sequence. For gaps (-), I want to set the index to zero.

Here is the sequence list I have:

-LLPYFDF----DVPRNLTVTVGQT-GFLHCRVERLGDK-----DVSWIRKR----------DLHILTAGGTTYTSDQRFQVLRP---------------------------------------DGSANWTLQIKYPQPRDSGVYECQINTEP-KMSLSYTFNVVE-IVDPKFSSPIVNMTAPVGRDAFLTCVVQDLGPYKVAWLRVDTQTILTIQNHVITKNQRIGIANSEH---KTWTMRIKDIKESDKGWYMCQINTDPMKSQMGYLDVV----
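Here is a minimal sketch of the zero-for-gaps numbering. It assumes that the non-gap letters of the sequence correspond one-to-one, and in order, to the C-alpha residues read from the PDB file. If the structure has missing residues, that assumption fails and an alignment is needed instead. The function name `sequence_to_residue_index` is hypothetical:

```python
def sequence_to_residue_index(gapped_seq):
    """Map every column of a gapped sequence to a 1-based residue index.

    Gap characters ('-') get index 0; every other character gets the
    running count of non-gap characters seen so far. Assumes (hypothetically)
    that this count equals the residue's position in the PDB C-alpha list.
    """
    indices = []
    count = 0
    for ch in gapped_seq:
        if ch == '-':
            indices.append(0)
        else:
            count += 1
            indices.append(count)
    return indices


if __name__ == '__main__':
    print(sequence_to_residue_index('-LLP--YF'))  # [0, 1, 2, 3, 0, 0, 4, 5]
```

Under that assumption, entry k of the result is the 1-based position of the matching residue in the list of C-alpha records, or 0 for a gap.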

Here is what I have tried so far:

```python
import argparse
import sys

import numpy as np


def parseArgs():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description='Read and extract items from input PDB file')
    parser.add_argument('-i',
                        '--input',
                        action='store',
                        required=True,
                        help='input PDB file in standard format')
    # argparse prints a usage message and exits on invalid options
    return parser.parse_args()


# Reads a PDB file and returns the residue name and coordinates for
# each C-alpha atom
# (the input argument for this routine is the PDB file name.)
def get_coordinates_PDB(File_In):
    Res = []
    Points = []
    try:
        fl = open(File_In, 'r')
    except OSError:
        print('Could not open input file {0}'.format(File_In))
        sys.exit(1)

    # Getting from a PDB file
    with fl:
        for line in fl:
            if not line.startswith('ATOM'):
                continue
            # Atom name occupies columns 13-16; the C-alpha is ' CA '
            if line[12:16] != ' CA ':
                continue
            # Keep only the first alternate location (column 17)
            if line[16] not in (' ', 'A'):
                continue
            resname = line[17:20]
            # x, y, z are in fixed columns 31-38, 39-46 and 47-54
            tmp = np.zeros(3)
            tmp[0] = float(line[30:38])
            tmp[1] = float(line[38:46])
            tmp[2] = float(line[46:54])
            Res.append(resname)
            Points.append(tmp)
    return Points, Res


def main():
    """Read and parse a provided PDB file."""
    # Parse arguments
    args = parseArgs()
    File_In = args.input
    print(get_coordinates_PDB(File_In))


if __name__ == '__main__':
    main()
```
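`get_coordinates_PDB` returns three-letter residue names (e.g. `ALA`), while the sequence uses one-letter codes. The two have to share an alphabet before they can be compared. Here is a minimal sketch that assumes only the 20 standard amino acids matter. The table name `THREE_TO_ONE` and the helper `residues_to_sequence` are hypothetical, and nonstandard residues such as selenomethionine (`MSE`) simply become `X`:

```python
# Standard 20 amino acids; anything else (ligands, modified residues) maps to 'X'
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def residues_to_sequence(res_names):
    """Convert a list of three-letter residue names to a one-letter string."""
    return ''.join(THREE_TO_ONE.get(name.strip().upper(), 'X')
                   for name in res_names)


if __name__ == '__main__':
    print(residues_to_sequence(['LEU', 'LEU', 'PRO', 'TYR', 'HOH']))  # LLPYX
```

Biopython's `Bio.SeqUtils.seq1` offers a similar conversion with a larger table.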

This outputs the x, y, z coordinates and the residue names of the C-alpha atoms in the PDB file. However, I am stuck at this point.
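One possible next step is to align the residues read from the PDB file with the ungapped sequence. That way, residues missing from the structure (disordered loops, for example) do not shift the numbering. The sketch below uses a textbook Needleman-Wunsch global alignment with an arbitrary linear scoring scheme (match=2, mismatch=-1, gap=-2). The function name `map_sequence_to_pdb` and the scores are assumptions. `pdb_seq` is assumed to be a one-letter string built from the three-letter residue names:

```python
def map_sequence_to_pdb(gapped_seq, pdb_seq, match=2, mismatch=-1, gap=-2):
    """For each column of gapped_seq, return the 1-based index of the aligned
    residue in pdb_seq, or 0 if the column is a gap or has no PDB partner.

    Hypothetical helper: plain Needleman-Wunsch global alignment (linear gap
    penalty) between the ungapped sequence and pdb_seq.
    """
    # 0-based positions of the non-gap characters in the gapped sequence
    cols = [i for i, ch in enumerate(gapped_seq) if ch != '-']
    seq = ''.join(gapped_seq[i] for i in cols)
    n, m = len(seq), len(pdb_seq)

    def pair(a, b):
        return match if a == b else mismatch

    # Fill the dynamic-programming score matrix
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(seq[i - 1], pdb_seq[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, recording aligned pairs
    seq_to_pdb = [0] * n
    i, j = n, m
    while i > 0 and j > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + pair(seq[i - 1], pdb_seq[j - 1]):
            seq_to_pdb[i - 1] = j  # 1-based PDB residue index
            i, j = i - 1, j - 1
        elif s == score[i - 1][j] + gap:
            i -= 1  # sequence residue missing from the structure
        else:
            j -= 1  # structure residue not present in the sequence

    # Expand back to the gapped columns; gap columns stay 0
    result = [0] * len(gapped_seq)
    for k, col in enumerate(cols):
        result[col] = seq_to_pdb[k]
    return result


if __name__ == '__main__':
    # 'P' is absent from the structure, so its column maps to 0
    print(map_sequence_to_pdb('-LLPYF', 'LLYF'))  # [0, 1, 2, 0, 3, 4]
```

An index k in the result points to the k-th C-alpha record, i.e. `Res[k-1]` and `Points[k-1]`. If you need the residue number as written in the file, it sits in columns 23-26 of that ATOM line. Aligned mismatches are also mapped here. For real data, Biopython's `Bio.Align.PairwiseAligner` provides configurable scoring.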

I would much appreciate it if someone could help me implement the rest. Thank you in advance for your time and help!


There was a related post several weeks ago that may be useful to you:

Using STDIN with BioPython's PDB methods

