Putting atomic coordinates from PDB file into Pandas dataframe?
17 months ago

Greetings all.

I have a list of atomic coordinates from a PDB file saved to the variable x. This is a short sample of what the interpreter shows when I call print(x) in my code:

[22.732 33.537 34.278]
[20.362 36.096 32.786]
[20.421 34.188 29.509]
[18.039 31.768 31.227]
[16.639 33.68  34.216]
[14.774 36.97  34.169]
[15.869 37.132 37.823]
[18.284 34.705 39.471]
[16.077 34.65  42.582]
[13.807 32.393 40.54 ]
[16.256 29.54  41.111]
[18.689 30.829 43.723]
[16.129 30.09  46.454]
[14.536 27.024 48.066]
[17.114 24.788 46.348]
[16.391 21.581 48.303]
[13.315 20.955 46.163]
[15.592 20.428 43.156]
[17.535 17.539 44.664]
[16.719 14.029 43.436]
[15.347 12.195 46.47 ]
[16.07   8.681 45.172]
[19.803  9.399 45.021]


What I would like to do is put these values into a pandas DataFrame. Here is the code I have written:

import pandas as pd

for chains in structure:
    for chain in chains:
        for residue in chain:
            for atom in residue:
                x = atom.get_coord()

sample = pd.DataFrame({'X': [x[0]], 'Y': [x[1]], 'Z': [x[2]]})
print(sample)


When this code runs, it outputs the following:

           X      Y       Z
0  19.802999  9.399  45.021


For some reason, only the final item in x ends up in the DataFrame. I am not sure how to put every element of x into it. Does anyone know how to go about doing this?

7 weeks ago
RamRS 21k
Houston, TX

This is how loops work: the body runs once per iteration, and here the body does only one thing, which is to assign atom.get_coord() to x. Each pass overwrites the value from the previous pass, and x is not used until the loops have finished, so the DataFrame only ever sees the last value of x.
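To make the overwriting concrete, here is a minimal sketch in plain Python. It assumes three coordinates copied from the question's sample; the names coords, last_only and collected are made up for illustration and do not come from the original code.

```python
# Three coordinates taken from the sample in the question.
coords = [
    (22.732, 33.537, 34.278),
    (20.362, 36.096, 32.786),
    (20.421, 34.188, 29.509),
]

# Overwriting: x is rebound on every pass, so after the loop
# it holds only the final coordinate.
for x in coords:
    pass
last_only = x

# Accumulating: append inside the loop so every value is kept.
collected = []
for x in coords:
    collected.append(x)

print(last_only)
print(len(collected))
```

The first loop mirrors the code in the question, and the second shows the general fix: do something with x inside the loop body.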

Try:

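import pandas as pd

# Collect one dict per atom while looping
arr_x = []

for chains in structure:
    for chain in chains:
        for residue in chain:
            for atom in residue:
                x = atom.get_coord()
                # Store plain scalars; wrapping each value in a list
                # would put one-element lists in every cell
                arr_x.append({'X': x[0], 'Y': x[1], 'Z': x[2]})

# Build the DataFrame once, after all atoms have been visited
sample = pd.DataFrame(arr_x)
print(sample)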
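An alternative is to collect the coordinate arrays themselves and build the DataFrame from a single 2-D array. The sketch below assumes NumPy and pandas are installed and uses three hard-coded float32 arrays as stand-ins for the values returned by atom.get_coord(), so that it runs without a PDB file; the names coords and df are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-ins for the arrays returned by atom.get_coord(); in real code
# these would be appended inside the innermost loop over atoms.
coords = [
    np.array([22.732, 33.537, 34.278], dtype=np.float32),
    np.array([20.362, 36.096, 32.786], dtype=np.float32),
    np.array([20.421, 34.188, 29.509], dtype=np.float32),
]

# Stack the length-3 arrays into an (n_atoms, 3) array and name the columns.
df = pd.DataFrame(np.vstack(coords), columns=["X", "Y", "Z"])
print(df)
```

This avoids building one dict per atom, which may matter for large structures. The result has one row per atom and the same three columns.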