Creating a distance matrix from PDB file coordinates (Python)
5.4 years ago

Hello all

I have a list of xyz coordinates of different points from a PDB file, assigned to the variable x. Here is a snippet of what it looks like:

[ 8.721 15.393 22.939]
[11.2   13.355 25.025]
[11.045 15.057 28.419]
[13.356 13.814 31.169]
[12.54  13.525 34.854]
[14.038 15.691 37.608]
[16.184 12.782 38.807]
[17.496 12.053 35.319]
[18.375 15.721 34.871]
[20.066 15.836 38.288]
[22.355 12.978 37.249]
[22.959 14.307 33.724]
[24.016 17.834 34.691]
[26.63  16.577 37.161]
[29.536 18.241 35.342]
[27.953 21.667 35.829]

I would like to use these points to compute a distance matrix. I have tried to use the SciPy distance_matrix function, however it does not appear to support xyz coordinates, only x and y coordinates. Is there a quick way to compute this distance matrix manually?

pdb python • 4.0k views

You could do this much more easily in R with the rpdb package. Also, I would recommend keeping the atom names attached.


I agree with Michael that it's probably best to keep it 'intact' with atom names etc., but what you're looking for is the 3D Euclidean distance.

Take a look at: https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy
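
To make the formula concrete, here is a minimal sketch using only the standard library. The distance is the square root of the summed squared differences in x, y and z. The names p, q and distance are just illustrative, and the two points are the first two from the question.

```python
import math

# First two points from the question, used as sample input.
p = (8.721, 15.393, 22.939)
q = (11.2, 13.355, 25.025)

# Euclidean distance: square root of the summed squared coordinate differences.
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(distance)
```

Nothing here is specific to three dimensions, so the same expression works for points with any number of coordinates.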

5.4 years ago
Joe 21k

Going off of my earlier comment, I think this is one way to do it, though I haven't checked that the maths is correct. The numbers look OK, though.

I had to slightly reformat your input data to be a little easier to use though:

Input Data:

8.721 15.393 22.939
11.2 13.355 25.025
11.045 15.057 28.419
13.356 13.814 31.169
12.54 13.525 34.854
14.038 15.691 37.608
16.184 12.782 38.807
17.496 12.053 35.319
18.375 15.721 34.871
20.066 15.836 38.288
22.355 12.978 37.249
22.959 14.307 33.724
24.016 17.834 34.691
26.63 16.577 37.161
29.536 18.241 35.342
27.953 21.667 35.829

Code:

import itertools
import sys

import numpy

# Read one "x y z" triple per line; split() tolerates repeated spaces,
# and blank lines are skipped.
with open(sys.argv[1], 'r') as ifh:
    fl_coords = [numpy.array(line.split(), dtype=float) for line in ifh if line.strip()]

# Print the Euclidean distance for every unordered pair of points.
for a, b in itertools.combinations(fl_coords, 2):
    print('{} - {} = {}'.format(a, b, numpy.linalg.norm(a - b)))

Some example output:

[  8.721  15.393  22.939] - [ 11.2    13.355  25.025] = 3.8275685493534914
[  8.721  15.393  22.939] - [ 11.045  15.057  28.419] = 5.961901710025082
[  8.721  15.393  22.939] - [ 13.356  13.814  31.169] = 9.576500717903174
[  8.721  15.393  22.939] - [ 12.54   13.525  34.854] = 12.650747408750203
[  8.721  15.393  22.939] - [ 14.038  15.691  37.608] = 15.60573144713185
[  8.721  15.393  22.939] - [ 16.184  12.782  38.807] = 17.728708751626556
[  8.721  15.393  22.939] - [ 17.496  12.053  35.319] = 15.537716209276061
[  8.721  15.393  22.939] - [ 18.375  15.721  34.871] = 15.351870374648167
[  8.721  15.393  22.939] - [ 20.066  15.836  38.288] = 19.091806488648473

This will give you all the pairwise point distances in 'long' format, with each pair of points listed once. I'm not sure if you wanted all-vs-all or just all-vs-one, but it shouldn't take much to manipulate this into a matrix now.
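
If you want the square all-vs-all matrix directly, one way to sketch it is with NumPy broadcasting. This is only an illustration under my own naming (coords, diff, dist_matrix), and the four hard-coded points are copied from the question rather than read from a file.

```python
import numpy as np

# First four points from the question, used here only as sample input.
coords = np.array([
    [8.721, 15.393, 22.939],
    [11.2, 13.355, 25.025],
    [11.045, 15.057, 28.419],
    [13.356, 13.814, 31.169],
])

# diff[i, j] is the vector from point j to point i, shape (n, n, 3).
diff = coords[:, None, :] - coords[None, :, :]

# Taking the norm over the last axis gives the (n, n) distance matrix.
dist_matrix = np.linalg.norm(diff, axis=-1)

print(np.round(dist_matrix, 3))
```

The result is symmetric with zeros on the diagonal, and entry [i, j] is the distance between points i and j. For what it's worth, scipy.spatial.distance_matrix also accepts arrays of shape (n, 3), so calling it with the same array for both arguments should give the same matrix.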


Thank you for the explanation. For the record, I was looking for a method to do all-vs-all.


You’re in luck then, that’s what this will do :)
