How To Extract Just The Coordinate Values From A Pdb File Converted To A Text File, In Java Only?
13.8 years ago
Jeremiahloh ▴ 30

ATOM 1 N ASN A 2 18.668 27.299 52.379 1.00 41.19 N

ATOM 2 CA ASN A 2 19.400 26.674 53.492 1.00 40.18 C

ATOM 3 C ASN A 2 19.710 27.737 54.550 1.00 37.56 C

ATOM 4 O ASN A 2 19.123 27.737 55.640 1.00 38.90 O

ATOM 5 N LEU A 3 20.637 28.606 54.184 1.00 34.40 N

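The coordinates I need to extract are the three values after the residue number in each line (18.668, 27.299 and 52.379 in the first record; shown in bold in the original post), written out in the form (x,y,z) down the list.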

Would greatly appreciate your help.

From my research it seems that I can't extract columns directly; instead I have to parse each line and split it into tokens. Could someone confirm or explain this?

coordinates java pdb parsing structure • 9.9k views
13.8 years ago

Have you looked at BioJava for reading / parsing PDB files?

13.8 years ago

I'm not a PDB guru, but if your file is just the set of lines you showed, then I would use the following trivial program:

(...)
Pattern delim=Pattern.compile("\\s+");//one or more tabs or spaces, so either separator works
String line;
while((line=bufferedReader.readLine())!=null)
  {
  String[] tokens=delim.split(line.trim());
  double x= Double.parseDouble(tokens[6]);
  double y= Double.parseDouble(tokens[7]);
  double z= Double.parseDouble(tokens[8]);
  (...)
  }
(....)

If your PDB file is more complex than your snippet then, as Khader said, have a look at BioJava or at JavaCC.
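For reference, a self-contained version of the loop above might look like the sketch below. It assumes the file is whitespace-separated exactly like the snippet in the question, so that x, y and z are tokens 6 to 8. The class name `WhitespaceSplitDemo` and the `extract` helper are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class WhitespaceSplitDemo {

    private static final Pattern DELIM = Pattern.compile("\\s+"); // any run of spaces or tabs

    // Returns one "(x,y,z)" string per ATOM line read from the reader.
    static List<String> extract(BufferedReader reader) throws IOException {
        List<String> coords = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] tokens = DELIM.split(line.trim());
            // Skip blank lines, other record types, and lines too short to hold x, y, z.
            if (tokens.length < 9 || !tokens[0].equals("ATOM")) {
                continue;
            }
            double x = Double.parseDouble(tokens[6]);
            double y = Double.parseDouble(tokens[7]);
            double z = Double.parseDouble(tokens[8]);
            coords.add(String.format(Locale.ROOT, "(%.3f,%.3f,%.3f)", x, y, z));
        }
        return coords;
    }

    public static void main(String[] args) throws IOException {
        // Two records copied from the question.
        String sample = "ATOM 1 N ASN A 2 18.668 27.299 52.379 1.00 41.19 N\n"
                      + "ATOM 2 CA ASN A 2 19.400 26.674 53.492 1.00 40.18 C\n";
        for (String c : extract(new BufferedReader(new StringReader(sample)))) {
            System.out.println(c);
        }
    }
}
```

This only holds as long as every field is present and separated by whitespace, which a real PDB file does not guarantee.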


Pierre, PDB files are generally parsed using column numbers. Please check ATOM records for a detailed description.
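To illustrate the column-based approach, here is a minimal sketch. It assumes the file still has the fixed-width layout of the PDB format, where x, y and z occupy columns 31-38, 39-46 and 47-54 (1-based), i.e. `substring(30, 38)`, `substring(38, 46)` and `substring(46, 54)` in Java. `AtomColumns`, `xyz` and `record` are hypothetical names, and `record` exists only to build test lines with the right column widths.

```java
import java.util.Arrays;
import java.util.Locale;

class AtomColumns {

    // Returns {x, y, z} sliced from the fixed columns of an ATOM/HETATM record.
    static double[] xyz(String record) {
        if (record.length() < 54) {
            throw new IllegalArgumentException("record too short to contain coordinates");
        }
        return new double[] {
            Double.parseDouble(record.substring(30, 38).trim()),
            Double.parseDouble(record.substring(38, 46).trim()),
            Double.parseDouble(record.substring(46, 54).trim())
        };
    }

    // Illustration only: pads a fixed prefix to 30 columns, then writes three 8.3 fields.
    static String record(double x, double y, double z) {
        return String.format(Locale.ROOT, "%-30s%8.3f%8.3f%8.3f",
                "ATOM      1  N   ASN A   2", x, y, z);
    }

    public static void main(String[] args) {
        String fused = record(-118.668, -127.299, -152.379);
        System.out.println(fused);
        // The three coordinates touch each other, so a whitespace split sees them as one token.
        System.out.println(fused.trim().split("\\s+").length + " whitespace tokens");
        System.out.println(Arrays.toString(xyz(fused)));
    }
}
```

When the coordinates are wide enough to fill their 8-character fields, the three numbers run together and a whitespace split no longer finds them, while the column slices still do.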


I would like to add that PDB files might look simple, so it is tempting to write your own little parser like the one above. In reality there are many subtle issues in the parsing that are best left to mature libraries. I'd therefore recommend using BioJava too; see the tutorial.

13.2 years ago

Jmol and the CDK have PDB readers that allow you to do this too. A Groovy script (using Java classes) for the CDK could look like:

import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.io.*;
import org.openscience.cdk.tools.manipulator.*;
import org.openscience.cdk.io.IChemObjectReader.Mode;
import org.openscience.cdk.*;
import java.io.File;
import java.util.zip.GZIPInputStream;

reader = new PDBReader(
  new GZIPInputStream(
    new URL("http://www.pdb.org/pdb/files/1CRN.pdb.gz").openStream()
  )
);
crambin = reader.read(new ChemFile());
for (container in ChemFileManipulator.getAllAtomContainers(crambin)) {
  for (atom in container.atoms()) {
    println atom.point3d;
  }
}
13.1 years ago
Abirami ▴ 30

How to extract the coordinates of an atom from a PDB file in C:

#include <stdio.h>
#include <string.h>

/* Copy src[start..stop) into dst (at most size-1 characters) and NUL-terminate it. */
char *substring(size_t start, size_t stop, const char *src, char *dst, size_t size)
{
    size_t count = stop - start;
    if ( count >= size )
    {
        count = size - 1;
    }

    sprintf(dst, "%.*s", (int)count, src + start);
    return dst;
}

int main(void)
{
    const char filename[] = "cys_coord.txt";
    char x[10],y[10],z[10];
    char buffer[500];
    FILE *file = fopen(filename, "r");

    if ( file )
    {
        while ( fgets(buffer, sizeof buffer, file) )
        {
            if ( strlen(buffer) < 54 )
            {
                continue; /* too short to hold the coordinate columns */
            }
            printf("%s\n",buffer);
            /* x, y, z sit in columns 31-38, 39-46 and 47-54 (1-based) */
            printf("x = %s\n", substring(30, 38, buffer, x, sizeof x));
            printf("y = %s\n", substring(38, 46, buffer, y, sizeof y));
            printf("z = %s\n", substring(46, 54, buffer, z, sizeof z));
        }
        fclose(file); /* only close a file that was actually opened */
    }
    return 0;
}

Thanks for trying to help, but he was asking for a Java-only solution. (Although I wasn't the one who gave you the downvote.)

13.8 years ago
Jeremiahloh ▴ 30

Hey there,

I came up with this, but I think I am making a mess of all the information, methods and classes. Could anybody help me straighten out my thoughts? Please... and thank you!

import java.util.*;  
import java.io.*;
import java.util.regex.Pattern;
import java.io.StreamTokenizer;

public class CoorToks {

    public StringTokenizer(String token); //invalid method declaration
    public static void main(String[] args) throws IOException {
        BufferedReader inputStream = null; // scan input line by line
        PrintWriter outputStream = null;// output aligned the same way
        Pattern delim=Pattern.compile("/s");

        String token;
        StringTokenizer tokenizer = new StringTokenizer(token);

        try {
            inputStream = new BufferedReader(new FileReader("1APB.pdb.txt"));
            outputStream = new PrintWriter(new FileWriter("characteroutput.txt"));
            while(tokenizer.hasMoreTokens())
            {
                if (token.trim().startsWith("ATOM") && !token.trim().endsWith("H")) // I need to scan for the word "ATOM" before i start tokenizing. ends at H.
                {
                    // and i only need the 7th to 9th tokens of each line.
                    // should i use a pattern delimiter instead?
                    String tokens[]=delim.split(token);
                    double x= Double.parseDouble(tokens[7]);
                    double y= Double.parseDouble(tokens[8]);
                    double z= Double.parseDouble(tokens[9]);
                    outputStream.println(token);

                    //the compiler says it can't find variable tokens. which means i have to do a declaration of variables?
                    // how do i do that when there are so many tokens coming from the text file.
                }
            }
        }//end of try

        finally {
            while ((token = inputStream.readLine()) != null)
            {
                outputStream.println(token);
            }
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}

This code will not extract 3D coordinates for hetero atoms, but maybe that's intentional?
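If hetero atoms are wanted as well, the record test could accept both record names. A small sketch, with `RecordFilter` and `hasCoordinates` as illustrative names:

```java
class RecordFilter {

    // True for the two PDB record types that carry x, y, z coordinates.
    static boolean hasCoordinates(String line) {
        String trimmed = line.trim();
        return trimmed.startsWith("ATOM") || trimmed.startsWith("HETATM");
    }

    public static void main(String[] args) {
        System.out.println(hasCoordinates("ATOM 1 N ASN A 2 18.668 27.299 52.379 1.00 41.19 N"));
        System.out.println(hasCoordinates("HETATM"));
        System.out.println(hasCoordinates("REMARK"));
    }
}
```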

String[] tokens = delim.split(token);

However, I would recommend either a) using a third-party library, as PDB files are tricky or b) splitting (as you say) on columns. Do this with substring + a copy of the PDB specification :)
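Following option b), a corrected version of the program might look like the sketch below. It assumes the text file keeps the fixed-width PDB columns (x, y and z in columns 31-54), reuses the file names from the post, and keeps the original "ends with H" test for hydrogens. The `extract` helper is added here for illustration only.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Locale;

class CoorToks {

    // Writes "(x,y,z)" for every non-hydrogen ATOM record and returns how many were written.
    static int extract(BufferedReader in, PrintWriter out) throws IOException {
        int written = 0;
        String line;
        while ((line = in.readLine()) != null) {       // read a line first, then inspect it
            if (!line.startsWith("ATOM") || line.length() < 54) {
                continue;                              // not an ATOM record, or no room for x, y, z
            }
            if (line.trim().endsWith("H")) {
                continue;                              // hydrogen test taken from the original post
            }
            double x = Double.parseDouble(line.substring(30, 38).trim());
            double y = Double.parseDouble(line.substring(38, 46).trim());
            double z = Double.parseDouble(line.substring(46, 54).trim());
            out.println(String.format(Locale.ROOT, "(%.3f,%.3f,%.3f)", x, y, z));
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes both files, even if parsing throws.
        try (BufferedReader in = new BufferedReader(new FileReader("1APB.pdb.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("characteroutput.txt"))) {
            System.out.println(extract(in, out) + " coordinates written");
        }
    }
}
```

If the conversion to text collapsed the spacing, the column offsets no longer hold and the lines have to be split on whitespace instead.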

13.1 years ago
Jordeu ▴ 20

Do it object oriented!

You can use BioJava; check these two links:

If you are using Java, this is the best option.
