Question

How do I download sequences from PDB and get only protein residues, not ligand residues?

0

6.2 years ago

furgfurg ▴ 10

The following Python code adds ANY residue that has an atom named 'CA' to the "protein", even if that atom belongs to a ligand. How can I change it to check whether the residue name is in a list of standard residue names, so that only protein information is kept?

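protein_atoms = []  # indices of atoms in residues that contain a 'CA' atom
for res in residues:
    atom_names = []
    atom_index = []
    for atom in res.atoms():
        atom_names.append(atom.name)
        atom_index.append(atom.index)
    # Check once per residue, after all of its atoms have been collected
    if 'CA' in atom_names:
        protein_atoms = protein_atoms + atom_index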

pdb atom residue protein amino acid • 2.3k views

ADD COMMENT • link 6.2 years ago by furgfurg ▴ 10

1

I'm just trying to figure out how to change it to differentiate between a ligand and a protein. I probably should have added the line residues = structure.topology.residues() above the loop. But yes, the PDB file I eventually want to use contains a ligand and a protein (and water).

ADD REPLY • link 6.2 years ago by furgfurg ▴ 10

0

You can do all that in Chimera if you just need a quick solution. The Python source code of Chimera is also available here:

https://www.cgl.ucsf.edu/chimera/docs/sourcecode.html so you could look for the function behind Select -> Residues -> Standard Amino Acids, possibly with some overhead attached.

ADD REPLY • link 6.2 years ago by Michael 54k

0

I'm not familiar with Chimera, but I'll check it out. Thank you.

ADD REPLY • link 6.2 years ago by furgfurg ▴ 10

0

This looks like a post taken directly from an exam, so don't expect an immediate response ;) However, it is an interesting question about downloading sequences from PDB, so we could keep it and just wait a few days before providing the answer.

ADD REPLY • link 6.2 years ago by Michael 54k

0

I'm not sure what the end goal is here? Do you just want a list of all the protein residues, or are you looking to do something with the atoms specifically?

ADD REPLY • link 6.2 years ago by Joe 21k

0

I assume the question is lacking context. To add some: given a PDB file or the output of the PDB API as input, extract only the standard amino acid residues (this becomes more complex if the input is a protein complex), excluding the ligand, and write the result to a FASTA file or to a PDB file that contains only the protein atoms, not the ligands. I searched Biostars for this relatively simple use case but couldn't find it.

Example structure http://www.rcsb.org/structure/1DLH : how would you get only the sequence of the MHC, without the bound peptide? Or http://www.rcsb.org/structure/1Y1Y : how would you get only the protein sequence, without the bound RNA?

ADD REPLY • link 6.2 years ago by Michael 54k
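
A minimal sketch of the use case described above, using only the Python standard library and the fixed column layout of PDB ATOM records. The names `chain_sequences`, `to_fasta`, and `demo_lines` are hypothetical and introduced here only for illustration, and the demo lines are truncated, made-up records rather than real structure data. The idea is to keep ATOM records whose residue name is one of the 20 standard amino acids, which skips HETATM ligands, ions, water, and nucleotide residues, and to report one sequence per chain. A bound peptide built from standard residues would still show up, but if it is deposited as its own chain it can be dropped by chain ID. Note that this reads only residues that have coordinates, so unresolved residues are missing from the result.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Map chain ID -> one-letter sequence of standard residues in ATOM records."""
    sequences = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only read the first model of a multi-model file
        if not line.startswith("ATOM"):
            continue  # skips HETATM (ligands, ions, water) and all other records
        res_name = line[17:20].strip()      # columns 18-20: residue name
        chain_id = line[21]                 # column 22: chain identifier
        res_id = (chain_id, line[22:27])    # columns 23-27: number + insertion code
        # Nucleotides (e.g. "G", "DA") are ATOM records too, but not in the table.
        if res_name in THREE_TO_ONE and res_id not in seen:
            seen.add(res_id)
            sequences[chain_id] = sequences.get(chain_id, "") + THREE_TO_ONE[res_name]
    return sequences


def to_fasta(sequences, structure_id):
    """Format the per-chain sequences as FASTA text."""
    return "".join(
        f">{structure_id}_{chain}\n{seq}\n" for chain, seq in sequences.items()
    )


# Made-up, truncated records for demonstration (coordinates omitted).
demo_lines = [
    "ATOM      1  N   GLY A   1",
    "ATOM      2  CA  GLY A   1",
    "ATOM      3  N   ALA A   2",
    "ATOM      4  P     G B   1",   # RNA residue, skipped by the name check
    "HETATM    5 CA    CA A 101",   # calcium ion, skipped as HETATM
    "ATOM      6  N   SER C   1",
]
demo_sequences = chain_sequences(demo_lines)
print(to_fasta(demo_sequences, "demo"))
```

For a real file, the lines could come from `open("structure.pdb")`, and unwanted chains could be removed from the returned dictionary before writing the FASTA output.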

0

As it stands, the code identifies protein residues by looking for residues that contain an atom named 'CA'. However, this returns both protein and ligand residues, because the 'CA' search matches both. How could it be changed so that ONLY protein residues get added to the list?

ADD REPLY • link 6.2 years ago by furgfurg ▴ 10
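
One possible way to do this, as a sketch rather than a definitive answer: test the residue name against a set of standard amino acid names instead of testing for a 'CA' atom. This assumes the residue objects expose a `name` attribute in addition to the `atoms()` method and `atom.index` attribute already used in the question. The function name `select_protein_atoms` and the `Atom` and `Residue` classes below are hypothetical stand-ins so the example runs on its own; with a real topology you would pass the `residues` from the question instead.

```python
# The 20 standard amino acid residue names as they appear in PDB files.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def select_protein_atoms(residues, allowed=STANDARD_RESIDUES):
    """Return the indices of all atoms in residues whose name is in `allowed`."""
    protein_atoms = []
    for res in residues:
        # Decide by residue name, not by the presence of a 'CA' atom, so that
        # ligands, water, and ions (calcium atoms are also named 'CA') are skipped.
        if res.name.strip().upper() in allowed:
            protein_atoms.extend(atom.index for atom in res.atoms())
    return protein_atoms


# Hypothetical stand-ins for the topology objects, for demonstration only.
class Atom:
    def __init__(self, name, index):
        self.name = name
        self.index = index


class Residue:
    def __init__(self, name, atoms):
        self.name = name
        self._atoms = atoms

    def atoms(self):
        return iter(self._atoms)


demo_residues = [
    Residue("GLY", [Atom("N", 0), Atom("CA", 1), Atom("C", 2), Atom("O", 3)]),
    Residue("CA", [Atom("CA", 4)]),   # calcium ion: has a 'CA' atom, not protein
    Residue("HOH", [Atom("O", 5)]),   # water
    Residue("ALA", [Atom("N", 6), Atom("CA", 7)]),
]
demo_indices = select_protein_atoms(demo_residues)
print(demo_indices)
```

Two caveats with this approach: modified residues such as selenomethionine (MSE) are excluded unless they are added to the set, and a peptide ligand made of standard residues would still pass the check, so it would need to be separated by chain instead.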