6.2 years ago
furgfurg
The following Python code adds ANY residue that has an atom named ‘CA’ to the “protein”, even if that atom belongs to a ligand. How do you change it to check whether the residue name is in a list of standard residue names? How do you change the code to get only the protein atoms?
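protein_atoms = []
for res in residues:
    # Collect the names and indices of every atom in this residue
    atom_names = []
    atom_index = []
    for atom in res.atoms():
        atom_names.append(atom.name)
        atom_index.append(atom.index)
    # Any residue with an atom named 'CA' is treated as protein
    if 'CA' in atom_names:
        protein_atoms = protein_atoms + atom_index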
I'm just trying to figure out how to change it to differentiate between a ligand and a protein. I probably should've added the line
residues = structure.topology.residues()
above. But yes, the PDB file I'm eventually going to use has a ligand and a protein (and water).
You can do all that in Chimera if you just need a quick solution. The Python source code of Chimera is also available here:
https://www.cgl.ucsf.edu/chimera/docs/sourcecode.html so you could look for the function behind Select -> Residues -> Standard Amino Acids, possibly with some overhead attached.
I'm not familiar with Chimera, but I'll check it out. Thank you.
This looks like a post taken directly from an exam, so don't expect an immediate response ;) However, it is an interesting question about getting sequences from PDB, so we could keep it and just wait a few days before providing the answer.
I'm not sure what the end goal is here? Do you just want a list of all the protein residues, or are you looking to do something with the atoms specifically?
I assume the question is lacking context. To add some: given a PDB file or the output of the PDB API as input, extract only the standard amino-acid residues (this becomes more complex if the input is a protein complex), excluding the ligand, and write the result to a FASTA file or to a PDB file that contains only the protein atoms and no ligands. I searched Biostars for this relatively simple use case but couldn't find it.
Example structures: for http://www.rcsb.org/structure/1DLH, how would you get only the sequence of the MHC without the bound peptide? Or for http://www.rcsb.org/structure/1Y1Y, how would you get only the protein sequence without the bound RNA?
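For the sequence use case described above, one possible approach is to read the ATOM records of a PDB-format file directly and keep only the 20 standard amino acids. The sketch below is an illustration, not a tested pipeline: `chain_sequences` and `to_fasta` are hypothetical helper names, and the `min_length` cutoff is only an assumed heuristic for dropping a short bound peptide (which is itself made of standard amino acids and so cannot be excluded by residue name). Nucleotides, water and HETATM ligands are skipped because their residue names are not in the lookup table.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: sequence} built from ATOM records of standard residues."""
    sequences = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only read the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 27:
            continue  # HETATM (ligands, water, ions) and other records are skipped
        res_name = line[17:20].strip()  # columns 18-20: residue name
        chain_id = line[21]             # column 22: chain identifier
        res_id = line[22:27]            # columns 23-27: residue number + insertion code
        if res_name not in THREE_TO_ONE:
            continue  # e.g. RNA/DNA nucleotides
        key = (chain_id, res_id)
        if key in seen:
            continue  # residue already counted via an earlier atom
        seen.add(key)
        sequences[chain_id] = sequences.get(chain_id, "") + THREE_TO_ONE[res_name]
    return sequences


def to_fasta(sequences, min_length=1):
    """Format chains as FASTA, dropping chains shorter than min_length."""
    records = []
    for chain_id, seq in sorted(sequences.items()):
        if len(seq) >= min_length:
            records.append(f">chain_{chain_id}\n{seq}\n")
    return "".join(records)
```

With a downloaded file this could be called as `to_fasta(chain_sequences(open("1dlh.pdb")), min_length=30)`, where the file name and the cutoff of 30 are placeholders. Note that the sequence comes from residues with coordinates, so residues missing from the structure will also be missing from the output.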
As it stands, the code identifies protein residues by looking for an atom named 'CA'. However, this returns both protein and ligand residues, because a ligand can also contain an atom named 'CA'. How could it be changed so that ONLY protein residues get added to the list?
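One way to do this is to test the residue name against a set of standard amino-acid names instead of testing for a 'CA' atom. The sketch below assumes each residue object also exposes its three-letter name as `res.name`; the question does not say which library `structure` comes from, so check that attribute in your own toolkit. The `Atom` and `Residue` classes are hypothetical stand-ins that only mimic the `res.atoms()`, `atom.name` and `atom.index` interface used in the question, so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List

# The 20 standard amino-acid residue names.
STANDARD_RESIDUES = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})


@dataclass
class Atom:
    """Hypothetical stand-in for the atom objects in the question."""
    name: str
    index: int


@dataclass
class Residue:
    """Hypothetical stand-in for the residue objects in the question."""
    name: str
    atom_list: List[Atom] = field(default_factory=list)

    def atoms(self) -> Iterator[Atom]:
        return iter(self.atom_list)


def protein_atom_indices(residues: Iterable[Residue]) -> List[int]:
    """Collect atom indices only from residues with a standard amino-acid name."""
    protein_atoms: List[int] = []
    for res in residues:
        if res.name.strip().upper() in STANDARD_RESIDUES:
            protein_atoms.extend(atom.index for atom in res.atoms())
    return protein_atoms


# Small demonstration: a ligand with a 'CA' atom and a water are both skipped.
demo = [
    Residue("ALA", [Atom("N", 0), Atom("CA", 1)]),
    Residue("LIG", [Atom("CA", 2)]),
    Residue("HOH", [Atom("O", 3)]),
    Residue("GLY", [Atom("CA", 4)]),
]
demo_indices = protein_atom_indices(demo)
```

In the original loop this amounts to replacing `if 'CA' in atom_names:` with a check on the residue name. The set above is deliberately strict: modified residues and force-field-specific names such as HID or HIE are not in it, so extend it if your file uses them.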