Issue with PDB+chain to Pfam mapping and domain filter
3.0 years ago
jmungar2 ▴ 10

Hello,

I have 2 dataframes.

Dataframe 1 is a list of PDB chains presenting a particular substructure, and looks like: df1 = PDBCH RESIDUE_1 RESIDUE_N SUBSTRUCTURE_SEQUENCE

Dataframe 2 is the pdb2pfam mapping file from here http://ftp.ebi.ac.uk/pub/databases/Pfam/mappings/, and looks like: df2 = PDBCH PDB_START PDB_END PFAM_ACCESSION PFAM_NAME

where PDBCH means PDB code + Chain, so 5 character entries.

To map the PDBCH entries in df1 to df2, and thus get the Pfam families for each of my results in df1, I do this:

df1_to_pfam_list = []

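for value in df1.PDBCH:
    # Index labels of the rows in df2 (the pdb2pfam mapping) that match this chain
    pfam_indexes_list = df2.index[df2['PDBCH'] == value].tolist()
    # .loc, because these are index labels rather than positions
    df3 = df2.loc[pfam_indexes_list, :]
    df1_to_pfam_list.append(df3)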

df1_to_pfam_df = pd.concat(df1_to_pfam_list)

Thus, df1_to_pfam_df looks like df2, but follows the order of df1 and keeps the index values of df2. That is:

Index_df2 PDBCH(df1 order) PDB_START PDB_END PFAM_ACCESSION PFAM_NAME

Now I need to merge this new dataframe (df1_to_pfam_df) with df1 so that I can check whether the residue range RESIDUE_1 to RESIDUE_N in df1 falls inside the Pfam domains (PDB_START to PDB_END in df2). The problem is that df1_to_pfam_df differs in size from df1, because some PDBCH entries map to more than one Pfam family.
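One possible way around the size mismatch, sketched here under assumptions, is to let `DataFrame.merge` do a one-to-many left join on PDBCH, so that every df1 row is repeated once per matching Pfam domain, and then test the residue range row by row. The toy values below are invented for illustration, the `INSIDE_DOMAIN` column name is hypothetical, and the check assumes the residue numbers are plain integers (no insertion codes) and that "inside" means fully contained in the domain.

```python
import pandas as pd

# Toy stand-ins for df1 and df2; every value here is made up for illustration.
df1 = pd.DataFrame({
    "PDBCH": ["1abcA", "2xyzB", "3defC"],
    "RESIDUE_1": [10, 50, 5],
    "RESIDUE_N": [20, 60, 15],
    "SUBSTRUCTURE_SEQUENCE": ["AAAA", "CCCC", "GGGG"],
})
df2 = pd.DataFrame({
    "PDBCH": ["1abcA", "1abcA", "2xyzB"],
    "PDB_START": [1, 100, 55],
    "PDB_END": [80, 200, 120],
    "PFAM_ACCESSION": ["PF00001", "PF00002", "PF00003"],
    "PFAM_NAME": ["fam_a", "fam_b", "fam_c"],
})

# Left join: one row per (df1 entry, Pfam domain) pair, in df1 order.
# Chains with no Pfam mapping are kept, with NaN in the df2 columns.
merged = df1.merge(df2, on="PDBCH", how="left")

# Assumed definition of "inside": the whole residue range lies within the domain.
# Comparisons against NaN evaluate to False, so unmapped chains are flagged False.
merged["INSIDE_DOMAIN"] = (
    (merged["RESIDUE_1"] >= merged["PDB_START"])
    & (merged["RESIDUE_N"] <= merged["PDB_END"])
)

# Keep only the substructures that fall inside a Pfam domain.
inside = merged[merged["INSIDE_DOMAIN"]]
print(inside[["PDBCH", "RESIDUE_1", "RESIDUE_N", "PFAM_ACCESSION", "PFAM_NAME"]])
```

With this layout the result is intentionally longer than df1 whenever a chain maps to several families; if a partial overlap should also count, the two comparisons would need to be swapped for an overlap test instead.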

I'm quite stuck at this point. Any suggestions?

Thank you, Juan

Pfam mapping PDB python • 503 views