Convert from SDF to FASTA
7 months ago
tpritsky • 0

Hi, I am trying to use the METLIN small molecule dataset for ML. It provides the small molecules as SDF files, but my downstream model requires sequence input. Is there any way to convert SDF to a sequence file (e.g. a FASTA file)? Thanks for the help!

Dataset: https://figshare.com/articles/dataset/The_METLIN_small_molecule_dataset_for_machine_learning-based_retention_time_prediction/8038913

SDF molecule fasta ML small • 573 views
7 months ago

SDF files contain molecular structures, not sequences.

What you could do is convert the SDF to PDB format with one of the many available converters (Open Babel, for example), run those PDB files through Foldseek, and then pull out the sequences of the top hits.

Edit:

Having looked at the data in the SDF file more closely, I don't think what you're trying to do is possible. The SDF file in your figshare link contains PUBCHEM_COMPOUND_IDs, which you can use to look up each molecule. Here's the link for the first compound: https://pubchem.ncbi.nlm.nih.gov/compound/5139 That's neither a protein nor a gene; it's a small molecule, C3H8N2S.
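
If you want to pull those compound IDs out in bulk, here is a minimal sketch using only the Python standard library. It assumes the usual SDF layout (each record ends with a `$$$$` line, and data items are written as `> <TAG>` followed by the value) and that the ID tag is named `PUBCHEM_COMPOUND_CID`, as in PubChem's own downloads; check your file for the exact tag name. The function names and the toy record are made up for illustration and are not taken from the METLIN file.

```python
def iter_sdf_records(text):
    """Yield each SDF record (mol block plus data items) as a list of lines."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # SDF record terminator
            if record:
                yield record
            record = []
        else:
            record.append(line)
    # Tolerate a final record that lacks the $$$$ terminator.
    if any(l.strip() for l in record):
        yield record


def data_field(record, tag):
    """Return the first value line of the data item `> <tag>`, or None."""
    for i, line in enumerate(record):
        if line.startswith(">") and f"<{tag}>" in line:
            if i + 1 < len(record):
                return record[i + 1].strip()
    return None


def compound_urls(text, tag="PUBCHEM_COMPOUND_CID"):
    """Build a PubChem compound URL for every record that carries the tag.

    The default tag name is an assumption; pass the one your file uses.
    """
    urls = []
    for record in iter_sdf_records(text):
        cid = data_field(record, tag)
        if cid:
            urls.append(f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}")
    return urls


# Toy record with an empty placeholder mol block (NOT real METLIN data).
SAMPLE_SDF = """\
toy record 1
  placeholder

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <PUBCHEM_COMPOUND_CID>
5139

$$$$
"""

if __name__ == "__main__":
    for url in compound_urls(SAMPLE_SDF):
        print(url)
```

For the real file you would read it with `open(path).read()` and pass the text to `compound_urls`. Note that this only recovers identifiers; it does not turn a small molecule into a FASTA-style sequence.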
