Question

Build protein-protein interaction prediction deep learning model

2.8 years ago

Anh ▴ 20

I'm an undergraduate biology student, and my thesis is on designing a deep learning architecture to predict whether two proteins interact, given their primary sequences. I have read several papers that take different approaches to the problem. One, https://academic.oup.com/bioinformatics/article/35/14/i305/5529260, uses the PIPR model, which was the state-of-the-art model for this problem in 2019. A more recent one from 2021, https://www.biorxiv.org/content/10.1101/2021.01.22.427866v1, uses D-SCRIPT, which surpasses PIPR on small datasets and generalizes better across many model organisms.

I have two questions:

1. Can you suggest an approach to building a model for my thesis? Should I follow these papers, and if so, how should my model differ from theirs?
2. Which recent architecture or model does the literature consider most suitable for protein-protein interaction prediction?

I would very much appreciate any help!

interaction machine learning deep protein-protein • 687 views

updated 2.8 years ago by Mensur Dlakic ★ 27k • written 2.8 years ago by Anh ▴ 20

score 3 · Answer 1 · 2021-08-20

Most projects require guidance, resources, and know-how. Forgive my bluntness, but you seem to be short on all of them. If you still want to go through with it, I think you should at the very least use datasets similar to those in the papers, as that will cut down the time needed to get the project off the ground. I don't think the goal should be to make a better model than they did, because that is unrealistic. As long as you can make some kind of model that works and learn in the process, I think that should be more than enough. My general suggestion would be to scale your approach down to fit your resources and your timeline.

Is this an honors thesis? Most undergraduates do not write a thesis, and even when they do, it is considerably less ambitious. This is something a full-time researcher could spend 6-12 months on. Broadly speaking, one needs to read the literature and understand the problem. Next comes working out how the problem has been solved before and how you will solve it. Then you may want to create a gold-standard dataset of known interacting and non-interacting partners. Those sequences will need to be processed, which often involves long searches against large databases. When you have all the data, the model must be trained and validated. Finally, it has to be written up in the required format. Be sure that you are up for this in terms of time and resources, or you may spend long hours on it and not have much to show.
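To make the dataset step a bit more concrete, here is a minimal Python sketch of one way to assemble interacting and non-interacting pairs and hold some out for validation. Everything in it is an assumption for illustration: the function name `build_pair_dataset`, the protein identifiers, and especially the shortcut of treating any random pair that is absent from the positive list as non-interacting. It is not the procedure used in either paper.

```python
import random


def build_pair_dataset(positive_pairs, val_fraction=0.2, seed=0):
    """Return (train, val) lists of (protein_a, protein_b, label) rows.

    Hypothetical helper: label 1 = known interaction, label 0 = sampled negative.
    """
    rng = random.Random(seed)

    # Treat pairs as unordered, so (A, B) and (B, A) are the same pair.
    positives = {frozenset(p) for p in positive_pairs if p[0] != p[1]}
    proteins = sorted({name for pair in positives for name in pair})

    # Assumption: a random pair that is not a known positive is a negative.
    # Cap the target so the sampling loop below always terminates.
    all_pairs = len(proteins) * (len(proteins) - 1) // 2
    target = min(len(positives), all_pairs - len(positives))
    negatives = set()
    while len(negatives) < target:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in positives:
            negatives.add(pair)

    def as_rows(pairs, label):
        # Sort so the output does not depend on set iteration order.
        return [tuple(sorted(p)) + (label,) for p in sorted(pairs, key=sorted)]

    rows = as_rows(positives, 1) + as_rows(negatives, 0)
    rng.shuffle(rows)

    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]


# Toy example with made-up identifiers (not real interaction data).
known = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P1", "P5")]
train, val = build_pair_dataset(known)
print(len(train), "training rows,", len(val), "validation rows")
```

A random split like this is only a starting point. Published benchmarks usually also control for sequence similarity between the training and test proteins, which is one more reason to reuse the datasets from the papers instead of building your own.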

I will get you started by giving links to several recent papers that use different approaches from what you listed.