Question

Features of Protein Functions


6.6 years ago

bzamith26 ▴ 10

Hello!

I need to extract features of protein functions. The functions are organized in a hierarchy, so one approach I thought of is to represent each node of the hierarchy as a vector containing its path from the root. Something like this: https://imgur.com/a/gmvTv

Does anyone know another way of extracting features of protein functions, ideally something more grounded in biology? I am open to any suggestion. Thank you very much!
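To make the path-from-root idea concrete, here is a minimal Python sketch. The hierarchy, its FunCat-like labels, and the function names are all made up for illustration. It assumes a tree, where each node has a single parent; in a DAG such as GO a term can have several parents, so you would take the union of all ancestors instead of a single path.

```python
# Toy hierarchy (hypothetical labels): child -> parent; the root has parent None.
PARENT = {
    "root": None,
    "01": "root",
    "02": "root",
    "01.01": "01",
    "01.02": "01",
    "01.01.03": "01.01",
}

# Fixed column order so that every vector has the same layout.
COLUMNS = sorted(PARENT)


def path_from_root(node, parent=PARENT):
    """Return the list of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def path_vector(node, parent=PARENT):
    """Binary vector with a 1 for every node on the root-to-node path."""
    on_path = set(path_from_root(node, parent))
    return [1 if n in on_path else 0 for n in sorted(parent)]


if __name__ == "__main__":
    print(COLUMNS)
    print(path_from_root("01.01.03"))
    print(path_vector("01.01.03"))
```

With this encoding, two functions that share a long prefix of their paths get vectors that overlap in many positions, so the hierarchy is reflected in the distance between vectors.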

protein function protein classification features • 1.7k views

ADD COMMENT • link 6.6 years ago by bzamith26 ▴ 10


What kind of functions are you interested in? Are they Gene Ontology biological process annotations? Are you trying to derive feature vectors representing protein functions? What are you trying to achieve with these features?

ADD REPLY • link 6.6 years ago by Jean-Karim Heriche 27k


Hi Jean! Thank you for your reply. I want to use machine learning to classify protein functions while making use of interaction data. So I would need both proteins and protein functions described as feature vectors (which I currently only have for proteins). I want to use the Gene Ontology database as well as FunCat, both of which are hierarchical.

ADD REPLY • link 6.6 years ago by bzamith26 ▴ 10


It's still not entirely clear how you plan on using the data. Do you want to use GO and FunCat as input or for validation? What interaction data do you want to use? Regardless, consider that not all machine learning algorithms require a vector representation. For example, many algorithms (e.g. support vector machines) can work with kernels, and computing a kernel doesn't always require vectors. For examples of kernels derived from a variety of data types (including GO annotations), look at this paper of mine and at this tutorial.
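As a rough illustration of a kernel computed directly on annotations rather than on feature vectors, here is a minimal Python sketch. It is not the kernel from the paper or tutorial mentioned above; it is a simple cosine-normalised set-intersection kernel, and the protein names, term IDs, and function names are placeholders.

```python
from math import sqrt

# Hypothetical annotations: protein -> set of term IDs (placeholders, not real GO IDs).
ANNOTATIONS = {
    "P1": {"term_a", "term_b", "term_c"},
    "P2": {"term_b", "term_c"},
    "P3": {"term_d"},
}


def set_kernel(a, b):
    """Cosine-normalised intersection kernel between two sets of terms."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def gram_matrix(annotations):
    """Return the protein order and the matrix of pairwise kernel values."""
    names = sorted(annotations)
    K = [[set_kernel(annotations[p], annotations[q]) for q in names]
         for p in names]
    return names, K


if __name__ == "__main__":
    names, K = gram_matrix(ANNOTATIONS)
    for name, row in zip(names, K):
        print(name, [round(v, 3) for v in row])
```

The resulting matrix can be handed to any kernel method that accepts a precomputed kernel, for instance scikit-learn's SVC with kernel="precomputed".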

ADD REPLY • link 6.6 years ago by Jean-Karim Heriche 27k


[..................]

ADD REPLY • link 6.6 years ago by bzamith26 ▴ 10


I don't know the predictive bi-clustering tree algorithm, could you share a reference? The difficulty with feature-based representations is finding features that are both relevant to the problem at hand and informative. In the case of GO, you could simply create a binary vector over all the functions you care about, with a 1 for each function a protein is annotated with. As for interaction data, you could use the rows of the graph adjacency matrix as vectors.
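Both suggestions can be sketched in a few lines of Python. The proteins, terms, and interactions below are invented for illustration, and the interaction graph is assumed to be undirected and unweighted.

```python
# Hypothetical data for illustration only.
TERMS = ["term_a", "term_b", "term_c"]       # functions of interest, fixed order
ANNOTATIONS = {
    "P1": {"term_a", "term_c"},
    "P2": {"term_b"},
    "P3": set(),
}
INTERACTIONS = [("P1", "P2"), ("P2", "P3")]  # undirected edges


def function_vector(protein):
    """Binary vector: 1 if the protein is annotated with the term, else 0."""
    return [1 if t in ANNOTATIONS[protein] else 0 for t in TERMS]


def adjacency_rows(edges, proteins):
    """Map each protein to its row of the (symmetric) adjacency matrix."""
    index = {p: i for i, p in enumerate(proteins)}
    rows = {p: [0] * len(proteins) for p in proteins}
    for u, v in edges:
        rows[u][index[v]] = 1
        rows[v][index[u]] = 1
    return rows


if __name__ == "__main__":
    proteins = sorted(ANNOTATIONS)
    rows = adjacency_rows(INTERACTIONS, proteins)
    for p in proteins:
        print(p, function_vector(p), rows[p])
```

For a large interaction network the adjacency rows become long and mostly zero, so a sparse representation is usually worth considering.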

ADD REPLY • link 6.6 years ago by Jean-Karim Heriche 27k


Here and here you have good references on PCTs (Predictive Clustering Trees). Bi-Predictive Clustering Trees are a new idea; I know of a few papers, but they are still under revision. Once they are published, I will update this!

"As for interaction data, you could use the rows of the graph adjacency matrix as vectors." = Great suggestion! I'll definitely consider that. Thanks!