How to make protein sequences of different lengths equal to the maximum length
5.0 years ago
Rana • 0

I want to generate protein descriptors, but some of them require all sequences to have the same length. Is there a way to make all sequences equal in length so that I can compute numerical descriptors? I tried padding the sequences to equal length, but it doesn't work.

Any help will be appreciated.

sequence • 802 views

This question obviously needs more information. What is a "descriptor" in this context? Which file format are you working in? Which format is your input data? Padding with what? What doesn't work if you use padding?

Why do you think it makes biological sense to "change the length of a protein"? Do you know what you are doing?


Descriptors: structural and physicochemical descriptors extracted from protein sequences, used to represent the sequences and to predict structural, functional, expression and interaction profiles of proteins and peptides, e.g. amino acid composition, solvent accessibility, etc. I am using an online tool (iFeature) to generate these descriptors, and the input data are raw sequences as well as FASTA format.

To generate the solvent accessibility descriptor or a position-specific scoring matrix, I need all protein sequences to have the same length, but mine vary from 20 to 1200 amino acids. Basically I need all protein sequence length equal to 1200. So, to make the lengths equal, I padded the sequences with zeros. I did it manually to check whether it works.

I am not sure whether this is the right approach, which is why I am looking for guidance. I only used padding because I can get these descriptors only when the sequence lengths are equal. If my method is wrong, can you suggest a possible alternative? Thank you.
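For reference, the padding step described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `pad_sequences`, `one_hot` and `PAD_CHAR` are hypothetical, the `-` padding symbol is an arbitrary choice rather than anything a particular tool is known to accept, and padded positions are encoded as all-zero rows. Whether a given descriptor tool tolerates padded input, and whether a padded descriptor is meaningful, would still need to be checked.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD_CHAR = "-"  # hypothetical padding symbol; not a real residue


def pad_sequences(seqs, target_len=None, pad_char=PAD_CHAR):
    """Right-pad every sequence to target_len (default: the longest one)."""
    if target_len is None:
        target_len = max(len(s) for s in seqs.values())
    padded = {}
    for name, seq in seqs.items():
        if len(seq) > target_len:
            raise ValueError(f"{name} is longer than target length {target_len}")
        padded[name] = seq + pad_char * (target_len - len(seq))
    return padded


def one_hot(seq):
    """Encode a sequence as a list of 20-element rows.

    Standard residues get a single 1; anything else (including the
    padding symbol) becomes an all-zero row.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in seq:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    toy = {"p1": "MKT", "p2": "MKTAYIAK"}  # made-up toy sequences
    for name, seq in pad_sequences(toy).items():
        print(name, seq, len(one_hot(seq)))
```

Padding at the sequence level with a dedicated symbol, rather than inserting zeros into the sequence text, keeps the padded positions distinguishable from real residues when the numerical encoding is built.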


Basically I need all protein sequence length equal to 1200.

Biology does not work like that. A protein with 1200 AA is huge.

Have you considered lowering that number and then selecting proteins that fit a range (e.g. between 150 and 200 AA) from a pool of curated ones from, say, SwissProt?
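Selecting sequences within a length range could look something like the sketch below. It assumes the sequences are available as plain FASTA text; `read_fasta` and `filter_by_length` are hypothetical helper names, and the default 150 to 200 range simply mirrors the example above.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping record ID to sequence."""
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""  # ID = first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if name is not None:
        records[name] = "".join(chunks)
    return records


def filter_by_length(records, min_len=150, max_len=200):
    """Keep only sequences whose length lies in [min_len, max_len]."""
    return {
        name: seq
        for name, seq in records.items()
        if min_len <= len(seq) <= max_len
    }


if __name__ == "__main__":
    # Made-up toy records; a real run would read a downloaded FASTA file.
    text = ">a toy\nMKT\nAY\n>b\nMK\n>c\nMKTAYIAKQR\n"
    print(filter_by_length(read_fasta(text.splitlines()), 3, 5))
```

A narrow length range also keeps the amount of padding small if the remaining sequences still have to be brought to one common length.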

The average size of a protein increases from Archaea to Bacteria to Eukaryotes (283, 311, 438 residues and 31, 34, 49 kDa, respectively) due to a larger number of protein domains constituting proteins in higher organisms. For instance, yeast proteins are on average 466 amino acids long and 53 kDa in mass.
