Calculate the coverage of a protein having a list of its peptides
10 months ago
arronar • 200
Austria

Hello out there.

I was wondering if there is a simple way using R to calculate the coverage of a protein when you have a list of peptides from it and its initial sequence.

For example, let's say that we have this protein sequence taken from UniProt:

MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

and we have a list of some of its peptides, which may or may not overlap one another.

pepts <- c("DRRRRMEALLLSLY", "YPNDRKLL",
           "DYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQA",
           "SEVDVNK", "MCCWVSKFKDAMRRYQGIQ", "TCKIPGK",
           "VLSDLDAKIKAYNLTVEGVEGFVRYSRVTK",
           "DRRRRMEALLLSLYYPNDRKLL", "SEVDVNKMCCWVSKFK")

Can we somehow calculate the coverage?

Thank you.


While this is not an R solution, have you thought of doing a multiple sequence alignment?


I tried Clustal Omega, but I don't know how to get its results into R, and it doesn't seem to return a coverage percentage either.


This is not my field of work, but I found two possible solutions by searching Google. I have not tested either of them, so try them and see whether they fit your case.

For MS data: the isobar R package does the job; check the PDF.

I also found this tool, Protein Coverage Summarizer, but it's not an R package.


Thank you, but neither of them seems able to help me.

9 months ago
EMBL Heidelberg, Germany

Just use regular expressions to match the peptides to the protein sequence and record an X at each matched position. When all of the peptides have been processed, count the Xs.
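Since the question asks for R, here is a minimal sketch of that idea in base R. It is not code from the original answer: the function name `protein_coverage` is made up for illustration, and a logical vector stands in for the Xs. It assumes the peptides are plain amino-acid strings without line breaks, so `fixed = TRUE` is used for literal matching. Note that `gregexpr` reports non-overlapping occurrences of the same peptide only.

```r
# Percent of protein residues covered by at least one peptide.
# `protein` is a single string; `peptides` is a character vector.
protein_coverage <- function(protein, peptides) {
  covered <- logical(nchar(protein))          # one flag per residue, all FALSE
  for (pep in peptides) {
    # Start positions of every literal occurrence, or -1 if there is none
    hits <- gregexpr(pep, protein, fixed = TRUE)[[1]]
    if (hits[1] == -1) next
    for (start in hits) {
      covered[start:(start + nchar(pep) - 1)] <- TRUE
    }
  }
  100 * sum(covered) / length(covered)
}

# The protein and peptides from the question, written as single-line strings
protein <- paste0(
  "MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS",
  "EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD",
  "AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL",
  "EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD",
  "SFRKIYTDLGWKFTPL"
)
pepts <- c("DRRRRMEALLLSLY", "YPNDRKLL",
           "DYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQA",
           "SEVDVNK", "MCCWVSKFKDAMRRYQGIQ", "TCKIPGK",
           "VLSDLDAKIKAYNLTVEGVEGFVRYSRVTK",
           "DRRRRMEALLLSLYYPNDRKLL", "SEVDVNKMCCWVSKFK")

coverage <- protein_coverage(protein, pepts)
print(coverage)
```

For the example in the question, the peptides tile residues 13 to 144 of the 256-residue protein, so the result should be about 51.6%. Because each position is only flagged once, overlapping peptides are not double counted.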


Just what I would do. I don't think there is a simpler solution.


I guess that I have to record both the starting and ending position of each match and then sum them up, taking care because some of the matches may overlap each other.
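For illustration, the start/end bookkeeping described here could look roughly like the sketch below. It is an assumption-laden example, not something from the thread: the helper name `covered_length` is hypothetical, and positions are taken to be 1-based and inclusive. Simply adding up the match lengths would double count overlapping regions, so the intervals are merged first.

```r
# Number of residues covered by a set of (possibly overlapping) matches.
# `starts` and `ends` are 1-based, inclusive match positions of equal length.
covered_length <- function(starts, ends) {
  if (length(starts) == 0) return(0)
  ord <- order(starts)                       # process intervals left to right
  starts <- starts[ord]
  ends <- ends[ord]
  total <- 0
  cur_start <- starts[1]
  cur_end <- ends[1]
  for (i in seq_along(starts)[-1]) {
    if (starts[i] <= cur_end + 1) {
      # Overlapping or directly adjacent: extend the current interval
      cur_end <- max(cur_end, ends[i])
    } else {
      # Gap: close the current interval and start a new one
      total <- total + (cur_end - cur_start + 1)
      cur_start <- starts[i]
      cur_end <- ends[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Three matches at 13-26, 27-34 and 13-34 cover 22 residues, not 14 + 8 + 22
n_covered <- covered_length(c(13, 27, 13), c(26, 34, 34))
print(n_covered)
```

Dividing the result by the protein length and multiplying by 100 gives the coverage percentage.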


No need to sum anything. Here is a Perl way of doing it:

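# Assumes $protein_seq (a string) and @peptides (a list of strings) are already defined
my $cover_seq = $protein_seq; # copy in which we're going to replace matches by X
foreach my $peptide_seq (@peptides) {
    # \Q...\E makes the peptide match literally; only its first occurrence is marked
    if ($protein_seq =~ /\Q$peptide_seq\E/) { # peptide matches the protein
        my $start = $-[0]; # offset of the start of the match
        my $end   = $+[0]; # offset just past the end of the match
        my $len   = $end - $start; # length of the match
        # Replace peptide by Xs in the copy of the protein sequence
        substr($cover_seq, $start, $len) = 'X' x $len;
    }
}
# Count the Xs to get percent coverage (assumes the protein itself contains no X)
my $coverage = ($cover_seq =~ tr/X//) / length($cover_seq) * 100;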

Oh I see. Thank you.
