Question

B Cell Epitope Prediction: Maximizing Antigenicity and Minimizing Sequence Length

6.7 years ago

jondriscoll1988 ▴ 20

Hi Everyone,

I signed up to this forum to ask this question because I did some searching and did not find an answer; hopefully this is the appropriate forum for such a question. Please excuse my lack of expertise; I am not a bioinformatician by training.

Background: I have 12 amino acid sequences (all variants of the same protein), each 20-40 residues long. I need to find a configuration of these sequences that has the highest likelihood of inducing B-cell immunity, i.e. the highest possible antigenicity score. I will then clone that sequence into a viral vector, which adds the practical constraint that the sequence cannot be too long. There are some rules I must follow that reduce the number of possible combinations, but not by much. My questions are as follows:

1) In your collective expertise, which bioinformatics tools are the best for B-cell epitope prediction? I have found many open-source tools that generate antigenicity scores (IEDB Immunogenicity Prediction, SVMTriP, SEPIa). Is there one that is superior? Which parameters are most important for B-cell epitope prediction?

2) What I really want is a script that tries different configurations of the AA sequences until it arrives at the maximum antigenicity score for the minimum sequence length. Does something like this already exist, or do I have to make it myself? I wrote a function in Python that generates all possible combinations. I figure I could write something else that takes each of those combinations and passes it through whichever prediction tool is best. This might be harder to achieve in practice; I know, for example, that SEPIa requires a lot of manual work (going to different sites, copying and pasting results, etc.), and I am no expert programmer, so I am not sure how to code for that. Plus, I suspect that the solution to this problem can use any number of the amino acids from any number of the 12 sequences, which means there are millions of potential combinations to score. My sense is that this should probably be written to run in parallel on a cluster.
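Editor's note: the search described in question 2 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions. It assumes whole fragments are concatenated in some order (not arbitrary sub-stretches of residues), that each fragment is used at most once, and that the only rule is a length cap. The names `placeholder_score`, `candidate_constructs`, and `rank_constructs` are hypothetical, and the scorer is a crude stand-in (fraction of polar or charged residues), not a real antigenicity predictor; it would need to be replaced by a call to whichever tool is chosen.

```python
from itertools import permutations
from typing import Callable, Iterator, List, Tuple

# Hypothetical stand-in scorer: fraction of polar/charged residues.
# This is NOT a real antigenicity predictor; replace it with a call
# to the prediction tool you settle on.
POLAR = set("DEKRNQSTH")


def placeholder_score(seq: str) -> float:
    return sum(res in POLAR for res in seq) / len(seq)


def candidate_constructs(
    fragments: List[str], max_len: int, max_parts: int
) -> Iterator[str]:
    """Yield every ordered concatenation of 1..max_parts distinct
    fragments whose total length does not exceed max_len."""
    for k in range(1, max_parts + 1):
        for ordering in permutations(fragments, k):
            construct = "".join(ordering)
            if len(construct) <= max_len:
                yield construct


def rank_constructs(
    fragments: List[str],
    max_len: int,
    max_parts: int,
    score: Callable[[str], float] = placeholder_score,
) -> List[Tuple[float, int, str]]:
    """Return (score, length, sequence) tuples, best score first;
    ties are broken in favour of the shorter construct."""
    scored = [
        (score(c), len(c), c)
        for c in candidate_constructs(fragments, max_len, max_parts)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored


if __name__ == "__main__":
    # Toy fragments for illustration only, not real epitope sequences.
    toy = ["DKE", "AAAA", "ST"]
    for s, n, seq in rank_constructs(toy, max_len=5, max_parts=2):
        print(f"{s:.2f}\t{n}\t{seq}")
```

The `max_parts` limit matters: the number of orderings grows factorially with the number of fragments joined, so an exhaustive search over all 12 sequences is only practical if the length cap or the other rules prune most candidates early.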

Sorry for being verbose, I just wanted to add as much detail as possible. I sincerely appreciate any feedback or thoughts. Have a great day! Jon

B Cell antigenicity Python blast • 1.6k views

score 1 · Answer 1 · 2017-08-01

Immunogenicity prediction is still largely uncharted territory. Predicted antigenicity (or predicted peptide recognition by the MHC) does not tell you whether your lymphocytes have the receptor repertoire (TCR or BCR) to recognize the peptide. This means that once you move on to in vitro and in vivo validation, you will find that only a small percentage of what you predicted, if any, actually stimulates a cell-mediated response.

That said, the IEDB/netMHC suite is considered the standard approach. To automate it, you can download the IEDB tools and run them locally, calling them from your Python script. As far as I know, there is no tool that does this for you, so you'll have to do it yourself.
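Editor's note: one way to call a locally installed command-line predictor from Python is through `subprocess`. The sketch below is generic and makes assumptions: it assumes the tool accepts a FASTA file path as its last argument and prints its results to standard output. The function name `run_local_predictor` is hypothetical, and the actual command and flags for the IEDB standalone tools are deliberately not shown; they should be taken from the documentation shipped with the download.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def run_local_predictor(
    command: List[str], sequence: str, timeout: float = 60.0
) -> str:
    """Write `sequence` to a temporary FASTA file, run
    `command + [fasta_path]`, and return the tool's standard output.

    Assumption: the tool takes the FASTA path as its final argument
    and writes results to stdout. Adjust for the real tool.
    """
    with tempfile.TemporaryDirectory() as tmp:
        fasta = Path(tmp) / "query.fasta"
        fasta.write_text(f">query\n{sequence}\n")
        result = subprocess.run(
            command + [str(fasta)],
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode bytes to str
            check=True,           # raise CalledProcessError on non-zero exit
            timeout=timeout,
        )
    return result.stdout
```

The returned text still has to be parsed into a numeric score, and that parsing depends entirely on the output format of the tool in question. Because each call is independent, the calls can be spread over several processes or cluster jobs if the number of candidates is large.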

score 0 · Answer 2 · 2017-08-01

Thank you TriS,

Your points about how much is unknown in this domain are duly noted; let's see if I can catch lightning in a bottle with one or more of my matches.

I must check out netMHC, as well as the local IEDB application. Piping my script into one of these locally seems like the best way to not crash any servers (other than my own), so thank you again for your suggestions.

Have a great day!