Hey,
I am looking at the Kai Ye paper on Pindel: http://bioinformatics.oxfordjournals.org/content/25/21/2865.full.pdf+html
and am not sure what parts of the algorithm are actually doing, specifically the numbered steps in Section 2.3 for detecting large deletions:
(1) Read in the location and the direction of the mapped read from the mapping result obtained in the preprocessing step;
(2) Define 3ʹ end of the mapped read as the anchor point;
(3) Use pattern growth algorithm to search for minimum and maximum unique substrings from the 3ʹ end of the unmapped read within the range of two times of the insert size from the anchor point;
(4) Use pattern growth to search for minimum and maximum unique substrings from the 5ʹ end of the unmapped read within the range of read length + Max_D_Size starting from the already mapped 3ʹ end of the unmapped read obtained in step 3;
(5) Check whether a complete unmapped read can be reconstructed combining the unique substrings from 5ʹ and 3ʹ ends found in steps 3 and 4. If yes, store it in the database U. Note that exact matches and complete reconstruction of the unmapped read are required so that neither gap nor substitution is allowed.
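One possible reading of steps (3) to (5) is sketched below as a toy Python example. It is not Pindel's actual code, and all of the following are my own assumptions. The mapped read is on the reverse strand, so its 3ʹ end (its leftmost base) is the anchor, and the unmapped mate is expected upstream on the forward strand, where it can be matched without reverse-complementing. "From the 3ʹ end" is taken to mean pattern growth that starts with the last few bases of the read and extends one base at a time toward the 5ʹ end, i.e. searching backwards along the reference. Sequences are plain strings with 0-based coordinates, and only exact matches count. The names `pattern_growth`, `find_deletion` and `seed` are hypothetical.

```python
from typing import Optional, Tuple

Unique = Tuple[int, int]  # (substring length, 0-based start in ref)


def pattern_growth(read: str, ref: str, lo: int, hi: int, from_3prime: bool,
                   seed: int = 3) -> Optional[Tuple[Unique, Unique]]:
    """Grow a substring anchored at one end of `read`, tracking its exact
    matches inside ref[lo:hi]. Returns (min_unique, max_unique): the shortest
    and longest lengths with exactly one match in the window, or None."""
    lo, hi = max(lo, 0), min(hi, len(ref))
    k = min(seed, len(read))
    pat = read[len(read) - k:] if from_3prime else read[:k]
    hits = [s for s in range(lo, hi - k + 1) if ref[s:s + k] == pat]
    min_u: Optional[Unique] = None
    max_u: Optional[Unique] = None
    while hits:
        if len(hits) == 1:
            if min_u is None:
                min_u = (k, hits[0])
            max_u = (k, hits[0])
        if k == len(read):
            break
        if from_3prime:
            # next base toward the 5' end: each match grows one base leftward
            base = read[len(read) - k - 1]
            hits = [s - 1 for s in hits if s - 1 >= lo and ref[s - 1] == base]
        else:
            # next base toward the 3' end: each match grows one base rightward
            base = read[k]
            hits = [s for s in hits if s + k < hi and ref[s + k] == base]
        k += 1
    if min_u is None or max_u is None:
        return None
    return min_u, max_u


def find_deletion(read: str, ref: str, anchor: int, insert_size: int,
                  max_d: int, seed: int = 3) -> Optional[Tuple[int, int]]:
    """Toy version of steps (3)-(5), assuming a reverse-strand anchor.
    Returns (0-based deletion start, deletion length) or None."""
    L = len(read)
    # Step 3: 3' end of the unmapped read, within 2 x insert size upstream of the anchor
    r3 = pattern_growth(read, ref, anchor - 2 * insert_size, anchor, True, seed)
    if r3 is None:
        return None
    (min3, _), (max3, p3) = r3
    end3 = p3 + max3  # every unique 3' substring ends here (exclusive)
    # Step 4: 5' end, within read length + Max_D_Size upstream of the mapped 3' part
    r5 = pattern_growth(read, ref, end3 - L - max_d, end3, False, seed)
    if r5 is None:
        return None
    (min5, start5), (max5, _) = r5  # every unique 5' substring starts here
    # Step 5: exact, gap-free reconstruction: 5' part of length a plus
    # 3' part of length b must cover the whole read
    for a in range(min5, max5 + 1):
        b = L - a
        if min3 <= b <= max3:
            del_start, del_end = start5 + a, end3 - b
            if del_end > del_start:
                return del_start, del_end - del_start
    return None
```

With a random reference and `read = ref[100:120] + ref[150:170]`, `find_deletion(read, ref, anchor=300, insert_size=100, max_d=50)` should report a 30-base deletion. The reported start can shift left by a few bases when the sequence around the breakpoint repeats (microhomology), because any split with a + b equal to the read length explains the read equally well; the deletion length does not change.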
First, I am not sure about the geometry of step (3): searching for substrings from the 3ʹ end of the unmapped read within twice the insert size of the anchor point. What does that mean concretely?
How does one search for substrings from the 3ʹ end of a read? Surely the 3ʹ end is the end of the sequence. (Or does it mean searching backwards? I find the English hard to understand.)
It seems as though "insert size" means the average insert size of the read pairs, but it is not clear that this is what was meant.
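For the geometry of steps (2) to (4), here is a minimal sketch of the coordinates as I understand them. It assumes a standard forward/reverse paired-end library, 0-based half-open intervals, and that "insert size" is the library's expected fragment length; the function names are hypothetical. The key point is that "3ʹ end" depends on the strand: a forward read's 3ʹ end is its rightmost base, while a reverse-strand read's 3ʹ end is its leftmost reference base.

```python
from typing import Tuple

Window = Tuple[int, int]  # half-open [lo, hi) in 0-based reference coordinates


def anchor_point(pos: int, read_len: int, forward: bool) -> int:
    """Step 2: 3' end of the mapped read. `pos` is the 0-based leftmost
    aligned base. A reverse-strand read is sequenced right to left, so its
    3' end is its leftmost reference base."""
    return pos + read_len - 1 if forward else pos


def step3_window(anchor: int, forward: bool, insert_size: int) -> Window:
    """Step 3: 2 x insert size on the side of the anchor where the mate is
    expected (assumed: downstream of a forward read, upstream of a reverse one)."""
    span = 2 * insert_size
    if forward:
        return anchor + 1, anchor + 1 + span
    return max(0, anchor - span), anchor


def step4_window(three_prime: int, forward: bool, read_len: int,
                 max_d: int) -> Window:
    """Step 4: read length + Max_D_Size, continuing away from the anchor from
    the reference base where the unmapped read's 3' end matched."""
    span = read_len + max_d
    if forward:
        return three_prime, three_prime + span
    return max(0, three_prime - span + 1), three_prime + 1
```

For example, a 50 bp read mapped on the forward strand at position 100 has its anchor at 149, and with an insert size of 300 the 3ʹ end of its mate would be searched in [150, 750).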
Does anyone have any intuition about this paper or the method it uses?
Cheers!
Hello richard.brown!
It appears that your post has been cross-posted to another site: SeqAnswers.
This is typically not recommended as it runs the risk of annoying people in both communities.
OK, I will take one down. Do you know which community is more active or relevant? Cheers
For this question, probably this one, though this may be one of those questions where it's quicker to just ask the tool's author.