I have an RNA-seq assembly and the corresponding reads from a non-model organism. The reads map well (using RSEM with Bowtie, run through the Trinity wrapper) both to the assembly in general and to the transcripts of special interest.
However, the particular subset of transcripts I am most interested in is somewhat fragmented, so I removed those transcripts from the assembly and replaced them with PCR clones of the CDSs from another individual (4-5% nucleotide difference between the individuals). When I do this, the mapping becomes substantially worse: the FPKM drops to perhaps 1/100 or 1/1000 of what it was before.
I have tested and ruled out the following hypotheses:
Mapping stringency: tested with a previous, more lenient version of the software, and also tried reducing the seed length from 25 to 20 (both with 2 mismatches allowed).
3' bias: inspected the alignments in IGV and found some 3' bias, but nowhere near enough to be an important factor.
Paired-end issues: mapped only the left reads, as single-end reads.
Server issues: re-ran it on another server, no change.
Read order: made sure the reads were sorted.
Command issues: re-mapped against the original assembly (with the original contigs and without the clones) and got back the same high expression values as before, so the command itself is not the problem.
Hidden characters: inspected the sequences in vim and found no hidden characters that could explain it.
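Visual inspection can hide some of these problems (for example, vim may treat a file with consistent Windows line endings as `dos` format and not display the `\r` characters). A scripted check of the clone FASTA is one complementary test. The following is a minimal sketch, assuming the clones are stored as plain nucleotide FASTA; the function name `audit_fasta` and the example sequences are hypothetical. It flags carriage returns, characters outside the IUPAC nucleotide alphabet (including spaces and tabs), empty headers, and duplicate sequence names (taken as the first whitespace-delimited word of each header).

```python
from collections import Counter

# IUPAC nucleotide codes in either case; any other character in a sequence
# line (spaces, tabs, non-ASCII bytes, stray letters) is reported.
ALLOWED = set("ACGTUNRYSWKMBDHV" + "acgtunryswkmbdhv")


def audit_fasta(text: str) -> list:
    """Return human-readable problems found in FASTA-formatted text (hypothetical helper)."""
    problems = []
    if "\r" in text:
        problems.append("Windows line endings (\\r) present")
    names = []
    # Split only on "\n" so that other control characters stay visible.
    for lineno, line in enumerate(text.replace("\r", "").split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                problems.append(f"line {lineno}: empty header")
            else:
                names.append(fields[0])  # sequence name used by most tools
        else:
            bad = sorted(set(line) - ALLOWED)
            if bad:
                problems.append(f"line {lineno}: unexpected characters {bad!r}")
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate sequence name {name!r} ({count} times)")
    return problems


example = ">clone1 CDS\nATGGCC\r\nTTAXGA\n>clone1\nATG\n"
for problem in audit_fasta(example):
    print(problem)
```

To run it on a real file, something like `audit_fasta(Path("clones.fasta").read_text(encoding="latin-1"))` (with `from pathlib import Path`) should work, where the filename is a placeholder. Reading as Latin-1 avoids a decode error on stray non-UTF-8 bytes, which are then reported as unexpected characters.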
I initially suspected that the large sequence difference between the individuals was the obvious cause, but even more lenient mapping parameters (2 mismatches allowed in a 20-nucleotide seed) do not seem to help.
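A rough back-of-the-envelope check of the divergence hypothesis uses a binomial model. This is a minimal sketch under strong assumptions: differences between the two individuals are independent substitutions spread evenly at rate `divergence`, a read is kept whenever its seed has at most `max_mm` mismatches (as in Bowtie 1's `-l`/`-n` seed mode), and mismatches outside the seed and indels are ignored. The function name `seed_pass_prob` is hypothetical.

```python
from math import comb


def seed_pass_prob(seed_len: int, max_mm: int, divergence: float) -> float:
    """Probability that a seed of seed_len bases has at most max_mm mismatches,
    assuming each position differs independently with probability divergence."""
    return sum(
        comb(seed_len, k) * divergence**k * (1 - divergence) ** (seed_len - k)
        for k in range(max_mm + 1)
    )


# Compare the two seed lengths tried above at the reported divergence range.
for divergence in (0.04, 0.05):
    for seed_len in (25, 20):
        p = seed_pass_prob(seed_len, 2, divergence)
        print(f"divergence {divergence:.0%}, seed {seed_len} nt, <=2 mismatches: {p:.2f}")
```

Under these assumptions the pass probabilities come out between about 0.87 and 0.96, so divergence alone would cost roughly 4-13% of reads rather than a 100- to 1000-fold drop. Indels between the clones and the reads, which ungapped aligners such as Bowtie 1 cannot align across, are not modelled here.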