I am looking for the best approach to assigning read counts to de novo assembly contigs in order to analyze expression levels via RPKM. In doing this work I have thus far followed the Simple Fool's Guide gene expression analysis tutorial but this entails an approach where only reads that map uniquely are counted. My understanding is that some programs will assign reads based on an educated guess as to where a read may be assigned if it maps to multiple contigs -- and that these guesses may only be marginally better than a random assignment.
So my question is, what tools do people in this forum favor for assigning read counts to contigs for a de novo transcriptome assembly? Has unique read mapping been superseded by newer approaches? Also, my best mapping results have been generated with BWA-mem (approximately 83% paired end read mapping) so I am inclined to use a tool like DEseq, but at this point do not fully understand how it assigns reads.
Thanks in advance for any input.
Hi thank you for the heads up on HTseq-count but I don't think I can use it as it requires a GFF file (identifying exon position), and my assembly is strictly de novo (no reference transcriptome) so I am only talking about mapping to my contigs and quanitfying read counts from there. My understanding is that cufflinks/cuffdiff also requires ref transcriptome (at least I think so). Finally RSEM only seems to handle bowtie mapped reads.
Creating the GFF/GTF file for htseq-count would be pretty trivial since each contig is a feature. Cufflinks doesn't require an annotation. Rsem does use bowtie by default, but you can also feed it a SAM/BAM file from another aligner.
I just wanted to note that while what you say here about RSEM is correct -- namely can give it SAM/BAM files from another aligner, however RSEM explcitly states that it does not accept gapped alignments: "please note that RSEM does * not * support gapped alignments. So make sure that your aligner does not produce alignments with intersions/deletions." So that kills using BWA with RSEM.
Ah, I hadn't noticed that, thanks for the update!
hmmm tried to post an image below but perhaps I am not clever enough. cufflinks has been great. Generated GTF file that has FPKMs and coverage, loaded this along with assembly and BAM file to IGV... it's excellent.
Cool, glad that worked for you.
Ah thanks, will proceed most likely with cufflinks/cuffdiff. would prefer program that handles ambiguous reads as I have already quantified reads that map uniquely using SFG approach (referenced above)