Seeking Input On Assigning Read Counts To Contigs From De Novo Transcriptome Assembly
2
2
Entering edit mode
10.6 years ago
lzwright ▴ 150

I am looking for the best approach to assigning read counts to de novo assembly contigs in order to analyze expression levels via RPKM. In doing this work I have thus far followed the Simple Fool's Guide gene expression analysis tutorial but this entails an approach where only reads that map uniquely are counted. My understanding is that some programs will assign reads based on an educated guess as to where a read may be assigned if it maps to multiple contigs -- and that these guesses may only be marginally better than a random assignment.

So my question is, what tools do people in this forum favor for assigning read counts to contigs for a de novo transcriptome assembly? Has unique read mapping been superseded by newer approaches? Also, my best mapping results have been generated with BWA-mem (approximately 83% paired end read mapping) so I am inclined to use a tool like DEseq, but at this point do not fully understand how it assigns reads.

Thanks in advance for any input.

mapping • 4.3k views
ADD COMMENT
2
Entering edit mode
10.6 years ago

If you eventually want to use DESeq, you'll want to use HTSeq-count, which only uses uniquely aligned reads/read pairs. Since you're using bwa-mem, keep in mind that you may also be producing chimeric/fusion alignments, which will produce a warning (and possibly aberrant behavior) with HTSeq-count.

Alternatives would be to use cufflinks/cuffdiff or rsem/EBseq. With the cufflinks/cuffdiff, you just use that package, which is geared toward handling ambiguously aligned reads. Rsem directly handles multimappers as well, though I have no experience with EBseq.

ADD COMMENT
0
Entering edit mode

Hi thank you for the heads up on HTseq-count but I don't think I can use it as it requires a GFF file (identifying exon position), and my assembly is strictly de novo (no reference transcriptome) so I am only talking about mapping to my contigs and quanitfying read counts from there. My understanding is that cufflinks/cuffdiff also requires ref transcriptome (at least I think so). Finally RSEM only seems to handle bowtie mapped reads.

ADD REPLY
0
Entering edit mode

Creating the GFF/GTF file for htseq-count would be pretty trivial since each contig is a feature. Cufflinks doesn't require an annotation. Rsem does use bowtie by default, but you can also feed it a SAM/BAM file from another aligner.

ADD REPLY
1
Entering edit mode

I just wanted to note that while what you say here about RSEM is correct -- namely can give it SAM/BAM files from another aligner, however RSEM explcitly states that it does not accept gapped alignments: "please note that RSEM does * not * support gapped alignments. So make sure that your aligner does not produce alignments with intersions/deletions." So that kills using BWA with RSEM.

ADD REPLY
0
Entering edit mode

Ah, I hadn't noticed that, thanks for the update!

ADD REPLY
0
Entering edit mode

hmmm tried to post an image below but perhaps I am not clever enough. cufflinks has been great. Generated GTF file that has FPKMs and coverage, loaded this along with assembly and BAM file to IGV... it's excellent.

ADD REPLY
0
Entering edit mode

Cool, glad that worked for you.

ADD REPLY
0
Entering edit mode

Ah thanks, will proceed most likely with cufflinks/cuffdiff. would prefer program that handles ambiguous reads as I have already quantified reads that map uniquely using SFG approach (referenced above)

ADD REPLY
0
Entering edit mode
10.6 years ago
lzwright ▴ 150

Cufflinks has really done the trick enter image description here

ADD COMMENT

Login before adding your answer.

Traffic: 1434 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6