Question

Please advise me on differential expression analysis of STAR/StringTie output

0

7.2 years ago

seta ★ 1.9k

Hi all,

I am roughly following the “HISAT, StringTie and Ballgown” pipeline for RNA-seq analysis, but I used STAR (instead of HISAT) to map reads to the genome, followed by StringTie for genome-guided assembly. As you know, Ballgown takes FPKM values (here, from StringTie) for differential expression analysis, but DESeq and edgeR need raw counts. As far as I know, the popular programs for generating raw counts are HTSeq and RSEM; HTSeq is designed to work at the gene level (not the transcript level), and RSEM accepts alignments to the transcriptome, not the genome. Could you please tell me how to generate raw counts from the BAM file produced by STAR, for transcript-level analysis with edgeR?

Thanks

differential expression STAR stringtie count edgeR • 5.1k views

ADD COMMENT • link updated 6.1 years ago by Biostar 20 • written 7.2 years ago by seta ★ 1.9k

0

StringTie ships with a script that addresses this issue, prepDE.py. It is easy to overlook, so here is a direct link to the instructions: http://www.ccb.jhu.edu/software/stringtie/index.shtml?t=manual#deseq

ADD REPLY • link 7.1 years ago by dunhamcg ▴ 20

score 2 · Answer 1 · 2017-03-14

2

7.1 years ago

dunhamcg ▴ 20

StringTie ships with a script that addresses this issue, prepDE.py. It is easy to overlook, so here is a direct link to the instructions: http://www.ccb.jhu.edu/software/stringtie/index.shtml?t=manual#deseq
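For intuition about what such a script has to do, here is a minimal sketch of the coverage-to-count conversion. It assumes the count is estimated as per-base coverage times transcript length divided by read length, rounded up, which is how I understand the linked instructions; the function name and the example numbers are hypothetical, and the default read length of 75 is an assumption you should replace with your own.

```python
import math


def coverage_to_count(coverage: float, transcript_len: int, read_len: int = 75) -> int:
    """Estimate a read count from mean per-base coverage (illustrative only).

    coverage       -- mean per-base coverage reported for the transcript
    transcript_len -- summed exon length of the transcript, in bases
    read_len       -- average read length (assumed default; set to your data)
    """
    # Total covered bases divided by bases per read gives the number of reads.
    return math.ceil(coverage * transcript_len / read_len)


# Hypothetical transcript: 1500 bases long, 10x mean coverage, 75-base reads.
print(coverage_to_count(10.0, 1500, 75))
```

Because the count is back-calculated from coverage, it is an estimate rather than a direct tally of aligned reads, so the read length you supply matters.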

ADD COMMENT • link 7.1 years ago by dunhamcg ▴ 20

0

Is 'prepDE.py' reliable? I haven't yet come across any paper that uses it.

ADD REPLY • link 6.0 years ago by Arindam Ghosh ▴ 510

score 0 · Answer 2 · 2017-02-08

0

7.2 years ago

Sej Modha 5.3k

You can use featureCounts from the Subread package to calculate raw counts from STAR alignments.

ADD COMMENT • link 7.2 years ago by Sej Modha 5.3k

0

Thanks, just one thing: does featureCounts give counts for both genes and transcripts?

ADD REPLY • link 7.2 years ago by seta ★ 1.9k

0

You can define the feature type of interest using the -t parameter.

 -t <string>         Specify feature type in GTF annotation. `exon' by
                      default. Features used for read counting will be
                      extracted from annotation using the provided value.

For more info: http://bioinf.wehi.edu.au/featureCounts/

ADD REPLY • link 7.2 years ago by Sej Modha 5.3k

0

Hi Sej

Thank you. Just to make sure: differential expression analysis at the transcript level needs read counts per transcript, yes? According to the manual, featureCounts by default gives counts per gene (-t exon -g gene_id), so to count per transcript I would just use -t transcript -g transcript_id. Is that right?

ADD REPLY • link 7.2 years ago by seta ★ 1.9k
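To make the two settings discussed in this thread concrete, here is a small sketch that assembles the featureCounts command line for each case. The helper name, the annotation and output file names, and the thread count are hypothetical; the flags used (-T, -t, -g, -a, -o) are standard featureCounts options, and the BAM name is simply STAR's usual sorted output.

```python
import shlex


def featurecounts_cmd(bam, gtf, out, feature_type="exon", attribute="gene_id", threads=4):
    """Build a featureCounts argument list (hypothetical helper, nothing is run)."""
    return [
        "featureCounts",
        "-T", str(threads),   # number of threads
        "-t", feature_type,   # GTF feature type to count over
        "-g", attribute,      # GTF attribute used to group features
        "-a", gtf,            # annotation file
        "-o", out,            # output count table
        bam,
    ]


# Gene-level counts: the defaults quoted above (-t exon -g gene_id).
gene_cmd = featurecounts_cmd("Aligned.sortedByCoord.out.bam", "annotation.gtf",
                             "gene_counts.txt")

# Transcript-level counts as proposed in the thread; this assumes the GTF
# actually contains rows whose feature type is "transcript".
tx_cmd = featurecounts_cmd("Aligned.sortedByCoord.out.bam", "annotation.gtf",
                           "transcript_counts.txt",
                           feature_type="transcript", attribute="transcript_id")

print(shlex.join(gene_cmd))
print(shlex.join(tx_cmd))
```

One caveat worth checking on your own data: isoforms of the same gene overlap, so by default many reads will be ambiguous at the transcript level and left unassigned. featureCounts has a -O option to count reads that overlap more than one feature, but whether that suits a transcript-level edgeR analysis is a judgment call.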