Question

stringtie merged gtf doesn't give any gene expression columns

5.9 years ago

Biologist ▴ 290

Hi,

I'm using HISAT2 to align reads to the genome and StringTie for quantification. I then ran stringtie --merge to combine the GTFs from all samples. The resulting stringtie_merged.gtf doesn't have any of the FPKM, TPM or coverage columns that are present in the per-sample GTF files. Is there a way to get those columns with stringtie --merge?

Also, why are some transcripts that are present in the sample GTF files missing from stringtie_merged.gtf?

Thank you !!

RNA-Seq stringtie hisat2 merge • 3.5k views

ADD COMMENT • link updated 5.9 years ago by Kevin Blighe 87k • written 5.9 years ago by Biologist ▴ 290

score 0 · Answer 1 · 2018-05-22

5.9 years ago

Kevin Blighe 87k

Please take a look at my previous answer here: transcript count after string tie merge

My answer there and the one on the SeqAnswers cross-post corroborate each other.

In addition, the reason why some of your transcripts are missing in the merged GTF is likely because they fail one of the filter criteria for the merge process. Please refer to the helpful StringTie manual, to which I link in my answer.

Kevin

ADD COMMENT • link 5.9 years ago by Kevin Blighe 87k

For Stringtie --merge this is what I see.

Transcript merge usage mode: 
  stringtie --merge [Options] { gtf_list | strg1.gtf ...}
With this option StringTie will assemble transcripts from multiple
input files generating a unified non-redundant set of isoforms. In this mode
the following options are available:
  -G <guide_gff>   reference annotation to include in the merging (GTF/GFF3)
  -o <out_gtf>     output file name for the merged transcripts GTF
                    (default: stdout)
  -m <min_len>     minimum input transcript length to include in the merge
                    (default: 50)
  -c <min_cov>     minimum input transcript coverage to include in the merge
                    (default: 0)
  -F <min_fpkm>    minimum input transcript FPKM to include in the merge
                    (default: 1.0)
  -T <min_tpm>     minimum input transcript TPM to include in the merge
                    (default: 1.0)
  -f <min_iso>     minimum isoform fraction (default: 0.01)
  -g <gap_len>     gap between transcripts to merge together (default: 250)
  -i               keep merged transcripts with retained introns; by default
                   these are not kept unless there is strong evidence for them
  -l <label>       name prefix for output transcripts (default: MSTRG)

ADD REPLY • link 5.9 years ago by Biologist ▴ 290

Yes, that is also what I saw when I looked. One or more of these parameters is causing transcripts from your individual samples to be excluded from the merged GTF. I cannot know which one (or more than one) of them is affecting your particular data, so you should become your own investigator: explore your data and try modifying these parameters.

Also, as per my other answer and the answer on SeqAnswers, once you have your merged GTF, you re-run StringTie on each sample with the merged GTF as the reference in order to obtain the abundance estimates (coverage, FPKM and TPM).
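A minimal sketch of that re-quantification step, written as a small Python helper that only builds the command line for one sample. The flags used (-e to estimate abundances for the given reference transcripts only, -B to also write Ballgown tables, -G for the reference GTF, -o for the output GTF, -p for threads) are standard StringTie options; the file names and the helper name build_requant_command are hypothetical and should be adapted to your own data.

```python
import shlex


def build_requant_command(bam, merged_gtf, out_gtf, threads=1):
    """Build a StringTie command that re-estimates abundances for one sample.

    The merged GTF is passed with -G and -e restricts StringTie to the
    transcripts in that file, so every sample is quantified against the
    same transcript set.
    """
    return [
        "stringtie",
        "-e",                 # estimate abundances of the -G transcripts only
        "-B",                 # also write Ballgown input tables
        "-p", str(threads),   # number of threads
        "-G", merged_gtf,     # merged GTF from stringtie --merge
        "-o", out_gtf,        # per-sample output GTF with cov/FPKM/TPM
        bam,                  # sorted alignments from HISAT2
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command.
    cmd = build_requant_command(
        "sample1.bam", "stringtie_merged.gtf", "sample1_requant.gtf", threads=4
    )
    print(shlex.join(cmd))
```

Running the printed command once per sample should give per-sample GTFs that carry cov, FPKM and TPM values for the merged transcripts; the merged GTF itself is not expected to contain them.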

ADD REPLY • link 5.9 years ago by Kevin Blighe 87k

Thank you very much

ADD REPLY • link 5.9 years ago by Biologist ▴ 290

Okay - best of luck with it. I would start by looking at the transcripts that were excluded and checking which of these exclusion criteria they meet. If a transcript is very lowly expressed, it may just be transcriptional 'noise' with little or no function.
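One way to start that check is sketched below, under stated assumptions: it reads a per-sample StringTie GTF and reports which transcripts fall below the length, coverage, FPKM and TPM thresholds (-m, -c, -F, -T) quoted in the usage text earlier in the thread. The function name failed_filters is hypothetical, the strict "less than" comparison is an assumption about how StringTie applies the cut-offs, and the sketch does not model the isoform-fraction (-f), gap (-g) or retained-intron logic, nor the collapsing of redundant transcripts during the merge.

```python
import re

# Defaults copied from the `stringtie --merge` usage text quoted in this thread.
DEFAULTS = {"min_len": 50, "min_cov": 0.0, "min_fpkm": 1.0, "min_tpm": 1.0}

# Matches GTF attribute pairs such as: transcript_id "STRG.1.1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def failed_filters(gtf_lines, **overrides):
    """Return {transcript_id: [flags]} for transcripts below any threshold.

    gtf_lines is any iterable of GTF lines (for example an open file).
    Keyword overrides (min_len, min_cov, min_fpkm, min_tpm) replace defaults.
    """
    th = {**DEFAULTS, **overrides}
    stats = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        rec = stats.setdefault(
            tid, {"len": 0, "cov": None, "FPKM": None, "TPM": None}
        )
        if cols[2] == "transcript":
            # Expression values are taken from the transcript line only.
            for key in ("cov", "FPKM", "TPM"):
                if key in attrs:
                    rec[key] = float(attrs[key])
        elif cols[2] == "exon":
            # Transcript length is the sum of its exon lengths.
            rec["len"] += int(cols[4]) - int(cols[3]) + 1

    checks = [
        ("-m", "len", "min_len"),
        ("-c", "cov", "min_cov"),
        ("-F", "FPKM", "min_fpkm"),
        ("-T", "TPM", "min_tpm"),
    ]
    report = {}
    for tid, rec in stats.items():
        reasons = [
            flag
            for flag, field, name in checks
            if rec[field] is not None and rec[field] < th[name]
        ]
        if reasons:
            report[tid] = reasons
    return report
```

Called on an open sample GTF, it returns a dictionary mapping each flagged transcript ID to the options whose thresholds it misses. Passing overrides such as min_fpkm=0.1 shows how many transcripts a relaxed setting might keep.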

ADD REPLY • link 5.9 years ago by Kevin Blighe 87k