stringtie merged gtf doesn't give any gene expression columns
5.9 years ago
Biologist ▴ 290

Hi,

I'm using HISAT2 to align reads to the genome and StringTie for the quantification steps. I then used stringtie --merge to combine the GTF files from all samples. The resulting stringtie_merged.gtf doesn't have the FPKM, TPM, or coverage columns that are present in the per-sample GTF files. Is there a way to get those columns with stringtie --merge?

And why are some transcripts that are present in the sample GTF files missing from the stringtie_merged.gtf file?

Thank you !!

RNA-Seq stringtie hisat2 merge • 3.5k views
5.9 years ago

Please take a look at my previous answer here: transcript count after string tie merge

My answer and the one on the SeqAnswers cross-post independently corroborate each other.

In addition, the reason why some of your transcripts are missing in the merged GTF is likely because they fail one of the filter criteria for the merge process. Please refer to the helpful StringTie manual, to which I link in my answer.

Kevin


For stringtie --merge, this is what I see:

Transcript merge usage mode: 
  stringtie --merge [Options] { gtf_list | strg1.gtf ...}
With this option StringTie will assemble transcripts from multiple
input files generating a unified non-redundant set of isoforms. In this mode
the following options are available:
  -G <guide_gff>   reference annotation to include in the merging (GTF/GFF3)
  -o <out_gtf>     output file name for the merged transcripts GTF
                    (default: stdout)
  -m <min_len>     minimum input transcript length to include in the merge
                    (default: 50)
  -c <min_cov>     minimum input transcript coverage to include in the merge
                    (default: 0)
  -F <min_fpkm>    minimum input transcript FPKM to include in the merge
                    (default: 1.0)
  -T <min_tpm>     minimum input transcript TPM to include in the merge
                    (default: 1.0)
  -f <min_iso>     minimum isoform fraction (default: 0.01)
  -g <gap_len>     gap between transcripts to merge together (default: 250)
  -i               keep merged transcripts with retained introns; by default
                   these are not kept unless there is strong evidence for them
  -l <label>       name prefix for output transcripts (default: MSTRG)

Yes, that is also what I saw when I looked. One or more of these parameters is causing transcripts from your individual samples to be excluded from the merged GTF. I cannot know which one (or more than one) of these is affecting your particular data. You will need to investigate your own data by modifying these parameters and seeing how the merged output changes.

Also, as per my other answer and the answer on SeqAnswers, the merged GTF is not meant to carry expression values. Once you obtain your merged GTF, you re-run StringTie on each sample with the merged GTF as the reference annotation in order to obtain the abundance estimates.
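As a rough illustration of that second pass, here is a minimal Python sketch that only builds and prints the per-sample commands; it does not run StringTie. The flags -e (estimate abundance only for the transcripts given with -G), -B (also write Ballgown table files) and -o are taken from the StringTie manual rather than from this thread, and the sample names, BAM file names, and output paths are hypothetical placeholders.

```python
import shlex


def requantify_command(bam, merged_gtf, out_gtf):
    """Return the argument list for one re-quantification run."""
    # -e: estimate abundance only for transcripts in the -G annotation
    # -B: also write Ballgown table files next to the output GTF (optional)
    return ["stringtie", "-e", "-B", "-G", merged_gtf, "-o", out_gtf, bam]


# Hypothetical sample names; replace with your own.
samples = ["sampleA", "sampleB"]

commands = [
    requantify_command(
        bam=f"{s}.sorted.bam",              # assumed BAM naming scheme
        merged_gtf="stringtie_merged.gtf",  # output of stringtie --merge
        out_gtf=f"ballgown/{s}/{s}.gtf",    # assumed output layout
    )
    for s in samples
]

# Print the commands so they can be inspected before being run.
for cmd in commands:
    print(shlex.join(cmd))
```

The per-sample GTF files written by this second pass should again contain coverage, FPKM, and TPM values, this time for the merged transcript set.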


Thank you very much


Okay - best of luck with it. I would start by looking at the transcripts that were excluded and checking whether they meet any of these exclusion criteria. If a transcript is very lowly expressed, it may just be transcriptional 'noise' with little or no function.
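One way to start that investigation is to list the transcripts in a per-sample GTF whose values fall below the merge defaults quoted in the help text (-c 0, -F 1.0, -T 1.0). The sketch below is only an approximation under stated assumptions: it assumes the transcript rows carry cov, FPKM, and TPM attributes, it uses a strict less-than comparison (how StringTie applies the cut-offs internally is not confirmed here), and it ignores the -m, -f, and -g options. The two GTF lines are made-up example data.

```python
import re

# Defaults copied from the stringtie --merge help text (-c, -F, -T).
DEFAULTS = {"cov": 0.0, "FPKM": 1.0, "TPM": 1.0}

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def low_transcripts(gtf_lines, thresholds=DEFAULTS):
    """Yield (transcript_id, reasons) for transcript rows below any threshold."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        # Strict "<" is an assumption about how the cut-offs are applied.
        reasons = [
            f"{key}={attrs[key]} < {minimum}"
            for key, minimum in thresholds.items()
            if key in attrs and float(attrs[key]) < minimum
        ]
        if reasons:
            yield attrs.get("transcript_id", "?"), reasons


# Made-up example rows in the per-sample StringTie GTF layout.
example = [
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "5.2"; FPKM "3.4"; TPM "6.1";',
    'chr1\tStringTie\ttranscript\t2000\t2600\t1000\t+\t.\t'
    'gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "0.9"; FPKM "0.4"; TPM "0.7";',
]

flagged = list(low_transcripts(example))
for tid, reasons in flagged:
    print(tid, "; ".join(reasons))
```

To use it on real data, pass an open file handle for a sample GTF instead of the example list. A transcript that is not flagged may still be absent from the merged file, since the merge produces a non-redundant set and can fold it into another isoform.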
