Question

Problem with Trinity's abundance_estimates_to_matrix.pl

6.2 years ago

ando.kelli ▴ 60

Hi all,

I'm after a bit of help. I was trying to create a combined matrix for my .genes.results files with the following script that comes in the Trinity suite (version 2.5.1):

~/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/abundance_estimates_to_matrix.pl \
  --gene_trans_map ~/Pearl/Master_Assembly/P_maxima_master_assembly.fasta.gene_trans_map \
  --est_method RSEM --out_prefix Pearl_Subset ./*genes.results

The output was:

* Outputting combined matrix.

Use of uninitialized value within %column_header_to_filename in hash element at /home/kanders2/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/abundance_estimates_to_matrix.pl line 319.
Error, no TPM value specified for transcript [TRINITY_DN0_c0_g1_i1] of gene [TRINITY_DN0_c0_g1] for sample 1A_T_RSEM at /home/kanders2/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/abundance_estimates_to_matrix.pl line 320.

The following matrix files were created:

/home/kanders2/Pearl/Reads/DE/RSEM_output_SUBSET/Pearl_Subset.gene.counts.matrix (this file was empty except for the header sample names)

/home/kanders2/Pearl/Reads/DE/RSEM_output_SUBSET/Pearl_Subset.isoform.counts.matrix (this file contained data)

Any ideas?

I'm not really sure why it's creating an isoform.results file... In older versions of the script you could just select the .genes files as input, which excludes the .isoforms files. Is that not possible now? I can't tell from the documentation on GitHub.

To confirm, the --gene_trans_map file is the one that's created during the earlier Bowtie2/RSEM step (script below):

~/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/align_and_estimate_abundance.pl \
  --transcripts ~/Pearl/Master_Assembly/P_maxima_master_assembly.fasta --seqType fq \
  --est_method RSEM --aln_method bowtie2 --trinity_mode --thread_count 32 \
  --output_dir 1A_T_RSEM_output --prep_reference  \
  --left ~/Pearl/Reads/Reads_for_exp_DELETE_when_finished/bbduk_clean_1A_T_CAVHHANXX_GATCAG_L002_R1_val_1.fq.gz \
  --right ~/Pearl/Reads/Reads_for_exp_DELETE_when_finished/bbduk_clean_1A_T_CAVHHANXX_GATCAG_L002_R2_val_2.fq.gz

Cheers, Kelli Anderson

Trinity RNA-Seq RSEM Matrix • 4.4k views

score 0 · Answer 1 · 2018-01-29

6.2 years ago

h.mon 35k

Maybe the error is due to --out_prefix Pearl_Subset ./*genes.results; you should not use an asterisk here. Try --out_prefix Pearl_Subset ./genes.results instead.

Hi h.mon,

The * is essential as there are multiple input files with the .genes.results suffix and a different prefix/sample name.

Reply 6.2 years ago by ando.kelli ▴ 60

From the wiki page:

--output_prefix <string>         prefix for output files.  Defaults to --est_method setting.

You are misunderstanding globbing in general, and the meaning of the --out_prefix parameter in Trinity. Globbing is filename expansion: the shell replaces a wildcard pattern with the list of all files whose names match it before the script ever runs.
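To illustrate the point about globbing, here is a minimal sketch using made-up file names in a scratch directory (the sample names and `demo_dir` are assumptions for the demo, not files from this thread):

```shell
# Create a scratch directory with hypothetical sample files (names are made up).
demo_dir=$(mktemp -d)
touch "$demo_dir/1A_T.genes.results" \
      "$demo_dir/2B_T.genes.results" \
      "$demo_dir/1A_T.isoforms.results"

# The shell expands the pattern before the program starts, so printf
# receives one argument per matching file.
glob_matches=$(cd "$demo_dir" && printf '%s\n' ./*genes.results)

# Lists the two .genes.results paths; the .isoforms.results file does not match.
echo "$glob_matches"
```

Any program given `./*genes.results` on the command line sees the expanded file list in the same way, never the asterisk itself.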

The Trinity parameter --out_prefix whatever sets the names of the output files: "whatever" is prepended to the name of each output file the script creates.

Reply 6.2 years ago by h.mon 35k

The --out_prefix flag doesn't exist anymore in the latest release of align_and_estimate_abundance.pl, which is the script that creates the input files for abundance_estimates_to_matrix.pl. Instead, it creates an individual sub-directory for each sample, and the directory name (set with --output_dir) includes the sample name.

I renamed the output files from align_and_estimate_abundance.pl to include the sample names, thinking that I could keep my .genes.results output files in the same folder and run abundance_estimates_to_matrix.pl like I did with the old version (v2.4.0). However, this approach (using the *) doesn't work because of the way the new release (v2.5.1) works.

I just downloaded v2.4.0 and then it worked... It's just way easier for my pipeline...

You're right. I probably missed the boat in terms of understanding how certain parts of the new version work. The Wiki is usually good, but in this instance it didn't help me understand what the changes meant for what I needed to do, so that I can still work with all of my results files in a single directory as opposed to having them spread out across different directories. We have a lot of samples coming in at different times, and need to look at different combinations of samples, so just being able to add a subset of files to a single directory and go from there, using the * to select them all like in v2.4.0, is a simple approach...

Thanks for your help!

Reply 6.2 years ago by ando.kelli ▴ 60

score 0 · Answer 2 · 2018-01-29

In the latest version of Trinity, the program works from the per-sample output directories, which carry the sample names, rather than from renamed output files as in v2.4.0.
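For anyone who still wants to pick subsets of samples with this directory layout, one possible approach is sketched below. It assumes each sample has a directory named `<sample>_RSEM_output` (as in the --output_dir used above) containing a file called `RSEM.isoforms.results`; that file name, the `collect_isoform_results` helper, and the sample list file are assumptions for illustration, so check them against your own output.

```shell
# Hypothetical helper: print the isoform results path for each sample named
# in a text file (one sample name per line), warning about missing samples.
collect_isoform_results() {
    sample_list=$1   # text file of sample names (assumed, e.g. subset_samples.txt)
    base_dir=$2      # directory holding the <sample>_RSEM_output folders
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue   # skip blank lines
        result="$base_dir/${sample}_RSEM_output/RSEM.isoforms.results"
        if [ -f "$result" ]; then
            printf '%s\n' "$result"
        else
            printf 'warning: no isoform results for %s\n' "$sample" >&2
        fi
    done < "$sample_list"
}
```

The printed paths could then stand in for the glob on the abundance_estimates_to_matrix.pl command line, for example via `$(collect_isoform_results subset_samples.txt .)`, provided none of the paths contain spaces.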

The abundance_estimates_to_matrix.pl script uses the isoform estimates to compute the gene-level estimates, based on the information in the gene_trans_map file. This also differs from v2.4.0.
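The idea of rolling isoform estimates up to genes can be sketched roughly as follows. This is a simplified illustration and not the code in abundance_estimates_to_matrix.pl: it assumes a gene_trans_map with gene ID and transcript ID separated by a tab, and an RSEM .isoforms.results file with the transcript ID in column 1 and TPM in column 6. The function name `sum_tpm_by_gene` is made up for this example.

```shell
# Sum isoform-level TPM (column 6 of an RSEM .isoforms.results file) per gene,
# using the gene<TAB>transcript pairs from a Trinity gene_trans_map file.
sum_tpm_by_gene() {
    gene_trans_map=$1
    isoform_results=$2
    awk -F '\t' '
        NR == FNR { gene_of[$2] = $1; next }        # first file: transcript -> gene
        FNR == 1  { next }                          # skip the RSEM header line
        ($1 in gene_of) { tpm[gene_of[$1]] += $6 }  # accumulate TPM per gene
        END { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
    ' "$gene_trans_map" "$isoform_results" | sort
}
```

Given a .genes.results file instead, this sketch would print nothing, because the first column there holds gene IDs rather than transcript IDs, so no line matches the transcript column of the map.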

Hope that helps anyone else who (like me) has issues switching to the newer release :-)