Hi all,
I'm after a bit of help. I was trying to create a combined matrix for my .genes.results files with the following script that comes in the Trinity suite (version 2.5.1):
~/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/abundance_estimates_to_matrix.pl \
--gene_trans_map ~/Pearl/Master_Assembly/P_maxima_master_assembly.fasta.gene_trans_map \
--est_method RSEM --out_prefix Pearl_Subset ./*genes.results
The output was:
* Outputting combined matrix.
Use of uninitialized value within %column_header_to_filename in hash element at /home/kanders2/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/abundance_estimates_to_matrix.pl line 319.
Error, no TPM value specified for transcript [TRINITY_DN0_c0_g1_i1] of gene [TRINITY_DN0_c0_g1] for sample 1A_T_RSEM at /home/kanders2/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/abundance_estimates_to_matrix.pl line 320.
The following matrix files were created:
/home/kanders2/Pearl/Reads/DE/RSEM_output_SUBSET/Pearl_Subset.gene.counts.matrix (this file was empty except for the header sample names)
/home/kanders2/Pearl/Reads/DE/RSEM_output_SUBSET/Pearl_Subset.isoform.counts.matrix (this file contained data)
Any ideas?
I'm not really sure why it's creating an isoform.results file... In the older versions of the script you could just select the .genes files as input, which excludes the .isoforms files. Is that not possible now? I can't tell from the documentation on GitHub.
To confirm, the --gene_trans_map file is the one that's created during the Bowtie2/RSEM step earlier (script below):
~/bin/Kelli_Tools/trinityrnaseq-Trinity-v2.5.1/util/align_and_estimate_abundance.pl \
--transcripts ~/Pearl/Master_Assembly/P_maxima_master_assembly.fasta --seqType fq \
--est_method RSEM --aln_method bowtie2 --trinity_mode --thread_count 32 \
--output_dir 1A_T_RSEM_output --prep_reference \
--left ~/Pearl/Reads/Reads_for_exp_DELETE_when_finished/bbduk_clean_1A_T_CAVHHANXX_GATCAG_L002_R1_val_1.fq.gz \
--right ~/Pearl/Reads/Reads_for_exp_DELETE_when_finished/bbduk_clean_1A_T_CAVHHANXX_GATCAG_L002_R2_val_2.fq.gz
Cheers, Kelli Anderson
Hi h.mon,
The * is essential as there are multiple input files with the .genes.results suffix and a different prefix/sample name.
From the wiki page:
You are misunderstanding globbing in general, and the meaning of --out_prefix parameter in Trinity. Globbing expands wild-cards to carry out filename expansion - that is, it matches several files following a pattern. The --out_prefix whatever Trinity parameter will create the names of output files, prepending "whatever" to the several output files created by the script.
The --out_prefix flag doesn't exist anymore for the latest release when using align_and_estimate_abundance.pl which is the one that creates the input files for abundance_estimates_to_matrix.pl. Instead, it creates individual sub-directories for each sample that include the sample name in the directory name based on --output_dir.
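To make the filename expansion described above concrete, here is a minimal sketch. It assumes a scratch directory and made-up sample names (1A_T, 2B_T); nothing in it calls Trinity. It only shows that the shell turns ./*genes.results into one argument per matching file before any program runs.

```shell
# Hypothetical sample names, created in a scratch directory purely for illustration.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 1A_T.genes.results 2B_T.genes.results 1A_T.isoforms.results

# The shell expands the pattern before the program starts, so the program
# receives one argument per matching file. Here the matches are stored as
# the positional parameters so they can be counted and printed.
set -- ./*genes.results
matched_count=$#
first_match=$1

# Print one matched file per line; the .isoforms.results file is not matched.
printf '%s\n' "$@"
```

So a script given ./*genes.results never sees the * itself, only the list of matching file names, and --out_prefix plays no part in choosing them.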
I renamed the output files from align_and_estimate_abundance.pl to include the sample names, thinking that I could keep my .genes.results output files in the same folder and run abundance_estimates_to_matrix.pl as I did with the old version (v2.4.0). However, this approach (using the *) doesn't work because of the way the new release (v2.5.1) works.
I just downloaded v2.4.0 and then it worked... It's just way easier for my pipeline...
You're right. I probably missed the boat in terms of understanding how certain parts of the new version work. The Wiki is usually good, but in this instance it didn't help me understand what the changes meant in terms of what I needed to do so that I can still work with all of my results files in a single directory, as opposed to having them spread out across different directories. We have a lot of samples coming in at different times, and need to look at different combinations of samples, so just being able to add a subset of files to a single directory and go from there, using the * to select them all like in v2.4.0, is a simple approach...
Thanks for your help!