I have looked through similar posts with the same warning:
WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.
The indexes were built using the same -G file, so the naming conventions should be exactly the same. An ERCC control has been included in the dataset, but the same warning occurs when the control sequences are not included.
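One way to test that assumption directly is to compare the sequence names in the alignment header with the first column of the GTF. The following is only a minimal sketch: the function name shared_seqnames is made up for illustration, and it assumes sample.sam still carries its @SQ header lines and that the GTF is tab-delimited.

```shell
# Hypothetical helper: list sequence names found in BOTH the SAM header
# and column 1 of the GTF. An empty result means no names overlap.
# Usage: shared_seqnames sample.sam genome/genome_ERCC92.gtf
shared_seqnames() {
    sam="$1"
    gtf="$2"
    tmpdir=$(mktemp -d) || return 1

    # @SQ header lines carry each reference name in an SN: field.
    awk -F'\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$sam" \
        | LC_ALL=C sort -u > "$tmpdir/sam_names"

    # Column 1 of every non-comment, non-empty GTF line is the sequence name.
    awk -F'\t' '!/^#/ && length($0) > 0 { print $1 }' "$gtf" \
        | LC_ALL=C sort -u > "$tmpdir/gtf_names"

    # comm -12 keeps only the lines common to both sorted lists.
    LC_ALL=C comm -12 "$tmpdir/sam_names" "$tmpdir/gtf_names"

    rm -rf "$tmpdir"
}
```

If this prints nothing for the files above, the names really do differ between the alignments and the annotation; if it prints the expected scaffolds, the naming convention is probably not the cause.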
The reference.gtf looks as it should, but I'm concerned that the problem might be in the attributes column (9th), which holds the gene ID:
scaffold1 WormBase_imported exon 7437 7876 . + . transcript_id "transcript:BN1106_s1B000532.mRNA-1"; gene_id "gene:BN1106_s1B000532"; gene_name "BN1106_s1B000532";
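Since GTF is defined as nine tab-separated columns, a quick structural check of the file can rule out formatting damage. This is a rough sketch only; gtf_bad_lines is a made-up name, and it says nothing about whether the attribute values themselves are what StringTie expects.

```shell
# Hypothetical helper: report GTF lines that do not have exactly
# 9 tab-separated fields (comment lines and empty lines are skipped).
# Usage: gtf_bad_lines genome/genome_ERCC92.gtf | head
gtf_bad_lines() {
    awk -F'\t' '!/^#/ && length($0) > 0 && NF != 9 {
        print "line " NR ": " NF " fields"
    }' "$1"
}
```

No output means every feature line has the expected nine columns.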
Has anyone else seen something similar?
Could there be a problem with the step that sorts and converts the SAM files to BAM? Should I be using the -n option and sorting by read name? I'm using the command below, which sorts by leftmost coordinate by default, as that's what the protocols paper used.
samtools sort -@ 8 -o sample.bam sample.sam
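For what it's worth, the sort order actually recorded in a file can be read from the SO: tag on the @HD line of its header. A small sketch, assuming the header was written by samtools sort; sam_sort_order is a made-up name and it simply reads header text from standard input.

```shell
# Hypothetical helper: read SAM header text on stdin and print the
# SO: (sort order) value from the @HD line, or "unknown" if absent.
# Usage: samtools view -H sample.bam | sam_sort_order
sam_sort_order() {
    awk -F'\t' '
        /^@HD/ { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) so = substr($i, 4) }
        END    { print (so == "" ? "unknown" : so) }
    '
}
```

As far as I understand, StringTie expects coordinate-sorted input, so a value of "coordinate" here would suggest the sort step is fine and that -n is not needed.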
Thank you in advance for any help :-)
P.S. I don't think there is a problem with my script, but here's a sample command:
stringtie -p 8 -G genome/genome_ERCC92.gtf -o sample.gtf sample.bam
A better title for your post would be "Stringtie GTF naming convention error", which is succinct and conveys the gist of your question. Details are better suited for the actual body of the post.
Changed. Thank you :)
How exactly was the ERCC included? Was it added to the reference genome and annotation before building the index?
I used the commands
These outputs were used to build the index used for the alignments. I have since rerun the pipeline with the basic genome files (not including the ERCC sequences) and I get the same problem.
Any advice appreciated.