Warning: "no reference transcripts were found" when assembling mapped reads against a reference annotation with StringTie
6.6 years ago
Yuyin110 ▴ 10

I am running a transcript-level expression analysis of RNA-seq data with HISAT, StringTie and Ballgown, following the steps in the paper "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". At the StringTie step, where the mapped reads are assembled against the reference annotation, I get the warning "no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences." The code I ran is:

for sample_name in $(cat samples.list)
do
    stringtie -p 8 -G sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf -o "stringtie/$sample_name.gtf" -l "$sample_name" "hist2/JGI/$sample_name.bam"
done

This produces a .gtf file for each sample, and the results can be used for abundance estimation with Ballgown. However, the abundance of every reference transcript is zero; the only transcripts with non-zero abundance are novel ones. I do not think this result is reliable, and I suspect the warning points to the cause. My reference sequences were downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Sorghum_bicolor/Assembled_chromosomes/seq/), and the annotation was also downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Sorghum_bicolor/GFF/ref_Sorghum_bicolor_NCBIv3_top_level.gff3.gz). I converted the annotation to GTF format with the command

gffread sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gff3 -T -o sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf

and then used gffread to check the resulting GTF file:

gffread -E sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf

It reports no errors, and I am sure my annotation file uses the same naming convention as the genome sequences. Why does the warning appear, and why can't I get abundances for the reference transcripts?

RNA-Seq stringtie • 4.8k views

I have the same problem. Did you find out how to solve the issue?


I have the same problem and don't know how to solve it.


There might be a discrepancy between the sequence names used for mapping (those in the BAM file) and the sequence names in the GFF/GTF file. Alternatively, the specified GTF file may lack the identifiers needed to extract transcript information (for instance gene_id, name, etc.).
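
Both possibilities can be checked from the command line. The following is only a rough sketch, not a definitive diagnosis: the helper names (bam_seqnames, gtf_seqnames, shared_seqnames, gtf_exons_missing_transcript_id) are made up for illustration, and it assumes the BAM header has first been saved as text with samtools view -H. If shared_seqnames prints nothing, the BAM and the GTF have no sequence names in common, which would be consistent with the warning.

```shell
# Hypothetical helpers for comparing sequence names; not part of StringTie.

# Print the sequence names (SN: fields of @SQ lines) from a SAM header on stdin.
bam_seqnames() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' | sort -u
}

# Print the sequence names in column 1 of a GTF/GFF file, skipping comment lines.
gtf_seqnames() {
  grep -v '^#' "$1" | cut -f1 | sort -u
}

# Print the names found in both the header file ($1) and the annotation ($2).
shared_seqnames() {
  tmp=$(mktemp)
  bam_seqnames < "$1" > "$tmp"
  gtf_seqnames "$2" | grep -Fx -f "$tmp"
  rm -f "$tmp"
}

# Count exon lines in a GTF file ($1) that carry no transcript_id attribute.
gtf_exons_missing_transcript_id() {
  grep -v '^#' "$1" | awk -F'\t' '$3 == "exon" && $9 !~ /transcript_id "/ { n++ } END { print n + 0 }'
}
```

With the paths from the question, usage might look like this (one sample's header is enough, since all samples were mapped to the same genome):

samtools view -H hist2/JGI/$sample_name.bam > header.sam
shared_seqnames header.sam sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf
gtf_exons_missing_transcript_id sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf

Comparing the output of bam_seqnames and gtf_seqnames side by side should show how the two naming schemes differ, if they do.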
