Warning encountered while transcript abundance estimation using stringtie
1
0
Entering edit mode
7.0 years ago

I am trying to run stringtie for transcript abundance estimation using below command:

for i in *.sorted.bam; do sample_name=`echo $i| awk -F "." '{print $1}'`; stringtie -e -B -p 60 -G stringtie-merged.gtf -o $sample_name.gtf $i;done

Warning encountered:

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.

The assembly was done using below commands:

#stringtie assembly

for i in *sorted.bam; do

sample_name=`echo $i| awk -F "." '{print $1}'`

date && time stringtie -p 60 -G reference.gff3 -o $sample_name.gtf $i -A $sample_name.gene.adundance -C $sample_name.fullycovered 

done

#merging assemblies
date && time stringtie --merge -p 60 -G reference.gff3 -o stringtie-merged.gtf merge_list.txt

where merge_list.txt contains the list of the gtf files produces as an output from stringtie asembly.

stringtie RNA-Seq abundance • 6.7k views
ADD COMMENT
0
Entering edit mode

Have you checked your gff3 file looks valid? Also was this the reference.gff3 used in your original alignment?

ADD REPLY
0
Entering edit mode

Have you checked your gff3 file looks valid?

gffread -E reference.gff3

GFF: discarding unrecognized feature "chromosome" ID=chromosome:1
GFF: discarding unrecognized feature "chromosome" ID=chromosome:10
GFF: discarding unrecognized feature "chromosome" ID=chromosome:10_random
GFF: discarding unrecognized feature "chromosome" ID=chromosome:11
GFF: discarding unrecognized feature "chromosome" ID=chromosome:11_random

Additionally, no problem found with the stringtie-merged.gtf file

Also was this the reference.gff3 used in your original alignment?

Alignment was ran using hisat2 without using the gff file.

ADD REPLY
0
Entering edit mode

Are the entries in the gff file synonymous with the annotation of the fasta files you used to build the HISAT index?

ADD REPLY
0
Entering edit mode

Yes, absolutely! I confirmed the same by fetching the sequence from the reference fasta file from the gf file using bedtools getfasta

ADD REPLY
0
Entering edit mode

Are your GFF3 transcript lines noted as mRNA or as transcript?

ADD REPLY
0
Entering edit mode

"transcript"

#!genome-build-accession GCA_000003745.2
1       Genoscope       chromosome      1       23037639        .       .       .       ID=chromosome:1;Alias=FN597015.1
###
1       iggp    gene    10731   27597   .       +       .       ID=gene:VIT_01s0011g00010;biotype=protein_coding;description=Putative uncharacterized protein  [Source:UniProtKB/TrEMBL%3BAcc:F6HF07];gene_id=VIT_01s0011g00010;logic_name=iggp12x.v1
1       iggp    mRNA    10731   27597   .       +       .       ID=transcript:VIT_01s0011g00010.t01;Parent=gene:VIT_01s0011g00010;biotype=protein_coding;transcript_id=VIT_01s0011g00010.t01
1       iggp    exon    10731   10945   .       +       .       Parent=transcript:VIT_01s0011g00010.t01;Name=VIT_01s0011g00010.t01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=VIT_01s0011g00010.t01.exon1;rank=1
1       iggp    five_prime_UTR  10731   10945   .       +       .       Parent=transcript:VIT_01s0011g00010.t01
1       iggp    five_prime_UTR  12798   12836   .       +       .       Parent=transcript:VIT_01s0011g00010.t01
1       iggp    exon    12798   12859   .       +       .       Parent=transcript:VIT_01s0011g00010.t01;Name=VIT_01s0011g00010.t01.exon2;constitutive=1;ensembl_end_phase=2;ensembl_phase=-1;exon_id=VIT_01s0011g00010.t01.exon2;rank=2
1       iggp    CDS     12837   12859   .       +       0       ID=CDS:VIT_01s0011g00010.t01;Parent=transcript:VIT_01s0011g00010.t01;protein_id=VIT_01s0011g00010.t01
ADD REPLY
0
Entering edit mode

They're actually noted as mRNA:

1       iggp    mRNA    10731   27597 ....

In the 9th field you have:

ID=transcript:VIT_01s0011g00010.t01;Parent=gene:VIT_01s0011g00010;biotype=protein_coding;transcript_id=VIT_01s0011g00010.t01

I don't think that the transcript names are correct in the ID field. They're like transcript:VIT_01s0011g00010.t01 but I assume they should be only VIT_01s0011g00010.t01, because the ":" could disrupt the program's analysis. The same can be said for the gene.

Basically, what you have in the Transcript_ID field should be in the ID field, because that is what these programs read.

ADD REPLY
0
Entering edit mode

Should I then give it a try by removing the ":" ? Or the whole thing along with the colon sign?

ADD REPLY
0
Entering edit mode

I suggest you edit genes and transcripts in their ID and Parent fields removing the gene: or the transcript: part, like:

ID=gene:VIT_01s0011g00010
ID=transcript:VIT_01s0011g00010.t01;Parent=gene:VIT_01s0011g00010

Become:

ID=VIT_01s0011g00010
ID=VIT_01s0011g00010.t01;Parent=VIT_01s0011g00010
ADD REPLY
0
Entering edit mode

Looking at the code here, the error is seen when:

if (guided && no_ref_used) {
        GMessage("WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!\n"
                "Please make sure the -G annotation file uses the same naming convention for the genome sequences.\n");
 }

where guided

if (args.getOpt('G')) {
       guidegff=args.getOpt('G');
       if (fileExists(guidegff.chars())>1)
            guided=true;
       else GError("Error: reference annotation file (%s) not found.\n",
                 guidegff.chars());
}

where no_ref_used

if (gseq_id>=refseqCount) {
                     if (verbose)
                         GMessage("WARNING: no reference transcripts found for genomic sequence \"%s\"! (mismatched reference names?)\n",
                                refseqName);
                 }
                 else no_ref_used=false;
ADD REPLY
0
Entering edit mode
6.3 years ago
geo.pertea ▴ 130

The (belated) answer to the original problem reported here can be found in this post: A: stringtie cannot recognize Gencode GTF

ADD COMMENT

Login before adding your answer.

Traffic: 1849 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6