Hi Biostars,
I align RNAseq reads using STAR and my annotation are in gff format.
My STAR command is
STAR --runThreadN 6 --genomeDir dir --sjdbGTFfile gff --sjdbGTFtagExonParentTranscript Parent --readFilesIn reads_1 read2_2 --readFilesCommand zcat --outSAMtype BAM Unsorted --outReadsUnmapped Fastx --quantMode TranscriptomeSAM GeneCounts --outTmpDir ~/TMP/TMPs
Gff file looks like
NC_018292.1 RefSeq gene 610 1857 . - . ID=gene0;Dbxref=GeneID:14536854;Name=CORT_0A00100;end_range=1857,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CORT_0A00100;partial=true;start_range=.,610
NC_018292.1 RefSeq mRNA 610 1857 . - . ID=rna0;Parent=gene0;Dbxref=Genbank:XM_003865799.1,GeneID:14536854;Name=XM_003865799.1;end_range=1857,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,610;transcript_id=XM_003865799.1
NC_018292.1 RefSeq exon 610 1857 . - . ID=id47;Parent=rna0;Dbxref=Genbank:XM_003865799.1,GeneID:14536854;end_range=1857,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,610;transcript_id=XM_003865799.1
NC_018292.1 RefSeq CDS 610 1857 . - 0 ID=cds0;Parent=rna0;Dbxref=GOA:H8WZ57,InterPro:IPR002938,InterPro:IPR003042,UniProtKB/TrEMBL:H8WZ57,Genbank:XP_003865847.1,GeneID:14536854;Name=XP_003865847.1;gbkey=CDS;product=hypothetical protein;protein_id=XP_003865847.1;transl_table=12
NC_018292.1 RefSeq gene 2012 4213 . - . ID=gene1;Dbxref=GeneID:14536804;Name=CORT_0A00110;end_range=4213,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CORT_0A00110;partial=true;start_range=.,2012
NC_018292.1 RefSeq mRNA 2012 4213 . - . ID=rna1;Parent=gene1;Dbxref=Genbank:XM_003865800.1,GeneID:14536804;Name=XM_003865800.1;end_range=4213,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,2012;transcript_id=XM_003865800.1
NC_018292.1 RefSeq exon 2012 4213 . - . ID=id48;Parent=rna1;Dbxref=Genbank:XM_003865800.1,GeneID:14536804;end_range=4213,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,2012;transcript_id=XM_003865800.1
NC_018292.1 RefSeq CDS 2012 4213 . - 0 ID=cds1;Parent=rna1;Dbxref=InterPro:IPR001611,InterPro:IPR025875,UniProtKB/TrEMBL:H8WZ58,Genbank:XP_003865848.1,GeneID:14536804;Name=XP_003865848.1;gbkey=CDS;product=hypothetical protein;protein_id=XP_003865848.1;transl_table=12
Mapping rate is high, ca 92% of unique alignments.
But ReadsPerGene.out.tab looks like
N_unmapped 1239857 1239857 1239857
N_multimapping 24667 24667 24667
N_noFeature 342031 16030329 520585
N_ambiguous 0 0 0
When I convert gff file to gtf using gffread, everything works fine.
So I am wondering what might be the problem, because this option --sjdbGTFtagExonParentTranscript Parent
is supposed to make STAR work with gff file normally.
Cheers,
How is your
ReadsPerGene.out.tab
using the gtf? I may be missing something, but I didn't see anything wrong with gff alignment.So this is with gff
This is with gtf
Ok, now I understand. I never used STAR with a gff annotation, I don't know if this is a bug or not, maybe it is worth opening a ticket on STAR github?
Anyway, you have a simple fix to the issue already.
Can you check by removing the option
--sjdbGTFfile gff
?This option defines the path of GTF file, and since there is no gtf-file in the current dir named "gff", STAR might be not be doing anything.
If I don't provide the gff file STAR will not count anything, obviously because it does not have info about features. Also this is just a sample command, in real life I specify the path to gff.