Question

Counting No. of Reads per gene from RNA-Seq with STAR

0

Entering edit mode

5.0 years ago

skevalkumar • 0

I am trying to get no. of reads per gene with STAR --quantMode.

I have followed the

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir /media/rm313/SDC/genomeDir \
  --genomeFastaFiles Brassica_napus_v4.1.chromosomes.fa --sjdbGTFfile Brassica_napus.annotation_v5.gtf \
  --genomeChrBinNbits 18 --limitGenomeGenerateRAM 910000000000

step to generate Genome.

To get no. of reads, i used

STAR --runThreadN 16  --runMode alignReads --outFilterMultimapNmax 1 \
  --outFilterMatchNmin 35 --outSAMtype SAM --quantMode GeneCounts \
  --twopassMode Basic --outFileNamePrefix "Wes1" --genomeDir /media/rm313/SDC/genomeDir \
  --sjdbGTFfeatureExon /media/rm313/SDC/Brassica_napus.annotation_v5.cds.fa \
  --readFilesIn /media/rm313/SDC/W1F.fastq /media/rm313/SDC/W1R.fastq

My questions

1) ReadsperGene.out.tab does not contain No. of reads per gene. it shows

N_unmapped    453074  453074  453074
N_multimapping    258436  258436  258436
N_noFeature   738252  812120  865586
N_ambiguous   0   0   0

2) I need reads per gene. however, it shows on chromosome name in Log.out and SJ.out.tab file.

Can you please suggest, what's wrong with these commands?

Thank you

RNA-Seq STAR • 4.8k views

ADD COMMENT • link updated 5.0 years ago by h.mon 35k • written 5.0 years ago by skevalkumar • 0

1

Entering edit mode

Your original mapping command had a space between the dashes and the option name (-- twopassMode Basic), but I believe your actual command doesn't have this space, otherwise STAR would error out.

ADD REPLY • link 5.0 years ago by h.mon 35k

1

Entering edit mode

after you run STAR and you do an ls -t command, which files do you see being generated and what's their content?

ADD REPLY • link 5.0 years ago by Friederike 8.9k

0

Entering edit mode

In the second STAR step, it generated Aligned.out.sam, Aligned.out.sam, Log.final.out, Log.progress.out, Log.out, ReadsPerGene.out.tab (already mentioned), SJ.out.tab

ADD REPLY • link 5.0 years ago by skevalkumar • 0

0

Entering edit mode

it shows on chromosome name in Log.out and SJ.out.tab file

What did you mean by that?

ADD REPLY • link 5.0 years ago by Friederike 8.9k

h.mon · Answer 1 · 2019-04-25

2

Entering edit mode

5.0 years ago

h.mon 35k

Yiu are using incorrectly the --sjdbGTFfeatureExon option, you should not pass a fasta file to it, rather, you should name the feature type you want to quantify (e.g. exon, CDS, and so on).

--sjdbGTFfeatureExon
    default: exon
    string: feature type in GTF ﬁle to be used as exons for building transcripts

ADD COMMENT • link 5.0 years ago by h.mon 35k

0

Entering edit mode

I have followed your suggestions, still it gave the same output (--twopassMode Basic and --sjdbGTFfeatureExon CDS)

ADD REPLY • link 5.0 years ago by skevalkumar • 0

1

Entering edit mode

Can you show the output of head Brassica_napus.annotation_v5.gtf?

ADD REPLY • link 5.0 years ago by h.mon 35k

0

Entering edit mode

chrA01  GazeA2  exon    831 1437    7.80    +   .   transcript_id "BnaA01g00010D";
chrA01  GazeA2  CDS 1070    1345    .   +   0   transcript_id "BnaA01g00010D";
chrA01  GazeA2  exon    1487    2124    2.15    -   .   transcript_id "BnaA01g00020D";
chrA01  GazeA2  exon    2256    2436    3.19    -   .   transcript_id "BnaA01g00020D";
chrA01  GazeA2  CDS 1645    2124    .   -   0   transcript_id "BnaA01g00020D";
chrA01  GazeA2  CDS 2256    2282    .   -   0   transcript_id "BnaA01g00020D";
chrA01  GazeA2  exon    2665    3177    4.45    -   .   transcript_id "BnaA01g00030D";

ADD REPLY • link updated 5.0 years ago by h.mon 35k • written 5.0 years ago by skevalkumar • 0

0

Entering edit mode

Any suggestion/comment on GTF file? or modification in input file?

ADD REPLY • link 5.0 years ago by skevalkumar • 0

1

Entering edit mode

It seems your gtf is missing "gene_id" tags, which is used by STAR. Where did you obtain this annotation from? Did you convert a gff annotation to gtf?

Probably you can set some STAR parameter to correctly handle this gtf, but you have to do some reading. Some posts that might be helpful:

read counts with gff file

Parent child relationship to use with GFF data. Are alignment to transcripts on exon or transcripts?

NCBI's GFF file: how to get gene counts with STAR

Read counts of STAR with gff file

ADD REPLY • link 5.0 years ago by h.mon 35k

0

Entering edit mode

I have downloaded it from http://brassicadb.org/brad/datasets/pub/BrassicaceaeGenome/Brassica_napus/ Yes, I have converted GFF3 to GTF (using gffread)

ADD REPLY • link 5.0 years ago by skevalkumar • 0

1

Entering edit mode

What happens if you use --sjdbGTFfile Brassica_napus.annotation_v5.gtf in your command line?

ADD REPLY • link 5.0 years ago by swbarnes2 14k

0

Entering edit mode

It shows FATAL Error, could not open file pGe.sjdbGTFfile=Brassica_napus.annotation_v5.gtf

ADD REPLY • link 5.0 years ago by skevalkumar • 0

0

Entering edit mode

Do you think you need that file? You certainly don't need the file of cDNA transcripts that you included for some reason.

ADD REPLY • link 5.0 years ago by swbarnes2 14k

0

Entering edit mode

I have changed it with --sjdbGTFfeatureExon exon, but there is no change in the output.

ADD REPLY • link 5.0 years ago by skevalkumar • 0