Counting No. of Reads per gene from RNA-Seq with STAR
1
0
Entering edit mode
5.0 years ago

I am trying to get no. of reads per gene with STAR --quantMode.

I have followed the

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir /media/rm313/SDC/genomeDir \
  --genomeFastaFiles Brassica_napus_v4.1.chromosomes.fa --sjdbGTFfile Brassica_napus.annotation_v5.gtf \
  --genomeChrBinNbits 18 --limitGenomeGenerateRAM 910000000000

step to generate Genome.

To get no. of reads, i used

STAR --runThreadN 16  --runMode alignReads --outFilterMultimapNmax 1 \
  --outFilterMatchNmin 35 --outSAMtype SAM --quantMode GeneCounts \
  --twopassMode Basic --outFileNamePrefix "Wes1" --genomeDir /media/rm313/SDC/genomeDir \
  --sjdbGTFfeatureExon /media/rm313/SDC/Brassica_napus.annotation_v5.cds.fa \
  --readFilesIn /media/rm313/SDC/W1F.fastq /media/rm313/SDC/W1R.fastq

My questions

1) ReadsperGene.out.tab does not contain No. of reads per gene. it shows

N_unmapped    453074  453074  453074
N_multimapping    258436  258436  258436
N_noFeature   738252  812120  865586
N_ambiguous   0   0   0
  

2) I need reads per gene. however, it shows on chromosome name in Log.out and SJ.out.tab file.

Can you please suggest, what's wrong with these commands?

Thank you

RNA-Seq STAR • 4.8k views
ADD COMMENT
1
Entering edit mode

Your original mapping command had a space between the dashes and the option name (-- twopassMode Basic), but I believe your actual command doesn't have this space, otherwise STAR would error out.

ADD REPLY
1
Entering edit mode

after you run STAR and you do an ls -t command, which files do you see being generated and what's their content?

ADD REPLY
0
Entering edit mode

In the second STAR step, it generated Aligned.out.sam, Aligned.out.sam, Log.final.out, Log.progress.out, Log.out, ReadsPerGene.out.tab (already mentioned), SJ.out.tab

ADD REPLY
0
Entering edit mode

it shows on chromosome name in Log.out and SJ.out.tab file

What did you mean by that?

ADD REPLY
2
Entering edit mode
5.0 years ago
h.mon 35k

Yiu are using incorrectly the --sjdbGTFfeatureExon option, you should not pass a fasta file to it, rather, you should name the feature type you want to quantify (e.g. exon, CDS, and so on).

--sjdbGTFfeatureExon
    default: exon
    string: feature type in GTF file to be used as exons for building transcripts
ADD COMMENT
0
Entering edit mode

I have followed your suggestions, still it gave the same output (--twopassMode Basic and --sjdbGTFfeatureExon CDS)

ADD REPLY
1
Entering edit mode

Can you show the output of head Brassica_napus.annotation_v5.gtf?

ADD REPLY
0
Entering edit mode
chrA01  GazeA2  exon    831 1437    7.80    +   .   transcript_id "BnaA01g00010D";
chrA01  GazeA2  CDS 1070    1345    .   +   0   transcript_id "BnaA01g00010D";
chrA01  GazeA2  exon    1487    2124    2.15    -   .   transcript_id "BnaA01g00020D";
chrA01  GazeA2  exon    2256    2436    3.19    -   .   transcript_id "BnaA01g00020D";
chrA01  GazeA2  CDS 1645    2124    .   -   0   transcript_id "BnaA01g00020D";
chrA01  GazeA2  CDS 2256    2282    .   -   0   transcript_id "BnaA01g00020D";
chrA01  GazeA2  exon    2665    3177    4.45    -   .   transcript_id "BnaA01g00030D";
ADD REPLY
0
Entering edit mode

Any suggestion/comment on GTF file? or modification in input file?

ADD REPLY
1
Entering edit mode

It seems your gtf is missing "gene_id" tags, which is used by STAR. Where did you obtain this annotation from? Did you convert a gff annotation to gtf?

Probably you can set some STAR parameter to correctly handle this gtf, but you have to do some reading. Some posts that might be helpful:

read counts with gff file

Parent child relationship to use with GFF data. Are alignment to transcripts on exon or transcripts?

NCBI's GFF file: how to get gene counts with STAR

Read counts of STAR with gff file

ADD REPLY
0
Entering edit mode

I have downloaded it from http://brassicadb.org/brad/datasets/pub/BrassicaceaeGenome/Brassica_napus/ Yes, I have converted GFF3 to GTF (using gffread)

ADD REPLY
1
Entering edit mode

What happens if you use --sjdbGTFfile Brassica_napus.annotation_v5.gtf in your command line?

ADD REPLY
0
Entering edit mode

It shows FATAL Error, could not open file pGe.sjdbGTFfile=Brassica_napus.annotation_v5.gtf

ADD REPLY
0
Entering edit mode

Do you think you need that file? You certainly don't need the file of cDNA transcripts that you included for some reason.

ADD REPLY
0
Entering edit mode

I have changed it with --sjdbGTFfeatureExon exon, but there is no change in the output.

ADD REPLY

Login before adding your answer.

Traffic: 1507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6