I also aligned the reads using RNA STAR in Galaxy. I used the built-in Mus musculus mm10 genome. I selected to NOT count number of reads per gene since I did this part later.
No reason not to count with STAR, it is essentially a free analysis: it doesn't add significant run time, memory or disk usage, compared to just mapping. And the results are equivalent to featureCounts and HTSeq.
Selected "stranded" for the input on strand specificity because that is what the core that gave me my samples did for library prep.
The most common library preparation (Illumina TruSeq stranded) would result in a reverse-stranded library, so the correct setting for featureCounts on the command line is -s 2
, and for HTSeq is -s reverse
- I guess there should be a stranded: reverse
option for Galaxy. You have to check the library preparation kit manual or ask the sequencing center staff for details.
I downloaded this .gtf file for the genome: Mus_musculus.GRCm38.96.gtf.gz (this is essentially mm10 from ensembl)
As arup hinted, you have to use genome and annotations from the same source, as the chromosome naming convention may be different between them (some sources use 1
, 2
, 3
and so on for chromosome names, while others use chr1
, chr2
, chr3
). Check the bam header and the annotation gtf to see if chromosome names are equal.
Did you check the mapping rates (from STAR) and feature assignment rates (from featureCounts)? Do they seem correct? What proportion of your reads is mapped and what proportion is assigned to a feature? Did you check if the mappings you see at the gene of interest are multi-mappers? Multi-mapping reads are not counted by STAR / featureCounts / HTSeq. Then, as kristoffer.vittingseerup pointed out, you would need to use a different method (RSEM, Salmon or kallisto) to be able to quantify these reads.
Some additional comments:
Please define your abbreviations the first time you use them - I had no idea GOI meant gene of interest
, I thought it was the name of a gene.
The issue of seeing mapped reads but having no counts is a recurrent one, and there are several posts on BioStars and other forums about it. If you search for the words on your post title, you will find several of them, do read them to check if a suitable solution has been posted elsewhere.
As you have been using Galaxy for most of your analyses, maybe you can ask your question there. Before doing so, follow up carefully on the suggestions here, and if none solves your question, then go ahead and post there. Please make sure to state here (edit your original question and add the information there) and at GalaxyHelp you cross posted, providing links between the posts.
Please provide more detail:
Read How To Ask Good Questions On Technical And Scientific Forums for guidance and reference for future questions.
Also is it just for 1 gene or is it for all genes you dont get any counts?
I appreciate the link for asking good questions - I'll elaborate the steps/commands:
Wow that's a novel.
Are you using the same transcript file for alignment and HTseq? Also, I would recommend you to use FeatureCounts for transcript quantification.
The core functionality of HTSeq and featureCounts is identical so I would have to argue there is no need to switch. If you want more accurate quantification you would need to use pseudo-alligners such as Kallisto or Salmon instead. Their main advantages are: 1) they can handle multimappers (featureCounts and HTSeq cannot) which usually are a large fraction of your data you otherwise ignore. 2) They have sequence bias correction algorithms.
I agree that Kallisto/Salmon can provide more accurate transcript quantification. But I was recommending featureCounts as it takes less time and resources to run.