Question: STAR or Bowtie for small RNA seq?
0

Hi all!

I am analyzing some small RNA-seq data. To map the reads to the genome, I'm wondering whether STAR or Bowtie would be a better fit for my data.

My reads are between 15 and 30 bp in length.

Many thanks for your suggestions.

12 months ago • max_19 • 100 • updated 12 months ago by Bastien Hervé • 4.2k
0

How did you get reads that are 15-30 bp long? Those will be hard to align properly.

12 months ago • Bastien Hervé • 4.2k
2

This is quite expected in small RNA-seq data after trimming sequencing adapters and filtering out low-quality reads.

12 months ago • v82masae • 130
0

As I said in my answer below, I missed this part when reading the post :) Monday morning pleasures.

12 months ago • Bastien Hervé • 4.2k
3

I don't think Bowtie(2) is splice-aware. Since this is RNA-seq, you'd want STAR or TopHat2; HISAT2 is another option.

Personally, I really like STAR, and it does well in peer-reviewed benchmarks. It also has a parameter for sharing the genome in memory between concurrent processes, so you can align multiple samples at once.
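
The shared-memory option mentioned here is, as far as I know, STAR's `--genomeLoad` parameter. A minimal Python sketch that only builds the command lines and does not run STAR; the index path, sample names, and thread count are placeholders, not values from this thread:

```python
from typing import List


def star_shared_genome_cmd(genome_dir: str, fastq: str, prefix: str,
                           threads: int = 4) -> List[str]:
    """Build a STAR call that keeps the genome index in shared memory."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # Load the index once and keep it for other STAR jobs on this machine.
        "--genomeLoad", "LoadAndKeep",
        "--readFilesIn", fastq,
        "--outFileNamePrefix", prefix,
    ]


def star_unload_genome_cmd(genome_dir: str) -> List[str]:
    """Build the call that removes the shared index once all jobs are done."""
    return ["STAR", "--genomeDir", genome_dir, "--genomeLoad", "Remove"]


# Hypothetical sample names and index location.
samples = ["sampleA", "sampleB"]
commands = [
    star_shared_genome_cmd("/path/to/star_index", f"{s}.trimmed.fastq", f"{s}_")
    for s in samples
]
for cmd in commands:
    print(" ".join(cmd))
print(" ".join(star_unload_genome_cmd("/path/to/star_index")))
```

The printed lines can be pasted into a shell or passed to `subprocess.run` once the paths point at real files.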

12 months ago • manuel.belmadani • 830
2

STAR and HISAT2 are splice-aware, but be careful with TopHat.

12 months ago • Bastien Hervé • 4.2k
1

Thank you. Being splice-aware is definitely preferred. I can see from the manual that STAR also outputs an SJ.out.tab file, which contains splice junctions in tab-delimited format. Does this mean that it is essentially able to identify junction-mapping small RNAs?
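
For anyone who wants to inspect that file, here is a minimal Python sketch for reading SJ.out.tab. It assumes the nine-column layout described in the STAR manual (chromosome, intron start, intron end, strand, intron motif, annotated flag, uniquely mapping reads, multi-mapping reads, maximum overhang). The example row is made up.

```python
import csv
import io
from typing import Dict, Iterator, Union

# Code tables as documented for SJ.out.tab in the STAR manual.
MOTIFS = {0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
          4: "CT/GC", 5: "AT/AC", 6: "GT/AT"}
STRANDS = {0: "undefined", 1: "+", 2: "-"}


def read_sj_tab(handle) -> Iterator[Dict[str, Union[str, int, bool]]]:
    """Yield one dictionary per junction from an open SJ.out.tab handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, start, end, strand, motif, annotated, uniq, multi, overhang = row[:9]
        yield {
            "chrom": chrom,
            "intron_start": int(start),   # 1-based first base of the intron
            "intron_end": int(end),       # 1-based last base of the intron
            "strand": STRANDS[int(strand)],
            "motif": MOTIFS[int(motif)],
            "annotated": annotated == "1",
            "unique_reads": int(uniq),
            "multi_reads": int(multi),
            "max_overhang": int(overhang),
        }


# A made-up junction line, used here instead of a real file.
example = "chr1\t14830\t14969\t2\t2\t1\t12\t3\t10\n"
junctions = list(read_sj_tab(io.StringIO(example)))
print(junctions[0])
```

With a real run, replace the `io.StringIO` object with `open("SJ.out.tab")`; the junction coordinates can then be compared against the small RNA annotation of interest.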

12 months ago • max_19 • 100
0

It does not need to be splice-aware. Small RNAs typically do not undergo splicing, and one usually aligns against an existing database such as miRBase (for microRNAs) instead of the genome. This requires ungapped alignments tuned for very short reads, which is what Bowtie is very good at (and Bowtie2 is not, because it performs better at longer read lengths).
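
To make "ungapped alignment" concrete, here is a toy Python sketch. It is not how Bowtie works internally (Bowtie uses an index, not a linear scan); it only slides a read along each reference without insertions or deletions and keeps positions with at most a given number of mismatches. The reference names and sequences are invented for illustration.

```python
from typing import Dict, List, Tuple


def ungapped_hits(read: str, references: Dict[str, str],
                  max_mismatches: int = 1) -> List[Tuple[str, int, int]]:
    """Return (reference name, 0-based offset, mismatches) for every
    gap-free placement of the read within the mismatch limit."""
    hits = []
    for name, ref in references.items():
        for offset in range(len(ref) - len(read) + 1):
            window = ref[offset:offset + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append((name, offset, mismatches))
    return hits


# Invented 22-nt references standing in for entries of a small RNA database.
toy_refs = {
    "toy_ref_1": "TACCGGATTCAGGCTTAGCATG",
    "toy_ref_2": "ACGTTGCAAGCTTAGGCATCGA",
}

# A 20-nt read taken from toy_ref_1 (offset 1) with one substituted base.
read = "ACCGGATTCATGCTTAGCAT"
hits = ungapped_hits(read, toy_refs, max_mismatches=1)
print(hits)
```

Because no gaps are allowed, a 20-nt read has only a handful of possible placements per 22-nt reference, which is why this kind of search suits very short reads.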

12 months ago • ATpoint • 17k
2

We get pretty decent alignment rates and accurate results with Bowtie and the following settings:

bowtie -n 1 -l 10 -m 100 -k 1 --best --strata
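
The options above leave out the index and the input and output files. A hedged Python sketch of how the full Bowtie 1 call might be assembled: the index basename, file names, and thread count are hypothetical, and `-p` and `-S` (SAM output) are additions for illustration, not part of the settings quoted above.

```python
from typing import List


def bowtie_small_rna_cmd(index_base: str, reads_fastq: str, out_sam: str,
                         threads: int = 4) -> List[str]:
    """Build a Bowtie 1 command line with the settings from this post."""
    return [
        "bowtie",
        "-n", "1",             # allow at most 1 mismatch in the seed
        "-l", "10",            # seed length of 10 bases
        "-m", "100",           # drop reads with more than 100 reportable alignments
        "-k", "1",             # report at most 1 alignment per read
        "--best", "--strata",  # only report alignments from the best stratum
        "-p", str(threads),    # assumption: number of threads
        "-S",                  # assumption: write SAM output
        index_base,            # hypothetical index basename
        reads_fastq,           # hypothetical trimmed FASTQ file
        out_sam,               # hypothetical output file
    ]


cmd = bowtie_small_rna_cmd("indexes/genome", "sample.trimmed.fastq", "sample.sam")
print(" ".join(cmd))
# To actually run it (requires bowtie on PATH and real files):
#   import subprocess; subprocess.run(cmd, check=True)
```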
12 months ago • v82masae • 130 • updated 12 months ago by ATpoint • 17k
2

I missed the fact that you were using small RNA-seq data. Your sequences are too short to be analyzed with classic RNA-seq tools; see also:

Best/right way to quantify small RNA transcripts

But if you want to stick with STAR, here is some advice from Alexander Dobin, one of the STAR authors, on aligning miRNA reads.

12 months ago • Bastien Hervé • 4.2k
1

Thanks so much for that link, very helpful! I ended up giving STAR a go with the parameter settings recommended in that link. Below is my final log output; I think the reads are mapping pretty well!

                  Number of input reads |    39129818
              Average input read length |    22
                            UNIQUE READS:
           Uniquely mapped reads number |    31732915
                Uniquely mapped reads % |    81.10%
                  Average mapped length |    21.73
               Number of splices: Total |    1388166
    Number of splices: Annotated (sjdb) |    1388166
               Number of splices: GT/AG |    1380537
               Number of splices: GC/AG |    5808
               Number of splices: AT/AC |    46
       Number of splices: Non-canonical |    1775
              Mismatch rate per base, % |    0.20%
                 Deletion rate per base |    0.00%
                Deletion average length |    1.00
                Insertion rate per base |    0.00%
               Insertion average length |    1.02
                     MULTI-MAPPING READS:
Number of reads mapped to multiple loci |    6018428
     % of reads mapped to multiple loci |    15.38%
Number of reads mapped to too many loci |    69
     % of reads mapped to too many loci |    0.00%
                          UNMAPPED READS:
% of reads unmapped: too many mismatches |    0.00%
         % of reads unmapped: too short |    3.08%
             % of reads unmapped: other |    0.44%
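
A quick sanity check on a log like this is that the mapped and unmapped percentages add up to about 100% and that the reported unique rate matches the read counts. A small Python sketch, assuming the usual `label | value` layout of STAR's Log.final.out; the values are copied from the log above, and the parsing helpers are illustrative rather than part of STAR.

```python
from typing import Dict

# Selected lines copied from the log in this post.
LOG_TEXT = """\
Number of input reads | 39129818
Uniquely mapped reads number | 31732915
Uniquely mapped reads % | 81.10%
Number of reads mapped to multiple loci | 6018428
% of reads mapped to multiple loci | 15.38%
% of reads mapped to too many loci | 0.00%
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 3.08%
% of reads unmapped: other | 0.44%
"""


def parse_star_log(text: str) -> Dict[str, str]:
    """Split every 'label | value' line into a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = line.split("|", 1)
        fields[key.strip()] = value.strip()
    return fields


def percent(value: str) -> float:
    """Convert a string such as '81.10%' to a float."""
    return float(value.rstrip("%"))


fields = parse_star_log(LOG_TEXT)

# Every percentage category: uniquely mapped, multi-mapped, unmapped.
percent_keys = [k for k in fields if k.startswith("% of reads") or k.endswith("%")]
total = sum(percent(fields[k]) for k in percent_keys)

# Recompute the unique mapping rate from the raw read counts.
unique_fraction = (int(fields["Uniquely mapped reads number"])
                   / int(fields["Number of input reads"]))

print(f"sum of categories: {total:.2f}%")
print(f"unique rate recomputed from counts: {100 * unique_fraction:.2f}%")
```

For a real run, read the text from the `Log.final.out` file in the STAR output directory instead of the embedded string.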
  
12 months ago • max_19 • 100
