Mapping RNA-Seq data on genome from another species
2.3 years ago
tlorin • 250
Switzerland

Hello all,

I have RNA-Seq data from a species A for which I don't have a reference genome. I would like to map this data on the closest available genome from species B.

I have read this post and I am planning to use hisat2 or STAR. Do any of you have recommendations regarding the parameters to set for mapping, given that I don't have a reference genome for species A? I would expect that I should run these programs with less "stringency" than if I could map to the species A genome. For instance, the default in STAR is --outFilterMismatchNmax 10: should I set it to 20? 30? In hisat2, the corresponding default is --score-min L,0,-0.2.
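To make the hisat2 setting more concrete, here is a rough sketch of what --score-min L,0,-0.2 means in practice. It assumes end-to-end scoring where a perfect alignment scores 0, the default maximum mismatch penalty of 6 (--mp 6,2), and no gaps or soft-clipping; the 100 bp read length and the alternative coefficient -0.6 are illustrative assumptions, not recommendations.

```shell
#!/usr/bin/env bash
# Minimum alignment score allowed by hisat2 for --score-min L,<const>,<coef>:
#   min_score = const + coef * read_length
min_score() {
  local const=$1 coef=$2 read_len=$3
  awk -v a="$const" -v b="$coef" -v n="$read_len" \
    'BEGIN { printf "%.0f\n", a + b * n }'
}

# Rough number of high-quality mismatches tolerated, assuming a penalty of 6
# per mismatch (the hisat2 default maximum) and no gaps or soft-clipping.
max_mismatches() {
  local score=$1 penalty=${2:-6}
  echo $(( -score / penalty ))
}

# Compare the default coefficient with a looser, purely illustrative one.
for coef in -0.2 -0.6; do
  s=$(min_score 0 "$coef" 100)
  printf '%s\n' "--score-min L,0,$coef read_len=100 min_score=$s mismatches~$(max_mismatches "$s")"
done
```

Under these assumptions, a 100 bp read must score at least -20 with the default, which is roughly 3 high-quality mismatches, so a more negative coefficient is the hisat2 counterpart of raising --outFilterMismatchNmax in STAR.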

In addition, could you recommend any reading related to this question?


Here is a very good review on how to do NGS analysis with non-model organisms. It has a section on RNA-seq and on how to deal with situations where you do not have a reference genome: "Next-generation biology: Sequencing and data analysis approaches for non-model organisms".

3.3 years ago
jnoble333 • 20

Hi tlorin,


Thanks for your answer! But why --outFilterMismatchNmax 8 and not --outFilterMismatchNmax 15 or a higher threshold? Is this an arbitrary threshold that you found worked best? I guess this depends on the evolutionary distance between the two species too. Is this the post you refer to?


That is the post I went by. That threshold works for my dataset for the most part because I have a lot of reads. I'd say try different values and see how your results differ.
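One way to try different values is a small sweep like the sketch below. It only prints the STAR commands (a dry run) so they can be inspected before anything is launched. The index directory, read file names, thread count, and the list of thresholds are hypothetical placeholders, not values taken from this thread.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one STAR command per mismatch threshold.
# GENOME_DIR, R1, and R2 are hypothetical placeholders.
GENOME_DIR="speciesB_star_index"
R1="speciesA_R1.fastq.gz"
R2="speciesA_R2.fastq.gz"

star_cmd() {
  local nmax=$1
  echo "STAR --runThreadN 8" \
       "--genomeDir $GENOME_DIR" \
       "--readFilesIn $R1 $R2" \
       "--readFilesCommand zcat" \
       "--outFilterMismatchNmax $nmax" \
       "--outSAMtype BAM SortedByCoordinate" \
       "--outFileNamePrefix star_mm${nmax}_"
}

# Thresholds are illustrative; 10 is the STAR default.
for nmax in 8 10 15 20; do
  star_cmd "$nmax"
done

# After real runs, the mapping rates can be compared across thresholds, e.g.:
#   grep "Uniquely mapped reads %" star_mm*_Log.final.out
```

Comparing the uniquely mapped and multi-mapped percentages across runs shows whether loosening the threshold actually rescues reads. Other STAR filters, such as --outFilterScoreMinOverLread and --outFilterMatchNminOverLread, may end up being the limiting ones for divergent reads, so they may be worth including in the same kind of sweep.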

18 months ago
Walnut Creek, USA

BBMap will align RNA-seq data to genomes with a substantially lower identity than most other aligners, particularly other splice-aware aligners. You might add the flags "maxindel=200k minid=0.7" to increase alignment rate in this case.
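As a minimal sketch, a BBMap call with those two flags might look like the following. The function only prints the command (a dry run), and the reference and read file names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a BBMap command using the flags suggested above.
# All file names passed in are hypothetical placeholders.
bbmap_cmd() {
  local ref=$1 reads=$2 out=$3
  echo "bbmap.sh ref=$ref in=$reads out=$out maxindel=200k minid=0.7"
}

bbmap_cmd speciesB_genome.fa speciesA_reads.fq.gz speciesA_on_B.sam
```

For paired-end data the second read file would be given with in2=, and the printed command can be pasted into a terminal once the paths are adjusted.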


Thanks Brian! If I add maxindel=200k minid=0.7 do I risk mapping to spurious locations or will the mapping report the best unique hit (so, the homologous sequence in my case)?