Hello friends, I am new to RNA-seq analysis. I have single-end (1x50) FASTQ files of mRNA and sRNA (each file contains reads from two organisms), and I want to study gene expression in each sample.
I mapped the first sample with STAR using the following command:
$ Linux_x86_64/STAR --runThreadN 12 --genomeDir /index --sjdbOverhang 49 --readFilesIn /sample1.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix /output --outSAMtype BAM SortedByCoordinate
but I got:
Uniquely mapped reads% | 58.83%
% of reads mapped to multiple loci | 38.77%
% of reads mapped to too many loci | 0.46%
I do not understand why I got such a high percentage of reads mapped to multiple loci. Do you have any idea how to improve the mRNA mapping with the STAR command I used?
What is the best way to map sRNA reads to the reference genome?
Thank you in advance
Start troubleshooting your reads. Check for rRNA contamination, and check which genes have a particularly high number of multi-mappers.
Can you give me more detail on the methodology I should follow?
Examine the alignments with IGV, or filter the multi-mappers with samtools (there are many methods for doing so) and examine the multi-mapper alignments with IGV.
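As one possible way to do the filtering step, here is a minimal sketch that keeps only alignments whose NH tag is greater than 1. It assumes the BAM still carries the NH attribute (STAR writes it with its default SAM attributes). The function name multimappers_only and the file names in the usage comment are placeholders, not anything from the original run.

```shell
# Keep SAM header lines plus alignments whose NH tag (number of loci the
# read maps to) is greater than 1. Reads SAM text on stdin, writes SAM text
# to stdout. The function name is a hypothetical helper for illustration.
multimappers_only() {
  awk -F'\t' '
    /^@/ { print; next }              # pass header lines through unchanged
    {
      for (i = 12; i <= NF; i++)      # optional tags start at column 12
        if ($i ~ /^NH:i:/) {
          split($i, t, ":")           # t[3] holds the NH value
          if (t[3] + 0 > 1) print
          break
        }
    }'
}

# Assumed usage (file names are placeholders):
#   samtools view -h Aligned.sortedByCoord.out.bam | multimappers_only \
#     | samtools view -b -o multimappers.bam -
#   samtools index multimappers.bam
```

The resulting BAM can be loaded into IGV next to the full alignment to see which loci collect most of the multi-mapped reads. With STAR's default MAPQ scheme, uniquely mapped reads can also be pulled out directly with samtools view -b -q 255.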
Use bbduk with the ribokmers.fa.gz file to check for rRNA contamination.
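A rough sketch of what that bbduk call could look like is below. The wrapper function rrna_check, the DRY_RUN switch, and the output file names are assumptions made for illustration; k=31 is chosen on the assumption that it matches the k-mer length of the ribokmers.fa.gz reference.

```shell
# Split reads into rRNA-matching (outm) and non-matching (outu) sets with
# BBDuk. Set DRY_RUN=1 to print the command instead of running it.
# rrna_check and the output file names are hypothetical.
rrna_check() {
  reads=$1   # input FASTQ, e.g. sample1.fastq.gz
  ref=$2     # path to ribokmers.fa.gz
  set -- bbduk.sh in="$reads" ref="$ref" k=31 \
         outm=ribo.fq.gz outu=nonribo.fq.gz stats=rrna_stats.txt
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Assumed usage:
#   rrna_check sample1.fastq.gz ribokmers.fa.gz
```

BBDuk prints the number and percentage of matched reads when it finishes; a large matched fraction would point to rRNA as a likely source of the multi-mapped reads.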
Another question, please. For sRNA mapping, I have seen some people select reads of 18-30 nt and others 18-26 nt before mapping. Which size range should I select?
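For reference, a size selection of the kind described can be sketched with awk as below. This does not say which range is right; the bounds are just arguments. It assumes plain 4-line FASTQ records and that adapters have already been trimmed, since raw 1x50 reads are all 50 nt long. The function name filter_by_length and the file names are placeholders.

```shell
# Keep FASTQ records whose sequence length is between min and max (inclusive).
# Assumes 4-line FASTQ records on stdin; writes the kept records to stdout.
# filter_by_length is a hypothetical helper name.
filter_by_length() {
  awk -v min="$1" -v max="$2" '
    NR % 4 == 1 { header = $0 }       # @read name
    NR % 4 == 2 { seq = $0 }          # sequence
    NR % 4 == 3 { plus = $0 }         # separator line
    NR % 4 == 0 {                     # quality line completes the record
      n = length(seq)
      if (n >= min && n <= max)
        print header "\n" seq "\n" plus "\n" $0
    }'
}

# Assumed usage (file names are placeholders):
#   gunzip -c trimmed_sRNA.fastq.gz | filter_by_length 18 30 | gzip > sRNA_18_30.fastq.gz
```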
You mentioned that each file contains reads from two organisms. What are these organisms? If their genomes are similar, you will get a high number of multi-mappers.
The two organisms are not similar: it is a human sample infected by a pathogenic bacterium, so the sequenced mRNA contains both bacterial and human reads (of course, there are more human reads than bacterial reads). Is it normal to have this many multi-mapped reads? Or, for an expression study, should I request paired-end sequencing with a read length of more than 50 bp?