Question

Am I removing rRNA contamination the right way?

0

8.6 years ago

zizigolu ★ 4.3k

Hello all,

My adviser asked me to do the following:

I downloaded the FASTA file containing the rRNA genes of my organism of interest, built an index with the rRNA genes as the reference, and mapped the trimmed FASTQ reads to that index. Then I built a second index from the coding-gene sequences and mapped the reads left unmapped by the previous step to this new index.

Do you think I am wasting time, and is there another way to get rid of rRNA contamination?

Thank you

rRNA ribo-seq • 3.8k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

0

8.6 years ago

GouthamAtla 12k

If you want to know the % of rRNA contamination, one approach would be to include everything in the reference and later count and remove the reads mapping to rRNA from the SAM/BAM file.

If you just want to get rid of rRNA reads, just don't include the rRNA in the reference genome. They will remain as unmapped.
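The counting step from the first approach could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes an uncompressed SAM file and that you already know the names of the rRNA sequences in your reference. The function name `rrna_percent` and the reference names in the usage note are hypothetical.

```python
def rrna_percent(sam_lines, rrna_refs):
    """Return (rrna_reads, total_reads, percent) counted over primary records.

    sam_lines: iterable of SAM text lines.
    rrna_refs: set of reference names that are rRNA (assumed to be known).
    """
    total = 0
    rrna = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:               # skip malformed or empty lines
            continue
        flag = int(fields[1])
        if flag & 0x900:                  # skip secondary/supplementary records
            continue
        total += 1                        # one primary record per read (or mate)
        if not (flag & 0x4) and fields[2] in rrna_refs:
            rrna += 1                     # mapped, and the reference is an rRNA
    percent = 100.0 * rrna / total if total else 0.0
    return rrna, total, percent
```

For example, `with open("my.sam") as fh: print(rrna_percent(fh, {"rRNA_18S", "rRNA_28S"}))`, where the file name and the two reference names are placeholders for your own.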

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.6 years ago by GouthamAtla 12k

0

Thank you.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

0

Can you please elaborate on how to calculate contamination? I have a file with the intersection of my CAGE TSS data and rRNA regions, but I am not sure how to measure rRNA contamination from it. I would like to do so in R.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.6 years ago by espop23 ▴ 60

0

Show the first few lines of the file.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.6 years ago by GouthamAtla 12k

0

      V1        V2        V3                         V4 V5 V6
 1  chr1 108113121 108113122 chr1:108113121-108113122,-  3  -
 2  chr1 108113470 108113471 chr1:108113470-108113471,-  1  -
 3  chr1 237766677 237766678 chr1:237766677-237766678,+  1  +
 4  chr1  91853110  91853111   chr1:91853110-91853111,-  1  -
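One possible way to turn a file like this into a contamination figure is sketched below in Python (the same arithmetic carries over to R). It rests on two assumptions that the thread does not confirm: that column V5 is the CAGE tag count at each position, and that you separately know the total tag count of the library. `rrna_tag_percent` and the `total_tags` value are hypothetical.

```python
def rrna_tag_percent(intersect_rows, total_tags):
    """Percent of CAGE tags that fall in rRNA regions.

    intersect_rows: rows of the intersect file; column 5 (index 4) is
                    ASSUMED to hold the tag count at that position.
    total_tags:     total tag count of the library, obtained separately
                    (ASSUMPTION: it is not part of the intersect file).
    """
    rrna_tags = sum(int(row[4]) for row in intersect_rows)
    return 100.0 * rrna_tags / total_tags


# The four rows shown above.
rows = [
    ("chr1", 108113121, 108113122, "chr1:108113121-108113122,-", 3, "-"),
    ("chr1", 108113470, 108113471, "chr1:108113470-108113471,-", 1, "-"),
    ("chr1", 237766677, 237766678, "chr1:237766677-237766678,+", 1, "+"),
    ("chr1", 91853110, 91853111, "chr1:91853110-91853111,-", 1, "-"),
]
total_tags = 1000  # hypothetical library size, for illustration only
percent = rrna_tag_percent(rows, total_tags)
```

If V5 is a score rather than a count, counting rows instead of summing the column would be the analogous calculation.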

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.6 years ago by espop23 ▴ 60

0

$BT2/bowtie2 -N 0 -L 15 -x rRNA --un SRR1211041_trimmed_unmapped.fastq -U SRR1211041_trimmed.fastq -S mapped_and_unmapped.sam

Using the command above, I first mapped the reads to the rRNA index. That gives me SRR1211041_trimmed_unmapped.fastq, which I then aligned against the index built from the coding-gene sequences using this command:

$BT2/bowtie2 -x rRNA -U SRR1211041_trimmed_unmapped.fastq -S my.sam
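The two passes described in this post can be laid out side by side as below. This is only a sketch: it uses the bowtie2 options already shown in the thread (`-N`, `-L`, `-x`, `--un`, `-U`, `-S`), while the second index name `coding_genes` and the output SAM names are hypothetical, since the thread does not name them. The function only builds the command lines; it does not run them.

```python
def two_pass_commands(sample, rrna_index="rRNA",
                      target_index="coding_genes", bowtie2="bowtie2"):
    """Build the argument lists for rRNA filtering followed by the real mapping.

    target_index is a HYPOTHETICAL name for the coding-gene index.
    """
    trimmed = f"{sample}_trimmed.fastq"
    unmapped = f"{sample}_trimmed_unmapped.fastq"
    # Pass 1: map to rRNA; --un writes the reads that did NOT align.
    step1 = [bowtie2, "-N", "0", "-L", "15", "-x", rrna_index,
             "--un", unmapped, "-U", trimmed,
             "-S", f"{sample}_rRNA.sam"]          # hypothetical output name
    # Pass 2: map only the rRNA-free reads to the coding-gene index.
    step2 = [bowtie2, "-x", target_index, "-U", unmapped,
             "-S", f"{sample}_coding.sam"]        # hypothetical output name
    return step1, step2


step1, step2 = two_pass_commands("SRR1211041")
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The key point is that the file given to `--un` in the first pass is the input (`-U`) of the second pass, and that the second pass uses a different index from the first.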

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

0

Hello Goutham, I am new to the bioinformatics field. I want to know the % of reads matching rRNA genes. Can you please give me the steps involved and explain how to do it?

Also, can I get the % of reads from any specific gene as well?

ADD REPLY • link 8.6 years ago by frank1987lee1987 ▴ 10

0

For the % of reads matching rRNA genes, you first index the whole rRNA.fasta (you can get this FASTA file from Ensembl), then map your reads against it. From the result you can find the percentage of reads that mapped to the rRNA genes.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

0

Dear Fereshteh, below is the bowtie2 output I received after running the following command.

What does "aligned concordantly 0, 1, >1 times" mean?

0.12% overall alignment rate: does this 0.12% refer to the percentage of reads mapping to rRNA genes?

Frank$ bowtie2 -N 0 -L 15 -x rRNA_genes -1 Project/Sample/DH558-1_GTGGCC_L005_R1.all.fastq.gz -2 Project/Sample/DH558-1_GTGGCC_L005_R2.all.fastq.gz -S Project/DH558-1.sam

66113117 reads; of these:
  66113117 (100.00%) were paired; of these:
    66061268 (99.92%) aligned concordantly 0 times
    58 (0.00%) aligned concordantly exactly 1 time
    51791 (0.08%) aligned concordantly >1 times
    ----
    66061268 pairs aligned concordantly 0 times; of these:
      13 (0.00%) aligned discordantly 1 time
    ----
    66061255 pairs aligned 0 times concordantly or discordantly; of these:
      132122510 mates make up the pairs; of these:
        132068318 (99.96%) aligned 0 times
        13629 (0.01%) aligned exactly 1 time
        40563 (0.03%) aligned >1 times
0.12% overall alignment rate

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by frank1987lee1987 ▴ 10

0

You know, Frank, I am also new to NGS, but I think you are right: in total, 0.12% of the reads mapped to the rRNA genes. I think "concordantly" probably means that both mates of a pair aligned consistently with each other, and "0, 1, >1 times" says whether a read aligned nowhere, at exactly one place, or at several places. The last case is multimapping, which is more common in eukaryotes because of repeats in the genome, introns, and so on. In any case, short reads tend to map to more than one place in the genome, especially in eukaryotes.
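As a check on that reading, the 0.12% can be reproduced from the counts in the bowtie2 summary above. The sketch below assumes the rate is computed per mate: every pair that aligned concordantly or discordantly contributes two mates, and the remaining mates that aligned on their own are added on top. The function name `overall_alignment_rate` is made up for illustration.

```python
def overall_alignment_rate(pairs, conc_once, conc_multi, disc_once,
                           mates_once, mates_multi):
    """Percent of mates that aligned at least once (paired-end summary)."""
    # Pairs aligned as a pair count twice (two mates each); mates from the
    # remaining pairs that aligned individually are counted once each.
    aligned_mates = (2 * (conc_once + conc_multi + disc_once)
                     + mates_once + mates_multi)
    return 100.0 * aligned_mates / (2 * pairs)


# Numbers taken from the summary posted above.
rate = overall_alignment_rate(
    pairs=66113117,
    conc_once=58, conc_multi=51791,
    disc_once=13,
    mates_once=13629, mates_multi=40563,
)
print(f"{rate:.2f}%")  # prints 0.12%
```

Since the index contained only rRNA genes, this figure is the share of reads (mates) that matched rRNA under these alignment settings.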

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

1

Thanks Fereshteh for your suggestions. I have created a new post based on this to confirm our understanding.

Hopefully somebody confirms it :)

ADD REPLY • link 8.6 years ago by frank1987lee1987 ▴ 10

0

8.6 years ago

frank1987lee1987 ▴ 10

Thanks Fereshteh for your help. Currently, I am running the bowtie2 alignment of my FASTQ reads against the rRNA gene index. Once it is completed, I will go through the output and get back to you.

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by frank1987lee1987 ▴ 10

0

great job Frank

ADD REPLY • link 8.6 years ago by zizigolu ★ 4.3k

score 5 · Accepted Answer · 2015-08-29

5

8.6 years ago

seta ★ 1.9k

Try the SortMeRNA tool; it is definitely easier than your approach.

ADD COMMENT • link 8.6 years ago by seta ★ 1.9k