Biostar Beta. Not for public use.
Mirna-Seq Trimming And Mapping?
0
Entering edit mode
4.4 years ago
Spain

Dear community,

I have paired-end miRNA datasets which I want to analyze and FASTQC results gave good stats to proceed, but because It's the first time I start analyzing this kind of samples, I noticed that it is extremely important in miRNA to remove 3' adapter contamination due to the fact of read small size. So because I mainly focused on other types of analyses which did not include adapter trimming I wanted to ask if the method I'm applying is right enough to make sure I'm removing the adapter.

5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG '3

I was planning to use miRdeep2 algorithm which currently has a trimming function; so I took the last 22nt from the adapter and applied the following command:

mapper reads.fastq -e -h -i -j -k CTCGTATGCCGTCTTCTGCTTG -l 18 -m -p genome -s reads.fa -t reads_vs_genome.arf -v -u -n

After this step I checked the mapping results and I was quite surprised because the mapping seems that fail:

mapping reads to genome index
#reads with at least one reported alignment: 537965 (4.35%)
#reads that failed to align: 11809642 (95.45%)
#reads with alignments suppressed due to -m: 25250 (0.20%)
Reported 660431 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends

Mapping statistics

#desc    total    mapped    unmapped    %mapped    %unmapped
total: 199690475    109878864    89811611    0.550    0.450
seq: 199690475    109878864    89811611    0.550    0.450


As you can see something went wrong with the mapping ( ¿¿95.5% fail to align?? ) and I don't know what it is...

I was thinking mainly in 2 possible issues:

1.- Adaptor trimming failed
2.- As miRdeep2 does not map paired-end data at once ( authors suggest to treat as single end ), maybe this is influencing in mapping results


I ask if someone experimented same issues and maybe could help clarify this

Thanks!

3
Entering edit mode

I have just a short question about your data: How do paired-end reads for microRNAseq look like? I mean... the mature microRNA sequence is about 24nt long. Do the pairs completely overlap each other? Is this protocol strand specific in the end?

0
Entering edit mode

No, the pairs do not overlap completely each other and is not strand specific protocol.

2
Entering edit mode

why did you take the last 22? the tail of the adapter is the part you'd least likely see intact. try the whole thing

0
Entering edit mode

Thanks for the suggestion! I tried and worked fine :)

0
Entering edit mode

I am hearing paired end seq for miRNA for the first time. If it is a publicly available data that you are analysing, then can you share the link ?

0
Entering edit mode

I'm sorry this is data not published yet so you will have to wait till the publication comes out

1
Entering edit mode
15 months ago
Fabio Marroni ♦ 2.2k
Italy

I dont' know miRdeep2 but maybe it reports somewhere statistics of adapter removal. Alternatively, you could try some other tool, such as cutadapt (https://code.google.com/p/cutadapt/) for adapter removal. So you can check if the problem lies in adapter trimming.

0
Entering edit mode

Thanks for the tool!

0
Entering edit mode
2.2 years ago
manekineko • 130
Bulgaria

Hi,

I have also miRNAs library in pair-end which was strange as I always analyzed single end?

I figured out that the R1 set contain "TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC" adapter + barcode + smallRNA adapter and at the beginning is the miRNA seq.

I have a bit confusion for the R2, what type of miRNA is sequnced there - is it the same as in R1 or different miR? is it on reverse-compliment state? And moreover After I clean R1 and R2, what should I do? Should I just combine all clean reads in one file or?

PS

I found that my R2 set is containing an just one adapter - GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT, and before it is the miRNA probably?

I have found only one post in the net seems similar but it does not explain how than you analyse the miRNAs as they are in R1 and R2, how they are related or they are just separate independant miRNAs - http://www.homolog.us/blogs/blog/2012/07/19/analysis-of-ngs-mirna-data-a-paired-ended-library/

1
Entering edit mode

First of all, do not ask your question as an answer at someone else's question (and a very old question at that).

Second, see this and this for similar questions, and Brian Bushnell comment here as to the better way to clean adapters for such cases.

0
Entering edit mode

Thanks! Sorry, I'm a new user of Biostars and I thought it is better to ask here rather open a new tread with similar topic.

To the topic - do you think merging PE will be better way for the miRNA analysis, or just use only PE1 (I have the miRNA in PE1 and in PE2 is a reverse-compliment) ? What actually merging is doing?

(PS Is merging step is after the adapter cliping or it is done on raw reads?)

EDIT, I tried PEAR and it output a strange result:

Assembled reads ...................: 33,662 / 4,000,000 (0.842%)
Not assembled reads ...............: 3,966,338 / 4,000,000 (99.158%)

0
Entering edit mode

@manekineko

In ancient DNA, we routinely faced the problem of having DNA fragments that are smaller than the read length and have adapters at the ends and overlapping portions of the forward and reverse. We tried to determine whether only trimming the adapters (using cutadapt) would perform just as well as trimming and merging (using our algorithm leehom which does both in a single step). We saw a reduction of noise when we trimmed and merge due to the cross-correction provided by observing the same base twice. Read our study here:

http://nar.oxfordjournals.org/content/42/18/e141

We haven't tried it on miRNA but I don't see why our results wouldn't apply to this problem as well.

0
Entering edit mode

I have the case which you describe in your paper Figure1 a) - the sequence if miRNA is 20-24 which is quite too small, and it is in the beginning of the R1 and in reverse-complement in R2 paired end file. But I think I have some kind of linker to increase the size of the sequence of interest.

I have run the program and it outputs this, can you help my to see what it is and is it normal:

Total 4000000; Merged (trimming) 3779698; Merged (overlap) 0; Kept PE/SR 147618; Trimmed SR 0; Adapter dimers/chimeras 72684; Failed Key 0

0 Jul 15 12:41 001.merged.fastq.fail.fq.gz
44616689 Jul 15 12:58 001.merged.fastq.fq.gz
4505814 Jul 15 12:58 001.merged.fastq_r1.fail.fq.gz
8765518 Jul 15 12:58 001.merged.fastq_r1.fq.gz
4726149 Jul 15 12:58 001.merged.fastq_r2.fail.fq.gz
8723321 Jul 15 12:58 001.merged.fastq_r2.fq.gz

0
Entering edit mode

Almost. Did you run with --ancientdna ? Because then it would merge the overlapping parts even in the absence of the adapter (Figure1 case B).

0
Entering edit mode

No I did not used --ancientdna. I have small RNAs so they are 19-24nt so probably overlaping?

I used it - leeHom -fq1 1.fastq -fq2 2.fastq -f TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -s GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT

Does it mean I have now 3779698 single reads megred from perfectly overlapping 3779698 pairs in 001.merged.fastq.fq.gz?

0
Entering edit mode

That means that the overwhelming majority 3.7M out of 4M had adapters and they were merged. Still, I would try to use --ancientdna to see the tail of the distribution.

0
Entering edit mode

With --ancientdna I got same, how to know which result to take from both?:

Total 4000000; Merged (trimming) 3779698; Merged (overlap) 0; Kept PE/SR 147618; Trimmed SR 0; Adapter dimers/chimeras 72684; Failed Key 0

1
Entering edit mode

ok I was expecting to see something different. Then use the first one from

001.merged.fastq.fq.gz