Question

miRDeep2.pl error when reads.fa input file contains info for more than one sample

7.0 years ago

fana ▴ 40

Hi,

I am having trouble with the miRDeep2 package. mapper.pl appears to run correctly with a config.txt file that lists multiple samples. However, when I then run miRDeep2.pl I get the error below. quantifier.pl, on the other hand, runs smoothly. Any ideas?

Error: problem with processed_reads.fa
Use of uninitialized value in split at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/sanity_check_reads_ready_file.pl line 179, <IN> line 11334728.
Use of uninitialized value in length at /usr/biosoft/packages/miRDeep2/mirdeep2_0_0_8/bin/sanity_check_reads_ready_file.pl line 185, <IN> line 11334728.
Error in line 5.667.364: The sequence
AACCCGTAGATCCGAACTTGT
occures at least twice in your reads file.

At first it occured at line 
Please make sure that your reads file only contains unique sequences.

next-gen mirna-seq mirdeep2 • 3.4k views

updated 5.4 years ago by h.mon 35k • written 7.0 years ago by fana ▴ 40

score 2 · Answer 1 · 2018-12-03

I hadn't seen this thread before, so I'm posting a late answer: I got exactly the same error when using an incorrectly formatted "config.txt" file with mapper.pl. I suspect miRDeep2 expects the three-letter codes to be unique for each sample rather than shared within a treatment group. Once I changed the codes to unique ones (TR1, TR2, CT1, CT2 instead of TRT, TRT, CTL, CTL), quantifier.pl later worked fine.
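
A quick way to test for this before running mapper.pl is to look for repeated codes in the config file. The sketch below assumes each non-empty line of config.txt holds a reads file name followed by its three-letter code, separated by whitespace. The function name find_duplicate_codes is made up for illustration and is not part of miRDeep2.

```shell
# Print every sample code (second whitespace-separated column) that
# appears more than once in a mapper.pl config file.
# Assumed line format: <reads file> <three-letter code>
find_duplicate_codes() {
    awk '
        NF >= 2 { count[$2]++ }          # tally each code
        END {
            for (code in count)
                if (count[code] > 1) print code
        }
    ' "$1" | sort
}
```

Calling `find_duplicate_codes config.txt` should print nothing when every code is unique. With the TRT, TRT, CTL, CTL layout described above it would list both CTL and TRT.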

score 1 · Answer 2 · 2017-04-03

7.0 years ago

galina_ananina ▴ 20

If I remember correctly, we concatenated all samples into one file and ran mapper.pl. Then we ran miRDeep2.pl with reads.fa and the other required files, and it worked.

h.mon · Answer 3 · 2017-04-03

7.0 years ago

Chris Fields ★ 2.2k

I've run this with a config.txt file before without problems, but I collapsed the reads (the -m option of mapper.pl). The sanity check that failed seems to indicate that a duplicate read is present, which would mean the reads haven't been collapsed.

One recent example run that worked fine (your mileage may vary):

mapper.pl $CONFIG -o $PBS_NUM_PPN \
    -d -e -q -j -l 17 \
    -m -h -u -n \
    -p $INDEX \
    -s reads_collapsed.fa \
    -t reads_collapsed_vs_genome.arf \
    -v 2> mapping.out

miRDeep2.pl reads_collapsed.fa \
    $GENOME \
    reads_collapsed_vs_genome.arf \
    mature-species.fa \
    mature-other.fa \
    precursor.fa \
    -P 2> report.log

EDIT: of course, edit the relevant variables with your config, genome, genome index, etc.

Thank you for the example. Could you please have a quick look at my commands? I did collapse reads (see the top lines of the 'reads.fa' file below too). I expect that when you have multiple samples some sequences might be the same across them.

mapper.pl config.txt -d -e -p galGal4 -s processed_reads.fa -t mapped_reads.arf -h -m -i -j

Inspecting reads:

head processed_reads.fa

>ac1_0_x161356
TTTGGCAATGGTAGAACTCACACT
>ac1_161356_x83226
TTTGGCAATGGTAGAACTCACA

miRDeep2.pl processed_reads.fa galGal4.fa mapped_reads.arf gga4_mirbase21_mature.fa none gga4_mirbase21_hairpin.fa

updated 5.4 years ago by h.mon 35k • written 7.0 years ago by fana ▴ 40

That's essentially correct, yes; reads are collapsed per sample. What this seems to indicate is that you have two reads with the same sequence from the same sample. Should be easy enough to see if you grep for the sequence and check the line before:

-system-specific-4.1$ grep -B1 '^AACCCGTAGATCCGAACTTGT$' reads_collapsed.fa
>1Ax_4765383_x6985
AACCCGTAGATCCGAACTTGT
--
>1Bc_6495012_x7553
AACCCGTAGATCCGAACTTGT
--
>1Bt_5789889_x5807
AACCCGTAGATCCGAACTTGT
--
>1Dx_5140017_x8899
AACCCGTAGATCCGAACTTGT
--
>1Ex_6157495_x5649
AACCCGTAGATCCGAACTTGT
...
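
To run the same check over the whole file rather than one sequence at a time, something like the following sketch could help. It assumes two-line FASTA records (a header line, then the sequence on a single line) and headers of the form `>code_index_xcount`, as in the examples above. The function name find_duplicate_reads is hypothetical.

```shell
# List sequences that occur more than once under the same sample code.
# Output columns (tab-separated): code, sequence, space-separated read IDs.
find_duplicate_reads() {
    awk '
        /^>/ {
            header = substr($0, 2)        # drop the leading ">"
            split(header, parts, "_")
            code = parts[1]               # sample code precedes the first "_"
            next
        }
        NF {
            key = code "\t" $0            # group by sample code and sequence
            count[key]++
            ids[key] = (key in ids ? ids[key] " " : "") header
        }
        END {
            for (k in count)
                if (count[k] > 1) print k "\t" ids[k]
        }
    ' "$1" | sort
}
```

Running `find_duplicate_reads processed_reads.fa` prints nothing if each sample's reads are unique. Any line it does print points to a sequence that is repeated within one sample code, which is the situation described above.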

7.0 years ago by Chris Fields ★ 2.2k