Missing or disappearing output from RepeatMasker
9.9 years ago
mtollis ▴ 30

I have received this message recently while using makeblastdb for rmblast in RepeatMasker, and it is a real head-scratcher for me.

RepeatMasker runs through all cycles without reporting any errors and finishes, but there are no output files. The only trace of the analysis is the rmblastdb.log file in the RepeatMasker/Libraries directory, which reads:

Building a new DB, current time: 05/29/2014 10:54:02
New DB name:   /home/mtollis/RepeatMasker/Libraries/20140131/anolis/specieslib
New DB title:  /home/mtollis/RepeatMasker/Libraries/20140131/anolis/specieslib
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
Adding sequences from FASTA; added 776 sequences in 0.108892 seconds.

Perhaps the makeblastdb "error" is harmless and it is merely coincidental that my analysis fails. I don't see how either true ambiguities or line endings could be the problem, as my database is hardly novel: I am using the RepBase update and the -species option. The option appears to work, as it creates the species-specific library as well as the general library in the RepeatMasker/Libraries directory.

Does anyone know why RepeatMasker would run without throwing any errors and then leave no output files whatsoever?

makeblastdb repeatmasker • 5.5k views

In your fasta files, do your headers look ok? They all should start with ">" and header name and on the next line, true sequences should start. I can imagine these errors for the sequences without proper headers (just a guess though).
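
A minimal Python sketch of that check, in case it helps others. It flags sequence data that appears before any ">" header, headers with no name, and records whose first data line is mostly ambiguity codes. The function name `check_fasta` is made up for illustration, the 40% threshold is taken from the warning text in the log, and the set of characters counted as ambiguous is an assumption that may not match what makeblastdb does internally.

```python
import io

# IUPAC ambiguity codes. Counting exactly these as "ambiguous" is an assumption.
AMBIGUOUS = set("NRYKMSWBDHVnrykmswbdhv")


def check_fasta(handle):
    """Return a list of (line_number, message) problems found in a FASTA stream."""
    problems = []
    seen_header = False
    reported_orphan = False
    expecting_first_data_line = False
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if line[1:].strip() == "":
                problems.append((lineno, "header has no name"))
            seen_header = True
            expecting_first_data_line = True
            continue
        if not seen_header:
            # Report headerless data once, then keep scanning for a header.
            if not reported_orphan:
                problems.append((lineno, "sequence data before any '>' header"))
                reported_orphan = True
            continue
        if expecting_first_data_line:
            ambiguous = sum(ch in AMBIGUOUS for ch in line)
            if ambiguous / len(line) > 0.40:
                problems.append((lineno, "first data line is more than 40% ambiguous"))
            expecting_first_data_line = False
    return problems


if __name__ == "__main__":
    sample = io.StringIO(">seq1\nACGTACGT\n>seq2\nNNNNNNNN\nACGT\n")
    for lineno, message in check_fasta(sample):
        print(f"line {lineno}: {message}")
```

To check a real library, open the file and pass the handle, e.g. `check_fasta(open("specieslib"))`.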


It is hard to diagnose the issue without seeing the exact commands. I realize this is an old post now, but if you can provide the command used, and some information about the data, it would likely be helpful for others. And, it's always nice to answer questions and see things resolved.


Here is the command I used:

RepeatMasker -no_is -pa 16 -species "vertebrates" -a -html -gff genome.fasta

And this is an error message I found in the standard output.

Can't call method "getScore" on unblessed reference at /home/mtollis/RepeatMasker/PRSearchResult.pm line 164.

Also, the data is a vertebrate-sized genome with hundreds of thousands of scaffolds. However, I have had RM work on these kinds of datasets with no problems before.

5.0 years ago
gbdias ▴ 150

This is an old post but I observed the same behavior in a more recent version of RepeatMasker (4.0.7). This version already has the fix to the bug you reported above, but the behavior persists. The program apparently runs to completion and throws no error message, but the running directory is empty after the run.

In my case, it turned out to be a file name problem. I ran RepeatMasker on several genome assemblies, and the ones where it did not produce any results were the ones with a plus sign in the file name, for example p+a_contigs.fasta. After I renamed these files to remove the + sign (pa_contigs.fasta), RepeatMasker finished successfully and produced all expected output.
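
For anyone batch-processing assemblies, here is a minimal Python sketch of the rename step. Only the + sign is known to cause the problem described here; stripping every character outside a conservative set (letters, digits, dot, underscore, hyphen) is an assumption made to be on the safe side, and the helper names `safe_name` and `rename_to_safe_name` are made up for illustration.

```python
import re
from pathlib import Path

# Only "+" is known to cause trouble here. Treating every character outside
# this conservative set as unsafe is an assumption made for illustration.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def safe_name(filename):
    """Return the file name with characters outside the conservative set removed."""
    return UNSAFE_CHARS.sub("", filename)


def rename_to_safe_name(path):
    """Rename `path` in place to its cleaned name and return the new path.

    Returns the original path unchanged if the name is already clean, and
    refuses to overwrite an existing file.
    """
    path = Path(path)
    cleaned = safe_name(path.name)
    if cleaned == path.name:
        return path
    if not cleaned:
        raise ValueError(f"nothing left of the file name after cleaning: {path.name!r}")
    target = path.with_name(cleaned)
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    path.rename(target)
    return target


if __name__ == "__main__":
    print(safe_name("p+a_contigs.fasta"))
```

Running `rename_to_safe_name` on each input before starting RepeatMasker avoids finding out about a bad name only after a long run.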


Thanks so much for this answer! I've just had the same problem - I could see the output files being generated, then they would disappear at the end of the RepeatMasker run, with no explanation and an apparently successful run. I changed the file names to remove the + and it has worked. :D

9.2 years ago
mtollis ▴ 30

From the RepeatMasker developer, who suggested the following two fixes:

"The culprit is the processing of the alignment data using the "-a" flag. I tracked it down to a bug in a routine which handles joining DNA transposons. The ugly match set was:

334 C21533332 2812 2859 + HAT1_DR#DNA/hAT-Ac 598 645
299 C21533332 2812 2859 C hAT-N76_DR#DNA/hAT 2324 2371

And the line in ProcessRepeats is ( line 7852 )

# add fused element to our derived from list
if ( $options{'source'} ) {
    $lastAnnot->addDerivedFromAnnot( $member );
}

This should be:

# add fused element to our derived from list
if ( $options{'source'} ) {
    $lastAnnot->addDerivedFromAnnot( $member->{'annot'} );
}
"

"I found something which causes ProcessRepeats to go into an infinite loop. It keeps expanding an array until the computer runs out of memory and the process is killed. It didn't print the
"Can't call method "getScore" on unblessed reference at /home/mtollis/RepeatMasker/PRSearchResult.pm line 164"
message you have seen before, though. I am not sure how you got that a second time. In any case, I fixed this problem and I wondered if you might rerun this file on your system. The fix is in the PRSearchResult.pm module. You can download a patched copy of the module here:

http://www.repeatmasker.org/~rhubley...chResult.pm.gz

Copy this into your RepeatMasker directory, back up your old file and unzip this one:

mv PRSearchResult.pm PRSearchResult.pm.bak
gunzip PRSearchResult.pm.gz

I hope this fixes your problem. Thanks for reporting this!"
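
If you need to repeat the back-up-and-unzip step on several installations, here is a minimal Python sketch of the same two commands. The function name `install_patched_module` is made up for illustration, and unlike gunzip this version leaves the downloaded .gz file in place. It assumes the patched archive has already been downloaded.

```python
import gzip
import shutil
from pathlib import Path


def install_patched_module(rm_dir, patched_gz, module_name="PRSearchResult.pm"):
    """Back up rm_dir/module_name to module_name.bak, then unpack patched_gz in its place.

    Returns the path of the backup file.
    """
    rm_dir = Path(rm_dir)
    module = rm_dir / module_name
    backup = rm_dir / (module_name + ".bak")
    if backup.exists():
        raise FileExistsError(f"refusing to overwrite existing backup: {backup}")
    # Equivalent of: mv PRSearchResult.pm PRSearchResult.pm.bak
    shutil.move(str(module), str(backup))
    # Equivalent of: gunzip PRSearchResult.pm.gz (the .gz file is kept here)
    with gzip.open(patched_gz, "rb") as src, open(module, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return backup
```

For example, `install_patched_module("/home/mtollis/RepeatMasker", "PRSearchResult.pm.gz")` mirrors the commands quoted above. If something goes wrong, moving the .bak file back restores the original module.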


Thanks for the update. It's too bad there is not an easier method for distributing the updates, e.g., github.
