Biostar Beta. Not for public use.
hisat2-align exited with value 1
Entering edit mode
22 months ago
m.t.lorenc • 0

Hi, I have got the following error

samblaster: Version 0.1.24
samblaster: Inputting from stdin
samblaster: Outputting to stdout
Error reading _rstarts[] array: 16256, 20484
Error: Encountered internal HISAT2 exception (#1)
Command: /work/waterhouse_team/miniconda2/envs/hisat2/bin/hisat2-align-s --wrapper basic-0 -q -p 12 --rg-id=JoshP-S8-RI36_S2_L003__001.fastq.gz --rg JoshP-S8-RI36_S2_L003__001.fastq.gz --rg JoshP-S8-RI36_S2_L003__001.fastq.gz --rg JoshP-S8-RI36_S2_L003__001.fastq.gz --rg JoshP-S8-RI36_S2_L003__001.fastq.gz --dta -x /scratch/waterhouse_team/Taek_Test/NbV1ChF_JBrowse/hisat2_index/NbV1ChF -r /home/lorencm/josh/siRNAdata/MGRF-NGS-166_JOSHUA_small_RNA_TruSeq-96507411/FASTQ_Generation_2018-09-20_12_58_45Z-124325413/JoshP_S8_L003-ds.074f375d3d714369bc3f591103033e95/JoshP-S8-RI36_S2_L003_R1_001.fastq.gz 
(ERR): hisat2-align exited with value 1
samblaster: Input file is empty. Exiting.

with the following command:

hisat2 -q -p 12 \
--rg-id=${output} --rg ${output} --rg ${output} --rg ${output} --rg ${output} \
--${4} -x ${INDEX_PATH}/${INDEX_PREFIX} -r $r1 \
| samblaster -r | samtools view -@ 12 -bSh -f 0x2 -F 2316 - | samtools fixmate - - | samtools sort -@ 12 - -o ${3}/${output}.dedup.sorted.bam

What did I miss?

Thank you in advance

alignment RNA-Seq rna-seq • 646 views
Entering edit mode

Please show the entire function that you used (so the part where you declared the variables). What is --${4}? The option parameter is missing. Is it the splice site file? I also think it should be --rg-id ${output} without =. As jflopezfernandez said, use -U or -1/-2 depending on your data being single- or paired-end. If it is single-end, you'll need --ignoreUnmated to make samblaster accept non-paired data. Is this RNA-seq? If so, please use the search function on RNA-seq deduplication and then decide if it is indeed a good thing to do. Beyond that, you could optimize your pipe by printing the output of view as SAM rather than BAM (only option h instead of -bSh as fixmate can read SAM, saving the compression time from SAM to BAM. sort will then take care of the BAM conversion. Be sure to have your samtools at the current version (1.9).

Entering edit mode
22 months ago

So, there's a lot going on here. Let me preface my this by saying that I am not a biologist in any way shape or form; I'm a compiler writer who found this site by accident; keep in mind that I'm commenting only on what the documentation says the program is supposed to do and the code you typed, and I am making no assumptions on what you meant to do, since I have no idea what gene alignments are.

First, you specified both -q and -r. It's either or with those two, so there's that. I'm not sure if you actually meant to make your output variable get called by --rg. That option sets a label on the output, and it's not actually input. What's even weirder is that they're all the same file obviously, so I'm not sure what the point of that is.

The option I think you should have used is -q like you did at first because you have fastq files, but they're compressed. I saw the documentation say something about them, but honestly, there's no need to make your life more complicated than it already is, in my opinion.

ls ./*.gz | xargs -n 1 gzip -d

That will decompress every single .gz file in the directory so that you're only dealing with fasta files proper, and then you can simply pass them in. Which speaking of, it looks like you either pass in two like this:

-1 file1.fastq -2 file2.fastq

Or you go all out with the other option:

-U file1.fastq,file2.fastq,file3.fastq,etc.,file100.fastq

Now like I said earlier, the -r is redundant, since it contradicts -q. Moreover, The documentation clearly shows that either you specify a filename with -S where that is the output file, or you don't and the result gets printed to standard out. This second scenario is obviously the one you want, since otherwise there would be nothing to pipe to the next program. Exit code 1 in Unix means EXIT_FAILURE, and from the log in the beginning you can see that since the program is printing to stdout, so clearly it's not interpreting the last argument ($r1) as an output file. It doesn't really matter though, because whatever it's interpreting it as, hisat2 is interpreting it as an error grave enough to terminate immediately, and that's why you're getting that error from samblaster; since hisat2 just quit without printing anything to stdout, samblaster essentially got called with no input.

In case you're curious why there was output in the console but samblaster didn't ever get to see it, it's because errors print to stderr, not stdout, so unless you explicitly piped stderr to where stdout is going, as far as samblaster is concerned there was no input. It's a completely different stream.

Anyways, I hope that was somewhat helpful. Unfortunately I would need to see your input files and all your code to be able to really diagnose the problem, but I'll be around, let me know if there's anything I can help out with. Good luck, dude.

Entering edit mode

There is no need for fastq decompression. 99.99% of NGS tools can read compressed data be it gzip (most common) or bz2 (rather uncommon). hisat2 does for sure.

Entering edit mode

Wow, that's actually really cool, I didn't know that, thanks for the heads up

Entering edit mode

Thank you all for your response, but I still get the same problem:

> hisat2 -p 12 --rg-id JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --dta -x /scratch/waterhouse_team/Taek_Test/NbV1ChF_JBrowse/hisat2_index/NbV1ChF -U /home/lorencm/josh/siRNAdata/MGRF-NGS-166_JOSHUA_small_RNA_TruSeq-96507411/FASTQ_Generation_2018-09-20_12_58_45Z-124325413/JoshP_S9_L004-ds.a211f17a7df343efb6b7885cbe7f1a52/JoshP-S9-RI37_S5_L004_R1_001.fastq.gz | samtools view -@ 12 -h -F 2316 - | samtools sort -@ 12 - -o bam-hisat2/JoshP-S9-RI37_S5_L004__001.fastq.gz.dedup.sorted.bam
Error reading _rstarts[] array: 16256, 20484
Error: Encountered internal HISAT2 exception (#1)
Command: /work/waterhouse_team/miniconda2/envs/hisat2/bin/hisat2-align-s --wrapper basic-0 -p 12 --rg-id JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --rg JoshP-S9-RI37_S5_L004__001.fastq.gz --dta -x /scratch/waterhouse_team/Taek_Test/NbV1ChF_JBrowse/hisat2_index/NbV1ChF -U /tmp/31927.unp 
(ERR): hisat2-align exited with value 1

This is the original script

#usage: Gmp_NbV1_Final.gff3 NbV1ChF.fasta bam-hisat2 dta /home/lorencm/josh/siRNAdata

mkdir $3

# GFF3 to GTF
conda activate gffread


gffread $1 -T -o $GTF_FN
conda deactivate

conda activate hisat2 $GTF_FN > ${1%.*}.exon $GTF_FN > ${1%.*}.ss

# Index

INDEX_PATH=$(dirname $2)/hisat2_index
INDEX_PREFIX=$(basename ${2%.*})
#hisat2-build -p 1 $2 --ss ${1%.*}.ss --exon ${1%.*}.exon ${INDEX_PATH}/${INDEX_PREFIX}

#for r1 in $(find $5 -name "*R1*.gz");
for r1 in $(find $5 -name "*.gz");
  output=$(basename $(echo $r1 | sed 's/R1//g'))
  #r2=$(echo $r1 | sed 's/R1/R2/g')

  cat <<EOF
  #qsub <<EOF
#!/bin/bash -l

#PBS -N $output
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -l mem=70G
#PBS -l ncpus=12
##PBS -m bea


conda activate hisat2

# Alignment
hisat2 -p 12 \
--rg-id ${output} --rg ${output} --rg ${output} --rg ${output} --rg ${output} \
--${4} -x ${INDEX_PATH}/${INDEX_PREFIX} -U $r1 | samtools view -@ 12 -h -F 2316 - | samtools sort -@ 12 - -o ${3}/${output}.sorted.bam

samtools index ${3}/${output}.sorted.bam
samtools stats ${3}/${output}.sorted.bam > ${3}/${output}.sorted.bam.stats
conda deactivate



What did I miss?

Thank you in advance

Entering edit mode

Here is the source file for the program; this is hisat2.cpp. Notice that this is the error message you are getting. To find your error, we just have to go back up a little. enter image description here

You can see here that the function is dealing with this IO error (getting too many arguments from the user) by throwing an exception. Specifically, the function is throwing the integer 1, which is where both your exception and the number come from.

As a side note, there are so many things C++ lets you do to let the user know exactly what they did wrong and in what context (including literally adding a message to the actual exception) that this method of simply throwing an integer and calling it a day is almost criminal. Not only that, the entire point of throwing an exception is for the application to deal with the problem at run time, which is obviously not the case here. So props for finding this specific segment of code, this piece of code is hilariously odd.

Having said that, you're still passing in the parameters incorrectly. This is one of the error messages from lower down in the code that the program apparently didn't deem you worthy enough to merit: "Error: Must specify at least one read input with -U/-1/-2."

This is how the author, Professor Daehwan Kim, passes the parameters in the tutorial:

$HISAT2_HOME/hisat2 -f -x $HISAT2_HOME/example/index/22_20-21M_snp -U $HISAT2_HOME/example/reads/reads_1.fa -S eg1.sam

and here's another one where he has more input files:

$HISAT2_HOME/hisat2 -f -x $HISAT2_HOME/example/index/22_20-21M_snp -1 $HISAT2_HOME/example/reads/reads_1.fa -2 $HISAT2_HOME/example/reads/reads_2.fa -S eg2.sam

Definitely check out the documentation; it's your best friend, and tutorials by the developer (when we're not being too lazy to write them) can be downright life-saving. Best of luck.


Login before adding your answer.

Similar Posts
Loading Similar Posts
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.3