Question

HISAT2 Indexing using annotation for Rattus_norvegicus

1

5.5 years ago

neranjan ▴ 60

Hi,

I am trying to create a HISAT2 index with annotation for the Rattus_norvegicus (rat) genome, which I downloaded from Ensembl release 94.

I am currently using 220GB of memory with 16 cores, which I assumed would be adequate. However, I cannot create the HISAT2 index; the build gives the following output:

Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16
  Time to read SNPs and splice sites: 00:00:04
        is not reverse-deterministic, so reverse-determinize...
  Ran out of memory; automatically trying more memory-economical parameters.
        is not reverse-deterministic, so reverse-determinize...

and eventually fails with

Could not find approrpiate bmax/dcv settings for building this index.
Switching to a packed string representation.
Total time for call to driver() for forward index: 08:45:56

The HISAT2 website does have a rat index, but it does not include the annotation.

I am using the command

hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

to create the index.

Any ideas are greatly appreciated.

Thanks

HISAT2 annotation Ensembl index alignment • 4.5k views

ADD COMMENT • link updated 4.2 years ago by robin.mesnage • 0 • written 5.5 years ago by neranjan ▴ 60

0

Do you run on a cluster, and if so, what is the exact command, including the header lines for the scheduler? Did you request the entire memory of the node you are on?

ADD REPLY • link 5.5 years ago by ATpoint 82k

0

No, the node has 256GB of memory and I only asked for 220GB of RAM. I never ask for the full amount since the node needs some memory to work with. On previous occasions I have only asked for 200GB.

In previous attempts at the same index I asked for 300GB of RAM on a node with 512GB of memory, which didn't work either.

ADD REPLY • link 5.5 years ago by neranjan ▴ 60

0

If you share the links to the necessary files, I can try to build it on a 3TB node if that helps you.

ADD REPLY • link 5.5 years ago by ATpoint 82k

0

Yes, that might help me. Thank you very much for the help.

I am using the files hosted by Ensembl and hisat2 version 2.1.0 to build the index. Below is the SLURM script I use to build it. The memory, partition and qos settings may need to change depending on the cluster and the scheduler being used.

#!/bin/bash
#SBATCH --job-name=hisat
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 16
#SBATCH --mem=220G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

#genome 
wget ftp://ftp.ensembl.org/pub/current_fasta/rattus_norvegicus/dna/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz
#GTF 
wget ftp://ftp.ensembl.org/pub/current_gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.94.gtf.gz


for file in *.gz; do
       gunzip -d "$file"
done
echo "=========== Unzip Done ================="

BASE_NAME="Rattus_norvegicus"
FASTA_File="Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa"
GTF="Rattus_norvegicus.Rnor_6.0.94.gtf"
SPLICE="splice_site"
EXON="exon"

module load hisat2/2.1.0
#create splice sites
hisat2_extract_splice_sites.py ${GTF} > ${SPLICE}

#create exon file
hisat2_extract_exons.py ${GTF} > ${EXON}

#build index
hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

#build large index if the above does not work
#hisat2-build -p 16 --large-index --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

If the normal hisat2 build does not work, you can also try to build the large index using the commented-out command.
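
A further fallback, not tried in this thread, is to build the index without --ss/--exon (the HISAT2 manual attributes the high memory use of annotated indexes to graph construction) and to pass the splice sites at alignment time through hisat2's --known-splicesite-infile option. A minimal sketch follows; the function name, the read file arguments, and the thread count are placeholders, and the alignments may differ from those obtained with an annotated index:

```shell
# Hypothetical helper: build a plain (annotation-free) index, then align
# while supplying the splice sites from hisat2_extract_splice_sites.py.
# Arguments: genome FASTA, splice-site file, index base name, read 1, read 2.
align_with_known_splice_sites() {
    fasta="$1"; splice="$2"; base="$3"; r1="$4"; r2="$5"

    # Plain index: no --ss/--exon, so no graph construction step.
    hisat2-build -p 16 "${fasta}" "${base}" &&
    # Splice sites are given to the aligner instead of being baked into the index.
    hisat2 -p 16 -x "${base}" --known-splicesite-infile "${splice}" \
        -1 "${r1}" -2 "${r2}" -S "${base}.sam"
}
```

With the variables from the script above, a call would look like: align_with_known_splice_sites "${FASTA_File}" "${SPLICE}" "${BASE_NAME}" sample_1.fq.gz sample_2.fq.gz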

Thanks again for the help.

ADD REPLY • link 5.5 years ago by neranjan ▴ 60

1

I just started it and will come back once finished.

ADD REPLY • link 5.5 years ago by ATpoint 82k

0

Thanks, I appreciate it. If it completes successfully, I would like to know how much memory it used.

ADD REPLY • link 5.5 years ago by neranjan ▴ 60

1

It finished without issues on a 1.5TB node and used about 500GB at peak. I am compressing it and uploading it to cloud storage now, and will share the download link once finished:

There it is: https://uni-muenster.sciebo.de/s/ztztgCWvQujnhjq
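
For anyone sizing a similar job, one way to obtain a peak-memory figure like the one above is to run the build under GNU time and read the "Maximum resident set size" line from its verbose report. The sketch below assumes GNU time is installed as /usr/bin/time; the helper name peak_rss_gb and the log file name are made up for illustration:

```shell
# Hypothetical helper: extract the peak resident set size from a
# GNU `time -v` report and print it in GB (the report is in kbytes).
peak_rss_gb() {
    awk -F': ' '/Maximum resident set size/ { printf "%.1f\n", $2 / 1024 / 1024 }' "$1"
}

# Usage (not executed here):
#   /usr/bin/time -v -o build.time \
#       hisat2-build -p 16 --exon "${EXON}" --ss "${SPLICE}" "${FASTA_File}" "${BASE_NAME}"
#   peak_rss_gb build.time
```

On SLURM, the MaxRSS field of sacct for the finished job gives a comparable number.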

ADD REPLY • link 5.5 years ago by ATpoint 82k

0

Thank you very much, I really appreciate you going out of your way to help me. I was able to download the index from the link.

1.6G Rattus_norvegicus.1.ht2
654M Rattus_norvegicus.2.ht2
1.3M Rattus_norvegicus.3.ht2
651M Rattus_norvegicus.4.ht2
1.4G Rattus_norvegicus.5.ht2
663M Rattus_norvegicus.6.ht2
7.8M Rattus_norvegicus.7.ht2
1.6M Rattus_norvegicus.8.ht2

Again thank you very much.

ADD REPLY • link 5.5 years ago by neranjan ▴ 60

1

You're very welcome :)

ADD REPLY • link 5.5 years ago by ATpoint 82k

0

Found a solution: generate the index with more memory.

Cheers!

ADD REPLY • link 5.5 years ago by neranjan ▴ 60

0

Hi,

I am having the same issue with HISAT2 Indexing using annotation for Rattus norvegicus. I currently don't have access to a cluster with sufficient memory and I am stuck with my transcriptome analyses. I have seen that @ATpoint made these indexes available but the link is dead.

Would @ATpoint or any of you be able to share these indexes again?

Thank you in advance,

ADD REPLY • link 4.2 years ago by robin.mesnage • 0

0

I do not have them anymore. Why don't you use a tool such as salmon to quantify directly against the transcriptome? It barely requires any memory.
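
A rough sketch of what that could look like, assuming paired-end reads and the Ensembl release 94 cDNA FASTA. The download URL, the index name rat_salmon_index, the function name, and the thread count are assumptions and should be checked against your own setup:

```shell
# Hypothetical helper: index the rat transcriptome with salmon and
# quantify one paired-end sample against it.
# Arguments: read 1, read 2, output directory.
salmon_rat_sketch() {
    r1="$1"; r2="$2"; outdir="$3"

    # Assumed location of the release 94 cDNA file on the Ensembl FTP site.
    cdna="Rattus_norvegicus.Rnor_6.0.cdna.all.fa.gz"
    wget "ftp://ftp.ensembl.org/pub/release-94/fasta/rattus_norvegicus/cdna/${cdna}" &&
    # Build the transcriptome index (done once, reused for every sample).
    salmon index -t "${cdna}" -i rat_salmon_index &&
    # -l A asks salmon to infer the library type automatically.
    salmon quant -i rat_salmon_index -l A -1 "${r1}" -2 "${r2}" -p 16 -o "${outdir}"
}
```

A call would then be, for example: salmon_rat_sketch sample_1.fq.gz sample_2.fq.gz quant_sample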

ADD REPLY • link 4.2 years ago by ATpoint 82k

score 1 · Accepted Answer · 2018-10-23

1

5.5 years ago

neranjan ▴ 60

I think the answer is to provide more memory for the run. Thank you ATpoint.

ADD COMMENT • link 5.5 years ago by neranjan ▴ 60

1

There is no need to close this question. Just accepting this as an answer is sufficient.

ADD REPLY • link 5.5 years ago by ATpoint 82k