Hisat2 index building taking days even with 40 GB RAM
6.0 years ago
piyushjo ▴ 700

Hi,

I am new here. I am trying to build a HISAT2 index from the GENCODE lncRNA annotation. Here is what I did:

Extract splice-site information:

hisat2_extract_splice_sites.py gencode.v28.long_noncoding_RNAs.gtf >gnc.ss

Extract exon information:

hisat2_extract_exons.py gencode.v28.long_noncoding_RNAs.gtf >gnc.exon

Run the build:

hisat2-build -p11 --ss gnc.ss --exon gnc.exon GRCh38.p12.genome.fa genlnc
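The three steps above can be wrapped in one shell function so that a failed or empty extraction stops the run before a multi-day build starts. This is a minimal sketch assuming bash and the HISAT2 scripts on PATH; `build_lnc_index` and its argument order are hypothetical names for illustration, not part of HISAT2. It passes only the FASTA file and the index prefix as positional arguments, matching the `hisat2-build [options] <reference_in> <ht2_base>` usage.

```shell
#!/usr/bin/env bash
# Sketch: the three commands from the post wrapped in one function.
# Assumes hisat2_extract_splice_sites.py, hisat2_extract_exons.py and
# hisat2-build are on PATH. The function name and argument order are
# hypothetical, not part of HISAT2.
set -eu

build_lnc_index() {
  local gtf=$1 fasta=$2 prefix=$3 threads=${4:-11}
  local start

  # Extract splice sites and exons from the annotation GTF.
  hisat2_extract_splice_sites.py "$gtf" > gnc.ss
  hisat2_extract_exons.py "$gtf" > gnc.exon

  # Do not start a multi-day build from empty annotation files.
  if [ ! -s gnc.ss ] || [ ! -s gnc.exon ]; then
    echo "gnc.ss or gnc.exon is empty; check the GTF" >&2
    return 1
  fi

  # Build the graph index and report the wall-clock time on stderr.
  start=$(date +%s)
  hisat2-build -p "$threads" --ss gnc.ss --exon gnc.exon "$fasta" "$prefix"
  echo "hisat2-build finished in $(( $(date +%s) - start )) s" >&2
}
```

Usage with the files from the post: `build_lnc_index gencode.v28.long_noncoding_RNAs.gtf GRCh38.p12.genome.fa genlnc 11`.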

The computer I am using is a Windows workstation with 12 cores (I am using 11, but CPU usage hardly ever goes above 10%). It shows 45 GB of installed RAM, and hisat2-build is using almost 40 GB of it. I started the process on Monday, and even though the computer has been running continuously, the index still hasn't been built. The HISAT2 paper suggests that building an index for the whole genome with 160 GB of RAM should take 2-3 hours. Even if having only a quarter of the recommended RAM made the build four times slower, that would be 8-12 hours, so I am confused why it hasn't finished in 5 days.
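The RAM comparison above can be checked before launching a build. A minimal sketch, assuming a Linux-style `/proc/meminfo` (Cygwin emulates one as well) and taking the 160 GB figure quoted above as the requirement; `mem_total_gb` and `check_index_ram` are hypothetical helper names, and the threshold is not read from HISAT2 itself.

```shell
#!/usr/bin/env bash
# Sketch: compare total RAM against what a graph-index build is expected
# to need. The 160 GB default is the figure quoted in the post, not a value
# read from HISAT2.
set -eu

mem_total_gb() {
  # MemTotal in /proc/meminfo is in kB; print whole GiB (1 GiB = 1048576 kB).
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

check_index_ram() {
  # $1 = required GB (default 160), $2 = meminfo file (default /proc/meminfo).
  local need_gb=${1:-160} have_gb
  have_gb=$(mem_total_gb "${2:-/proc/meminfo}")
  if [ -z "$have_gb" ]; then
    echo "MemTotal not found" >&2
    return 2
  fi
  if [ "$have_gb" -lt "$need_gb" ]; then
    echo "only ${have_gb} GB RAM, about ${need_gb} GB needed; the build may swap or fail" >&2
    return 1
  fi
  echo "${have_gb} GB RAM available (about ${need_gb} GB needed)"
}
```

On a 45 GB workstation, `check_index_ram 160` returns non-zero, which is a cue to move the build to a larger machine before starting it.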

Before this, I tried building the index from the primary assembly file, and it didn't finish in two weeks. I thought the primary assembly file might be too big, so I switched to p12. When I run ls -lh, the biggest file is a .rtf file, which I read is a temporary file; right now it is 42 GB. I am using Cygwin to run Linux commands on Windows. Am I missing something? Please advise.

Also, on the side: could you tell me the difference between using the primary assembly and p12 (or a newer assembly) for building the index?

I don't have the computational explanation you're looking for, unfortunately, but I don't think the time to completion scales with RAM in the linear way you're expecting. I tried building an index with 32 GB of RAM and it failed; I think the build needed to load more data than that into memory. I eventually used a cluster, assigned ~200 GB to the job, and it ran smoothly. If you have access to cloud or cluster resources, I recommend you go that route.

Hi Russ, thanks for the reply. I do have access to a Linux cloud/cluster, but I couldn't install hisat2 there. Any tips?

You'll have to talk to the sys admin of your cluster if you don't have the privileges to install hisat2.

Have you tried installation using (bio)conda?

Hi Wouter. I was able to download hisat2 and add it to the PATH on the Linux server. Now I am running into the problem that libstdc++.so.6 is not up to date. I asked the server manager, and he said the system is old and updating it is a pain. Could you tell me whether there are Linux servers I can access to do this, and whether any of them are free?
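If the libstdc++ error is the usual "version `GLIBCXX_...' not found" message (an assumption; the exact error is not quoted in this thread), you can check which symbol versions the server's library provides. A sketch assuming GNU grep and GNU sort; `glibcxx_versions` and `has_glibcxx` are hypothetical helpers, and the library path and version in the usage line are only examples. Running `ldd` on the hisat2 binary shows which libstdc++.so.6 it actually loads.

```shell
#!/usr/bin/env bash
# Sketch: list the GLIBCXX symbol versions a libstdc++ build provides, to
# compare with the version named in a "GLIBCXX_... not found" error.
# Assumes GNU grep (-a, -o) and GNU sort (-V).
set -eu

glibcxx_versions() {
  # Treat the shared library as text and pull out GLIBCXX_<digits/dots> tags,
  # de-duplicated and sorted by version number.
  grep -ao 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u -V
}

has_glibcxx() {
  # Succeeds if library $1 provides symbol version $2 (e.g. GLIBCXX_3.4.21).
  glibcxx_versions "$1" | grep -qxF "$2"
}
```

Example: `has_glibcxx /usr/lib64/libstdc++.so.6 GLIBCXX_3.4.21 || echo "system libstdc++ is too old"`.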

So you have tried installation using bioconda?

No, I just downloaded and unpacked the hisat2 Linux binaries and added the directory to my PATH.

Why don't you try installation using bioconda?

I couldn't install Miniconda because of the same libstdc++ problem. :(

Right, well, that sucks.
