Question

Help with nhmmscan on PacBio WGS Data with Dfam


6.2 years ago

roxane.dunbar • 0

Hello,

I have a few questions regarding nhmmscan. I am very new to HMMs and to tools such as hmmscan.

I am trying to replicate the identification of MEI insertions in the PacBio-sequenced NA12878 genome from Pendleton et al., 2015. They state in the second-to-last paragraph of the supplementary data that they used nhmmer with Dfam, via the script 'dfamscan.pl' with default parameters.

I am only interested in identifying the L1HS elements in this genome, and I know from the paper that there are 118 of them.

I tried running the script with default parameters, but after a week of running (with no standard output to show it was actually doing anything) I killed it and decided to run with just the L1HS 5' HMM instead. That run has now been going for over 24 hours.

I guess my first question is, am I running it right? My command is as below:

perl /media/RAID/rdunbar/hmmer/dfamscan.pl -fastafile /media/RAID/rdunbar/hmmer/corrected_reads_gt4kb.fasta -hmmfile /media/RAID/rdunbar/hmmer/L1HS_L1/DF0000225.hmm -dfam_outfile /media/RAID/rdunbar/hmmer/Results/PacBio_Dfam_hits_DF0000225.out

The HMM file was obtained from here: http://dfam.org/entry/DF0000225

The FASTA file contains the cleaned reads from the PacBio NA12878 run and is 60 GB: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NA12878_PacBio_MtSinai/corrected_reads_gt4kb.fasta

My second question: with a 60 GB FASTA file and only a single-element HMM to search for, roughly how long should this take?

Running top shows:

 PID USER     PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
6231 rdunbar  20  0 960284 93000 3188 S 94.4  0.2  1348:37 nhmmscan

Also, top seems to show nhmmscan almost constantly in the S (sleeping) state. In the terminal, this is all that has been displayed since yesterday:

rdunbar@plymouthcruncher:/media/RAID/rdunbar/hmmer/hmmer-3.1b2/src$ ./nhmmscan /media/RAID/rdunbar/hmmer/L1HS_L1/DF0000225.hmm /media/RAID/rdunbar/hmmer/corrected_reads_gt4kb.fasta > /media/RAID/rdunbar/hmmer/Results/PacBio_nhmmer_hits_DF0000225.out

Any help would be most appreciated.

Kindest regards, Roxane

PacBio Hmm nhmmscan WGS dfam • 1.7k views


score 1 · Accepted Answer · 2018-02-22


6.2 years ago

roxane.dunbar • 0

It has now run successfully. I was just impatient!
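For anyone else stuck wondering whether a long run is still making progress, here is a rough sketch of one way to check. It assumes nhmmscan's default human-readable output is being redirected to a file (as in my second command) and that each processed read shows up there as a line starting with "Query:". The function names are my own, not part of HMMER or dfamscan.pl, and the result is only approximate because output may be buffered before it reaches the file.

```python
def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    n = 0
    with open(fasta_path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                n += 1
    return n


def count_reported_queries(hmmer_out_path):
    """Count 'Query:' lines in the redirected output file.

    Assumption: the default (non-tabular) output format is in use,
    where one such line is written per query sequence.
    """
    n = 0
    with open(hmmer_out_path, "rb") as fh:
        for line in fh:
            if line.startswith(b"Query:"):
                n += 1
    return n


def progress_fraction(fasta_path, hmmer_out_path):
    """Return the approximate fraction of reads already reported."""
    total = count_fasta_records(fasta_path)
    done = count_reported_queries(hmmer_out_path)
    return done / total if total else 0.0
```

Calling progress_fraction() with the input FASTA and the partially written output file gives a number between 0 and 1. Counting headers in a 60 GB file takes a while by itself, so it is worth doing that once and keeping the total. This will not help with dfamscan.pl if it only writes its results at the end.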


Congratulations!

Reply 6.2 years ago by Kevin Blighe • 87k