Error in CNV Calling with Annotation
5.2 years ago
wei.wei ▴ 10

Hi, I was trying to call CNVs in yeast genomes by running the command

cnvkit.py batch Sample1.bam Sample2.bam -n Control1.bam Control2.bam -m wgs -f sacCer.fasta --annotate refFlat.txt

but I ran into the error below:

Traceback (most recent call last):
  File "/Users/wwei/anaconda2/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/Users/wwei/anaconda2/lib/python2.7/site-packages/cnvlib/commands.py", line 113, in _cmd_batch
    args.count_reads, args.method)
  File "/Users/wwei/anaconda2/lib/python2.7/site-packages/cnvlib/batch.py", line 74, in batch_make_reference
    bam_fname, *autobin_args, bp_per_bin=50000.)
  File "/Users/wwei/anaconda2/lib/python2.7/site-packages/cnvlib/autobin.py", line 96, in do_autobin
    tgt_bin_size = depth2binsize(tgt_depth, target_min_size, target_max_size)
  File "/Users/wwei/anaconda2/lib/python2.7/site-packages/cnvlib/autobin.py", line 62, in depth2binsize
    bin_size = int(round(bp_per_bin / depth))
ValueError: cannot convert float NaN to integer

When I omitted the --annotate option, it worked fine and I was able to obtain the .cnr and .cns files. Has anyone encountered a similar issue, and is there anything I can do about it? Thank you.

cnvkit • 1.9k views

Welcome to Biostars and thank you for the contribution! Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

5.2 years ago
Eric T. ★ 2.8k

Thanks for reporting. I've filed this issue in the project's GitHub repo: https://github.com/etal/cnvkit/issues/421

It looks like the autobin step here ran into a NaN when doing some basic arithmetic with bin depths to estimate a reasonable average bin size. It's surprising that --annotate is responsible for the crash, as autobin shouldn't be doing anything with gene names.
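For anyone who wants to see the failure in isolation, here is a minimal sketch of the arithmetic shown in the traceback (`int(round(bp_per_bin / depth))`). The function name `depth_to_bin_size` and its guard are hypothetical and written only for illustration; this is not CNVkit's actual code. It assumes the depth estimate arrives as a plain float.

```python
import math


def depth_to_bin_size(depth, bp_per_bin=50000.0):
    """Hypothetical stand-in for the failing arithmetic; not CNVkit's code."""
    # Fail early with a readable message instead of letting NaN reach int().
    if math.isnan(depth) or depth <= 0:
        raise ValueError(
            "no usable read depth (got %r); check the input files" % depth)
    return int(round(bp_per_bin / depth))


# A normal depth gives a sensible bin size.
print(depth_to_bin_size(25.0))

# The unguarded arithmetic from the traceback fails once depth is NaN.
try:
    int(round(50000.0 / float("nan")))
except ValueError as err:
    print("unguarded:", err)
```

The point of the guard is only that a NaN depth is a symptom: something upstream produced no usable depth values, so that is where to look.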

Once you've determined a reasonable average bin size, do you still see the crash with cnvkit.py batch ... --target-avg-size=<that-bin-size> --annotate?


Hi, thanks for the help. When I used the --target-avg-size option, the error message showed that the real issue was that the chromosome names did not match across my input files, so I was able to fix that. I don't know why it gave me a bin size error in the first place.
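A quick way to catch this kind of mismatch before running the pipeline is to compare the chromosome names in the reference FASTA with those in the annotation file. The sketch below is only an illustration: the helper names are hypothetical, the sample lines are made up, and it assumes the annotation follows the UCSC refFlat layout, where the chromosome is the third tab-separated column.

```python
def names_from_fasta_headers(lines):
    """Collect sequence names from FASTA header lines (text after '>')."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def names_from_refflat(lines):
    """Collect chromosome names from refFlat rows (third column)."""
    return {line.split("\t")[2] for line in lines if line.strip()}


def chromosome_name_overlap(reference_names, annotation_names):
    """Hypothetical helper: list names shared by, and unique to, each input."""
    ref, ann = set(reference_names), set(annotation_names)
    return {
        "shared": sorted(ref & ann),
        "reference_only": sorted(ref - ann),
        "annotation_only": sorted(ann - ref),
    }


# Made-up example lines, for illustration only.
fasta_lines = [">chrI", "ACGT", ">chrII", "ACGT"]
refflat_lines = [
    "GENE1\tTX1\tI\t+\t100\t900\t100\t900\t1\t100,\t900,",
    "GENE2\tTX2\tII\t-\t200\t800\t200\t800\t1\t200,\t800,",
]

report = chromosome_name_overlap(
    names_from_fasta_headers(fasta_lines),
    names_from_refflat(refflat_lines),
)
print(report)
```

If "shared" comes back empty, the two files use different naming schemes (for example "chrI" versus "I"), and one of them needs to be renamed to match the other.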

2.9 years ago
linehammer ▴ 10

You can avoid this by masking out the NaN values. Note first that in Python, NaN is the one float value that is not equal to itself:

float('nan') == float('nan')      
False

In a pandas context, "ValueError: cannot convert float NaN to integer" comes up because the default NumPy integer dtypes cannot store NaN values. Pandas v0.24 introduced nullable integer data types, which allow integers to coexist with missing values. These are pandas integer types, not NumPy integer types. So, use a nullable integer data type (e.g. Int64):

df['x'].astype('Int64')

NB: For some reason, you have to convert to a NumPy float first and then to the nullable integer type (e.g. Int64).
