Biostar Beta. Not for public use.
Question: Goseq: Pwf: Error In If (Hi <= Low) { : Missing Value Where True/False Needed
0
Entering edit mode

Hi, all

Dose anyone familiar with the use of goseq package here, I always get quick reply here, so I post my questions here, hope for some suggestions.

I am trying to use goseq for RNAseq data analysis of a non-native bacteria, I fetch the genelength data and GO term data by using ensembl Perl API. Differential expression was done by DESeq2.

After data run pwf of goseq, error occurred as following:

#

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

It seems to be the problem of the names or ID problem, but I could not solved it till now.

Have any one met this before? How could I solved this problem?

Thanks!

My workflow were as following

##   import the genes ID which I fetch from Ensembl

HZ254gene_data <- read.delim("E:/HZ_GO_enrichment/HZ254gene_data.txt", quote="")

colnames(HZ254gene_data)[1]="locus_tag"

##  get assayed gene vector

assayed.genes=HZ254gene_data$locus_tag

assayed.genes1=as.vector(assayed.genes)

is.vector(assayed.genes1)

[1] TRUE

head (assayed.genes1)

[1] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" "Mtc_0006"

str(assayed.genes1)

chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" ...


## get gene length data for pwf bias.data

length=HZ254gene_data$length

is.vector(length)

[1] TRUE

names(length)=assayed.genes1

head(length)

Mtc_0001 Mtc_0002 Mtc_0003 Mtc_0004 Mtc_0005 Mtc_0006 
     768      654     1899     1770      351      654

#
str(length)

 Named int [1:2512] 768 654 1899 1770 351 654 903 72 966 1071 ...
 - attr(*, "names")= chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" ...



##import results from DESeq2 and get the DEgenes 

HZ254DifList_Mer <- read.delim("E:/HZ_GO_enrichment/HZ254DifList_Mer.txt")

de.genes1=as.integer(HZ254DifList_Mer$padj<0.05)

##get the genes for pwf 

gene.vector=as.integer(de.genes1)

names(gene.vector)=assayed.genes1

head(gene.vector)

##import the GO_gene association table from ensembl API 

HZ254gene_go <- read.delim("E:/HZ_GO_enrichment/HZ254gene_go.txt", quote="")

go.terms=HZ254gene_go


## run pwf, 

pwf=nullp(gene.vector,bias.data=length)

##got errors

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] goseq_1.14.0            geneLenDataBase_0.99.12 BiasedUrn_1.06.1       
[4] BiocInstaller_1.12.0   

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.24.0   Biobase_2.22.0         BiocGenerics_0.8.0    
 [4] biomaRt_2.18.0         Biostrings_2.30.1      bitops_1.0-6          
 [7] BSgenome_1.30.0        DBI_0.2-7              GenomicFeatures_1.14.2
[10] GenomicRanges_1.14.4   grid_3.0.2             IRanges_1.20.6        
[13] lattice_0.20-24        Matrix_1.1-2           mgcv_1.7-28           
[16] nlme_3.1-113           parallel_3.0.2         RCurl_1.95-4.1        
[19] Rsamtools_1.14.3       RSQLite_0.11.4         rtracklayer_1.22.3    
[22] stats4_3.0.2           tools_3.0.2            XML_3.98-1.1          
[25] XVector_0.2.0          zlibbioc_1.8.0
ADD COMMENTlink 6.1 years ago liupfskygre • 190 • updated 2.5 years ago IrK • 30
Entering edit mode
1

Hi, liupfskygre, It's hard to know exactly what the issue is without a reproducible example, but perhaps the assignment of values to variable names, which are also base R functions, could be a problem. Try replacing length=HZ254gene_data$length with lgth=HZ254gene_data$length. If the nullp function calls length (the function), your use of this name as a variable might be a conflict.

ADD REPLYlink 6.1 years ago
KKeenan02
• 70
Entering edit mode
0

thanks for your reminding! I recheck the data and found where the problems is, there were some genes in the dataset that do not have expression data, which were assigned as NA!

ADD REPLYlink 6.1 years ago
liupfskygre
• 190
0
Entering edit mode

To not leave this question appear unanswered:

The first thing you have do after seeing such an error is to call traceback() and add the output to the error report.

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

means, as it says, there is a missing value where if requires a TRUE/FALSE value, <= and other comparison operators in R yield 'NA' if and only if at least one of the values is missing itself. In conclusion, your input data contains missing values which goseq does not handle.

In case that happens, missing values need to be removed (e.g. by using complete.cases) or imputed, see http://www.statmethods.net/input/missingdata.html for a quick tutorial.

ADD COMMENTlink 6.1 years ago Michael Dondrup 46k
Entering edit mode
0

Thanks for your explanations!

ADD REPLYlink 6.1 years ago
liupfskygre
• 190
0
Entering edit mode
de.genes1=as.integer(HZ254DifList_Mer$padj<0.05)`

This looks like a logical vector casted into an integer - can you print de.genes1 out

ADD COMMENTlink 6.1 years ago Jeremy Leipzig 18k
Entering edit mode
0

thanks, as for my case, missing data "NA" in the de.genes1 caused the problem! At first I tried to convert de.gene1 into numeric, but failed, recheck the data found NA existed.

ADD REPLYlink 6.1 years ago
liupfskygre
• 190
0
Entering edit mode

In case anyone else runs into this problem, one way you can end up with the same error is if the gene lengths are all the same.

ADD COMMENTlink 2.5 years ago keith.hughitt • 250
0
Entering edit mode

I had the same error message and in my case it a binary vector had NAs, so basically:

> genes = as.integer(DESeqoutput$padj <= 0.05 & > !is.na(DESeqoutput$padj))   # Give all values less than 0.05 and no NAs

However, when I call for nullp() I get following warning:

Can't find mm10/ensGene length data in genLenDataBase...
Found the annotaion package, TxDb.Mmusculus.UCSC.mm10.ensGene
Trying to get the gene lengths from it.
Warning messages:
1: In library() : library ‘/usr/lib/R/site-library’ contains no packages
2: In pcls(G) : initial point very close to some inequality constraints

It is warning message but I would like to interpret it. What does it mean? Thanks

ADD COMMENTlink 2.5 years ago IrK • 30
Entering edit mode
1

that R is gibberish. What do you think the gene vector should look like?

ADD REPLYlink 2.5 years ago
Jeremy Leipzig
18k
Entering edit mode
0

To be honest, I checked that NAs are absent in gene IDs vector, but forgot that numeric values also can be represented as NAs in DESeqoutput$padj vector.

ADD REPLYlink 2.5 years ago
IrK
• 30
Entering edit mode
0

Show us what genes looks like after your faulty code above.

ADD REPLYlink 2.5 years ago
Jeremy Leipzig
18k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0