Goseq: Pwf: Error In If (Hi <= Low) { : Missing Value Where True/False Needed
4
0
Entering edit mode
10.2 years ago
liupfskygre ▴ 210

Hi, all

Dose anyone familiar with the use of goseq package here, I always get quick reply here, so I post my questions here, hope for some suggestions.

I am trying to use goseq for RNAseq data analysis of a non-native bacteria, I fetch the genelength data and GO term data by using ensembl Perl API. Differential expression was done by DESeq2.

After data run pwf of goseq, error occurred as following:

###

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

It seems to be the problem of the names or ID problem, but I could not solved it till now.

Have any one met this before? How could I solved this problem?

Thanks!

My workflow were as following

##   import the genes ID which I fetch from Ensembl

HZ254gene_data <- read.delim("E:/HZ_GO_enrichment/HZ254gene_data.txt", quote="")

colnames(HZ254gene_data)[1]="locus_tag"

##  get assayed gene vector

assayed.genes=HZ254gene_data$locus_tag

assayed.genes1=as.vector(assayed.genes)

is.vector(assayed.genes1)

[1] TRUE

head (assayed.genes1)

[1] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" "Mtc_0006"

str(assayed.genes1)

chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" ...


## get gene length data for pwf bias.data

length=HZ254gene_data$length

is.vector(length)

[1] TRUE

names(length)=assayed.genes1

head(length)

Mtc_0001 Mtc_0002 Mtc_0003 Mtc_0004 Mtc_0005 Mtc_0006 
     768      654     1899     1770      351      654

#
str(length)

 Named int [1:2512] 768 654 1899 1770 351 654 903 72 966 1071 ...
 - attr(*, "names")= chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" ...



##import results from DESeq2 and get the DEgenes 

HZ254DifList_Mer <- read.delim("E:/HZ_GO_enrichment/HZ254DifList_Mer.txt")

de.genes1=as.integer(HZ254DifList_Mer$padj<0.05)

##get the genes for pwf 

gene.vector=as.integer(de.genes1)

names(gene.vector)=assayed.genes1

head(gene.vector)

##import the GO_gene association table from ensembl API 

HZ254gene_go <- read.delim("E:/HZ_GO_enrichment/HZ254gene_go.txt", quote="")

go.terms=HZ254gene_go


## run pwf, 

pwf=nullp(gene.vector,bias.data=length)

##got errors

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] goseq_1.14.0            geneLenDataBase_0.99.12 BiasedUrn_1.06.1       
[4] BiocInstaller_1.12.0   

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.24.0   Biobase_2.22.0         BiocGenerics_0.8.0    
 [4] biomaRt_2.18.0         Biostrings_2.30.1      bitops_1.0-6          
 [7] BSgenome_1.30.0        DBI_0.2-7              GenomicFeatures_1.14.2
[10] GenomicRanges_1.14.4   grid_3.0.2             IRanges_1.20.6        
[13] lattice_0.20-24        Matrix_1.1-2           mgcv_1.7-28           
[16] nlme_3.1-113           parallel_3.0.2         RCurl_1.95-4.1        
[19] Rsamtools_1.14.3       RSQLite_0.11.4         rtracklayer_1.22.3    
[22] stats4_3.0.2           tools_3.0.2            XML_3.98-1.1          
[25] XVector_0.2.0          zlibbioc_1.8.0
r data format • 17k views
ADD COMMENT
1
Entering edit mode

Hi, liupfskygre, It's hard to know exactly what the issue is without a reproducible example, but perhaps the assignment of values to variable names, which are also base R functions, could be a problem. Try replacing length=HZ254gene_data$length with lgth=HZ254gene_data$length. If the nullp function calls length (the function), your use of this name as a variable might be a conflict.

ADD REPLY
0
Entering edit mode

thanks for your reminding! I recheck the data and found where the problems is, there were some genes in the dataset that do not have expression data, which were assigned as NA!

ADD REPLY
0
Entering edit mode
10.1 years ago
Michael 54k

To not leave this question appear unanswered:

The first thing you have do after seeing such an error is to call traceback() and add the output to the error report.

Error in if (hi <= low) { : missing value where TRUE/FALSE needed

means, as it says, there is a missing value where if requires a TRUE/FALSE value, <= and other comparison operators in R yield 'NA' if and only if at least one of the values is missing itself. In conclusion, your input data contains missing values which goseq does not handle.

In case that happens, missing values need to be removed (e.g. by using complete.cases) or imputed, see http://www.statmethods.net/input/missingdata.html for a quick tutorial.

ADD COMMENT
0
Entering edit mode

Thanks for your explanations!

ADD REPLY
0
Entering edit mode
10.1 years ago
de.genes1=as.integer(HZ254DifList_Mer$padj<0.05)`

This looks like a logical vector casted into an integer - can you print de.genes1 out

ADD COMMENT
0
Entering edit mode

thanks, as for my case, missing data "NA" in the de.genes1 caused the problem! At first I tried to convert de.gene1 into numeric, but failed, recheck the data found NA existed.

ADD REPLY
0
Entering edit mode
9.9 years ago
keith.hughitt ▴ 280

In case anyone else runs into this problem, one way you can end up with the same error is if the gene lengths are all the same.

ADD COMMENT
0
Entering edit mode
6.6 years ago
IrK ▴ 70

I had the same error message and in my case it a binary vector had NAs, so basically:

> genes = as.integer(DESeqoutput$padj <= 0.05 & > !is.na(DESeqoutput$padj))   # Give all values less than 0.05 and no NAs

However, when I call for nullp() I get following warning:

Can't find mm10/ensGene length data in genLenDataBase...
Found the annotaion package, TxDb.Mmusculus.UCSC.mm10.ensGene
Trying to get the gene lengths from it.
Warning messages:
1: In library() : library ‘/usr/lib/R/site-library’ contains no packages
2: In pcls(G) : initial point very close to some inequality constraints

It is warning message but I would like to interpret it. What does it mean? Thanks

ADD COMMENT
1
Entering edit mode

that R is gibberish. What do you think the gene vector should look like?

ADD REPLY
0
Entering edit mode

To be honest, I checked that NAs are absent in gene IDs vector, but forgot that numeric values also can be represented as NAs in DESeqoutput$padj vector.

ADD REPLY
0
Entering edit mode

Show us what genes looks like after your faulty code above.

ADD REPLY

Login before adding your answer.

Traffic: 2526 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6