Counting column values before Fisher's test ends with NULL in R
0
0
Entering edit mode
2.9 years ago
storm1907 ▴ 30

Hello,

I am very new to R, so I apologise if this looks simple to someone.

I try to to join two files and then perform a one-sided Fisher's exact test to determine if there is a greater burden of qualifying variants in casefile or controlfile.

casefile:

GENE  CASE_COUNT_HET  CASE_COUNT_CH   CASE_COUNT_HOM  CASE_TOTAL_AC
ENSG00000124209   1   0   0   1
ENSG00000064703   1   1   0   9
ENSG00000171408   1   0   0   1
ENSG00000110514   1   1   1   12
ENSG00000247077   1   1   1   7

controlfile:

GENE  CASE_COUNT_HET  CASE_COUNT_CH   CASE_COUNT_HOM  CASE_TOTAL_AC
ENSG00000124209   1   0   0   1
ENSG00000064703   1   1   0   9
ENSG00000171408   1   0   0   1
ENSG00000110514   1   1   1   12
ENSG00000247077   1   1   1   7
ENSG00000174776   1   1   0   2
ENSG00000076864   1   0   1   13
ENSG00000086015   1   0   1   25

I have this script:

#!/usr/bin/env Rscript
library("argparse")
suppressPackageStartupMessages(library("argparse"))

parser <- ArgumentParser()
parser$add_argument("--casefile", action="store")
parser$add_argument("--casesize", action="store", type="integer")
parser$add_argument("--controlfile", action="store")
parser$add_argument("--controlsize", action="store", type="integer")
parser$add_argument("--outfile", action="store")

args <- parser$parse_args()

case.dat<-read.delim(args$casefile, header=T, stringsAsFactors=F, sep="\t")
names(case.dat)[1]<-"GENE"
control.dat<-read.delim(args$controlfile, header=T, stringsAsFactors=F, sep="\t")
names(control.dat)[1]<-"GENE"

dat<-merge(case.dat, control.dat, by="GENE", all.x=T, all.y=T)
dat[is.na(dat)]<-0

dat$P_DOM<-0
dat$P_REC<-0

for(i in 1:nrow(dat)){

  #Dominant model
  case_count<-dat[i,]$CASE_COUNT_HET+dat[i,]$CASE_COUNT_HOM
  control_count<-dat[i,]$CONTROL_COUNT_HET+dat[i,]$CONTROL_COUNT_HOM

  if(case_count>args$casesize){
    case_count<-args$casesize
  }else if(case_count<0){
    case_count<-0
   }
  if(control_count>args$controlsize){
    control_count<-args$controlsize
  }else if(control_count<0){
    control_count<-0
   }

  mat<-cbind(c(case_count, (args$casesize-case_count)), c(control_count, (args$controlsize-control_count)))
  dat[i,]$P_DOM<-fisher.test(mat, alternative="greater")$p.value

and problem starts in here:

  case_count<-dat[i,]$CASE_COUNT_HET+dat[i,]$CASE_COUNT_HOM
  control_count<-dat[i,]$CONTROL_COUNT_HET+dat[i,]$CONTROL_COUNT_HOM

the result of case_count and control_count is NULL values, however corresponding columns in both input files are NOT empty. I am confused, which step to take next.

as a newbie in R and statistics, I will be happy about any suggestion. Thank you!

statistics NGS R • 635 views
ADD COMMENT
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2929 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6