Permutation test with gene size and GC contents
1
1
Entering edit mode
8.7 years ago
omicj ▴ 70

Hi

I am trying to do permutation test for comparing number of LoF mutations in cancer patients.

Haven't done this before based on gene size and GC content. Do you have any statistical tutorial for this? or code?

Thanks

statistics gene-size permutation-test • 3.1k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Sarah, what does factorbook/jaspar have to do with permutations for mutation enrichment?

ADD REPLY
0
Entering edit mode

factorbook does with transcription factor dynamic

ADD REPLY
0
Entering edit mode

Omicj, I can help you, however there some unclear things about what you want to do. What is your input? Mutations - are they SNPs (bed file with positions)?

Also, see this answer on "how to do permutations" by Aaronquinlan - Picking Random Genomic Positions

ADD REPLY
9
Entering edit mode
8.7 years ago
bernatgel ★ 3.4k

In R you can use the package regioneR available in Bioconductor. It takes care of the randomization process and the statistics and you can customize it to evaluate different features.

If I understand correctly your question, you want to test if the proportion of LoF variants you have in you sample is higher that what you would expect by chance, is'nt it? If so, you could try with something like this: create a number of random variant sets based on yours, evaluate how many LoF variant you detect, see if the original set has more LoF variants than expected.

I would do something like this:

1 - Implement a function counting the number of LoF variants given a GRanges with the reference and alternative alleles. You can use the predictCoding function in VariantAnnotation, for example, with this signature:

numberOfLoF <- function(A, ...) {
  varAllele <- DNAStringSet(as.character(A$alt))
  mcols(A) <- NULL 
  cod <- predictCoding(query=A, subject=txdb, seqSource=Hsapiens, varAllele=varAllele)  

  #Note1: the same variant can be counted more than once if it affects more than one transcript
  #Note2: What you consider a LoF can be different than that
  numLoF <- length(which(cod$CONSEQUENCE %in% c("nonsynonymous", "frameshift", "nonsense")))

  return(numLoF)  
}

2 - Implement a randomization that places the variants randomly over the genome and assigns the same nucleotide changes you see in your variants:

randomizeVariants <- function(A, ...) {
  randA <- randomizeRegions(A, ...)
  mcols(randA) <- mcols(A)
  return(randA)
}

3 - Create a mask to tell the randomization function where the random variants can be placed. If this is from an exome sequencing experiment, I would mask out any part of the genome not targeted by the exome.

exome.targets <- toGRanges("path/to/exome.bed")
whole.genome <- filterChromosomes(getGenome(genome="hg19"), chr.type="canonical")
mask <- subtractRegions(whole.genome, exome.targets)

4 - And finally, perform a permutation test with all of this

library(regioneR)
library(VariantAnnotation)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

#Define here the functions and mask above

#Permutation test

pt <- permTest(A=vars, randomize.function=randomizeVariants, evaluate.function=numberOfLoF,
         mask=mask, genome="hg19", ntimes=50)
print(pt)
plot(pt)

This may take some time to run, depending on the number of variants you have and the ntimes, but it will perform the whole permutation test and give you a p-value, a z-score and a plot of the results of the permutation tests.

You can find more information about regioneR in the package vignette.

ADD COMMENT

Login before adding your answer.

Traffic: 1832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6