Hi

I am trying to do permutation test for comparing number of LoF mutations in cancer patients.

Haven't done this before based on gene size and GC content. Do you have any statistical tutorial for this? or code?

Thanks

Permutation test with gene size and GC contents

1

Entering edit mode

Hi

I am trying to do permutation test for comparing number of LoF mutations in cancer patients.

Haven't done this before based on gene size and GC content. Do you have any statistical tutorial for this? or code?

Thanks

9

Entering edit mode

In R you can use the package regioneR available in Bioconductor. It takes care of the randomization process and the statistics and you can customize it to evaluate different features.

If I understand correctly your question, you want to test if the proportion of LoF variants you have in you sample is higher that what you would expect by chance, is'nt it? If so, you could try with something like this: create a number of random variant sets based on yours, evaluate how many LoF variant you detect, see if the original set has more LoF variants than expected.

I would do something like this:

1 - Implement a function counting the number of LoF variants given a GRanges with the reference and alternative alleles. You can use the predictCoding function in VariantAnnotation, for example, with this signature:

```
numberOfLoF <- function(A, ...) {
varAllele <- DNAStringSet(as.character(A$alt))
mcols(A) <- NULL
cod <- predictCoding(query=A, subject=txdb, seqSource=Hsapiens, varAllele=varAllele)
#Note1: the same variant can be counted more than once if it affects more than one transcript
#Note2: What you consider a LoF can be different than that
numLoF <- length(which(cod$CONSEQUENCE %in% c("nonsynonymous", "frameshift", "nonsense")))
return(numLoF)
}
```

2 - Implement a randomization that places the variants randomly over the genome and assigns the same nucleotide changes you see in your variants:

```
randomizeVariants <- function(A, ...) {
randA <- randomizeRegions(A, ...)
mcols(randA) <- mcols(A)
return(randA)
}
```

3 - Create a mask to tell the randomization function where the random variants can be placed. If this is from an exome sequencing experiment, I would mask out any part of the genome not targeted by the exome.

```
exome.targets <- toGRanges("path/to/exome.bed")
whole.genome <- filterChromosomes(getGenome(genome="hg19"), chr.type="canonical")
mask <- subtractRegions(whole.genome, exome.targets)
```

4 - And finally, perform a permutation test with all of this

```
library(regioneR)
library(VariantAnnotation)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
#Define here the functions and mask above
#Permutation test
pt <- permTest(A=vars, randomize.function=randomizeVariants, evaluate.function=numberOfLoF,
mask=mask, genome="hg19", ntimes=50)
print(pt)
plot(pt)
```

This may take some time to run, depending on the number of variants you have and the ntimes, but it will perform the whole permutation test and give you a p-value, a z-score and a plot of the results of the permutation tests.

You can find more information about regioneR in the package vignette.

Loading Similar Posts

Omicj, I can help you, however there some unclear things about what you want to do. What is your input? Mutations - are they SNPs (bed file with positions)?

Also, see this answer on "how to do permutations" by Aaronquinlan - https://www.biostars.org/p/48617/#48656