Question: Ask for suggestions of filtering probes in affymetrix data
0
5 weeks ago
liux.bio • 320
China

Hi, biostars.

I am analyzing an affymetrix microarray dataset. After searching, I found following blogs about filtering probes:

Following the blogs, I determined the filtering standard

  • For each sample, compute 95th percentile of the negative control genes (normgene -> intro)
  • For a gene in a sample, if the gene's value is greater than the 95th percentile of the negative control genes in this sampe, we assume this gene is detected
  • if a gene is detected in more than half of all the samples, select this gene, otherwise, filter it.

After filtering, I obtain 6543 probes (the number of all probes is 53617). I want to know if the filtering standard is reasonable. Please give me some suggestions.

Here is my code:

library(oligo)
# load data 
celFiles <- list.celfiles("./rawData/GSE83452/celFiles/", full.names = TRUE)
rawData <- read.celfiles(celFiles)
# normalize 
processedData <- rma(rawData)

# filter genes with low expression
librarypd.hugene.2.0.st)
con <- dbpd.hugene.2.0.st)
# negative-control probes (normgene -> intro)
NCprobes <- dbGetQuery(con,  "select meta_fsetid from core_mps inner join featureSet on 
 core_mps.fsetid=featureSet.fsetid where featureSet.type='10';")
#  the 95th percentile of the negative control genes 
NCquantileValue <- apply(exprs(processedData)[as.character(NCprobes[,1]),], 2, quantile, 
 probs=0.95)

# function to filter genes
filterGeneDetected <- function(sampleValues, quantileValues, detectedRatio) {
    # the length of sampleValues and that of quantileValues should be the same
    # In a sample, if a gene's is up quantileValue, we say this gene is detected
    # If a gene is detected in more than detectedRatio * allSampels, the gene is selected
    samplesLength <- length(sampleValues)
    detectedNum <- 0
    for (index in c(1:samplesLength)) {
       if (sampleValues[index] > quantileValues[index]) {
           detectedNum =  detectedNum + 1
        }  
        }
    ifSelected <- FALSE
    if (detectedNum/samplesLength >= detectedRatio) {
        ifSelected = TRUE
     }
     return (ifSelected)
   } 

detectedProbes <- apply(exprs(processedData), 1, 
                    filterGeneDetected, 
                    quantileValues = NCquantileValue,
                    detectedRatio = 0.5)
processedDataFilter <- processedData[detectedProbes, ]

Many thanks for your kindness.

ADD COMMENTlink 5 weeks ago liux.bio • 320

Login before adding your answer.

Powered by the version 1.4