Question

How To Carry Out DFR Sliding Window Analysis?

2


13.9 years ago

Cheng Zhongshan ▴ 400

Hi, dear members. I want to use DFR sliding window analysis to compute total T scores and differential scores per million base pairs from neighboring genes. A similar sliding window analysis was used in the article "Coupled analysis of gene expression and chromosomal location". I asked the author for the Perl script used in the article, but unfortunately received no reply. Would anyone be kind enough to give me some suggestions on how to carry out DFR sliding window analysis? Thanks very much!

analysis • 4.6k views

ADD COMMENT • link 13.3 years ago by Cheng Zhongshan ▴ 400

1


please, define DFR

ADD REPLY • link 13.9 years ago by Pierre Lindenbaum 161k

1


The tool is available at this URL: http://geneexplorer.mc.vanderbilt.edu/digmap. It is not a Perl script; it is a Java program.

ADD REPLY • link 13.9 years ago by Pierre Lindenbaum 161k

0


DFR refers to differential flag regions, identified by visual inspection or by a computational method (DFR mapping). Yes, DIGMAP is a Java program, but the article also mentions a Perl script. Because the software can only analyze data from human and mouse, it cannot help with my microarray data, which come from a plant pathogen, Gibberella zeae. So I have to write a Perl script myself. Would you please have a look at my new post below and give me some suggestions? Thank you very much!

ADD REPLY • link 13.9 years ago by Cheng Zhongshan ▴ 400

score 7 · Answer 1 · 2010-05-19

I'm unsurprised that you have not heard from the author. Assuming that you are referring to this reference, that work is 5 years old. They probably have no idea what happened to their Perl scripts.

There are a number of other problems with the paper (aside from it being behind a paywall and so only accessible from my workplace).

First, their sliding window spans "5 genes". Well, there is no such thing as a gene: there are only related transcripts. It's unclear how they chose the start and end for their "genes", unless they use some sort of locus coordinate. Bear in mind too that there have been at least 2 major builds of the human genome since 2005, so transcript and probe mappings have altered.

Second, they make a big deal about enabling the "coupled analysis of microarray data with genome location", which I really do not understand. All microarray data relates to genome location, since probesets are mapped to the genome.

It seems that their major goal was to analyse copy number. There are now several tools in the Bioconductor suite which will do that and generate plots very similar to the figures from this paper. Take a look in particular at aCGH, crlmm, VanillaICE and DNAcopy - and browse the list of Bioconductor packages for others.

If you want a sliding window for purposes other than copy number estimation, let us know - it's a relatively simple algorithm and has been discussed here before.
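As a rough illustration of how simple the algorithm can be, here is a minimal R sketch that sums per-gene t-statistics over windows of `k` consecutive genes. It assumes the genes are already sorted by chromosomal position; the function name `sliding_window_sum` and the toy scores are made up for illustration, and this is not the method from the paper.

```r
# Hypothetical helper: total of per-gene t-statistics in each window of `k`
# consecutive genes. Assumes `scores` is already in chromosomal order.
sliding_window_sum <- function(scores, k = 5) {
  n <- length(scores)
  if (k < 1 || k > n) stop("k must be between 1 and length(scores)")
  # Cumulative sums let each window total be computed as a difference.
  cs <- c(0, cumsum(scores))
  cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
}

# Toy data (made up): t-statistics for 8 genes in chromosomal order.
t_scores <- c(0.5, -1.0, 2.0, 3.0, 2.5, -0.5, 0.0, 1.0)
window_totals <- sliding_window_sum(t_scores, k = 3)
print(window_totals)
```

Each element of `window_totals` corresponds to one window start, so a vector of n scores gives n - k + 1 window totals that can be plotted against the position of the first gene in each window.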

Ram · Answer 2 · 2010-05-22

Right, the problem is caused by my microarray data. When I asked the author of sigPathway (R package) for suggestions, he pointed out that there are duplicated values in my data. The following is what he sent me:

The problem is with the microarray data you are analyzing with sigPathway. The reason is that some of the microarray values are repeated across your data set:

tmp = as.matrix(Y)
nV = apply(tmp, 1, function(x) {length(unique(x))})
table(nV)
# nV
#     2     3     4
#     3   143 13242
indV = which(nV == 2)
print(indV)
# FGSG_11284 FGSG_11470 FGSG_13453
#      10615      10783      12751
print(Y[indV,])
#            TF134_1_3DAK TF134_2_3DAK WT1_3DAK WT2_3DAK
# FGSG_11284        3.630        3.630    3.657    3.657
# FGSG_11470       12.445       12.445   12.445   11.673
# FGSG_13453        3.433        3.433    3.292    3.292

For the above probes, it is impossible to calculate their corresponding t-statistic (because the denominator, which contains an estimate of the standard deviation, becomes zero). If we were to exclude the above probes, runSigPathway() will run to completion.
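The exclusion suggested above could be sketched roughly as follows. This is only an illustration on a toy data frame: three rows copy the values printed above, the fourth probe (`FGSG_demo`) and the name `Y_filtered` are made up, and the cutoff simply mirrors the `nV == 2` criterion from the quoted message.

```r
# Toy expression table: three probes from the output above plus one made-up
# probe (FGSG_demo) that has four distinct values.
Y <- data.frame(
  TF134_1_3DAK = c(3.630, 12.445, 3.433, 5.10),
  TF134_2_3DAK = c(3.630, 12.445, 3.433, 5.40),
  WT1_3DAK     = c(3.657, 12.445, 3.292, 6.20),
  WT2_3DAK     = c(3.657, 11.673, 3.292, 6.05),
  row.names    = c("FGSG_11284", "FGSG_11470", "FGSG_13453", "FGSG_demo")
)

# Count distinct values per probe, as in the quoted message.
nV <- apply(as.matrix(Y), 1, function(x) length(unique(x)))

# Drop the probes flagged there (only 2 distinct values across the 4 arrays).
keep <- nV > 2
Y_filtered <- Y[keep, , drop = FALSE]
print(rownames(Y_filtered))
```

With the real data, `Y_filtered` would then take the place of `Y` in the `runSigPathway()` call.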

Ram · Answer 3 · 2010-05-20

Thanks very much for your kind reply.

Actually, I want a sliding window for mining chromosomal expression patterns. My research organism is a plant pathogen, Gibberella zeae. I first used SAS to divide the loci on each fungal chromosome into consecutive groups of 10, 20, 30, or 40 according to their location. I want to see whether any run of 10, 20, 30, or 40 consecutive loci has an expression pattern that differs from the rest of the genes. I know that sigPathway (an R package for pathway analysis of microarray data) can do this kind of job, so I use SAS to subset the loci into groups of an arbitrary size (10, 20, 30, 40, and so on) and hope to use sigPathway to test whether the chromosomal location of these genes affects their expression.
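The grouping step described above can also be done directly in R. The following is a minimal sketch, assuming the loci are already sorted by chromosomal position; `make_blocks`, its arguments, and the toy locus names are hypothetical, and the `src`/`title`/`probes` fields follow the gene-set list built in the annotation code later in this post.

```r
# Hypothetical helper: split loci (already in chromosomal order) into
# consecutive blocks of `size` loci and return one gene set per block.
make_blocks <- function(loci, size, src = "MT_lab", prefix = "chr1") {
  # Block index for each locus: 1 for the first `size` loci, 2 for the next...
  block_id <- ceiling(seq_along(loci) / size)
  blocks <- split(loci, block_id)
  lapply(names(blocks), function(b) {
    list(src = src,
         title = paste(prefix, size, b, sep = "_"),
         probes = as.character(blocks[[b]]))
  })
}

# Toy locus identifiers (made up), in chromosomal order.
loci <- sprintf("FGSG_%05d", 1:25)
gene_sets <- make_blocks(loci, size = 10)
print(sapply(gene_sets, function(g) length(g$probes)))
```

The last block is shorter when the number of loci is not a multiple of `size`. These blocks do not overlap; overlapping windows would need a different index scheme.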

When I use sigPathway to analyze my microarray data, my computer runs out of memory. I have tried the following R code on several computers, but the outcome is always the same: even after computing for more than a day, it does not produce any results. Would you please point out my problem and give me some suggestions? Thank you very much.

I attach my microarray data and R code, and I hope you can have a look.

# The following code initializes the annotation list.

setwd("C:/analysis data and codes")
x <- read.table("chr1.txt", header=FALSE, sep="\t")
attach(x)
# Label each locus with a group name built from columns V2 and V3.
x$group <- paste(V2, V3, sep="_")
group <- x$group
y <- data.frame(group, V2, V3, V4)
xx <- as.list(group)
xx <- xx[!is.na(xx)]
xx <- unlist(xx)
xxUnique <- unique(xx)
# Build one gene set (src, title, probes) per unique group.
yy <- vector("list", length(xxUnique))
for(i in seq_along(yy))
{
    MT <- "MT_lab"
    yy[[i]] <- list(src=MT, title=xxUnique[i], probes=as.character(y[group==xxUnique[i],]$V4))
}

# The following code runs the sigPathway analysis.

library(sigPathway)
# Tab-separated expression table; the separator must be "\t", not "t".
YANG <- read.table("All microarray MT_LAB.txt", header=TRUE, sep="\t")
attach(YANG)
Y <- data.frame(TF134_1_3DAK, TF134_2_3DAK, WT1_3DAK, WT2_3DAK, row.names=locus_no)
# Phenotype labels, one per array, in the same order as the columns of Y.
p <- c("1_trt", "1_trt", "0_norm", "0_norm")
statList <- calcTStatFast(Y, p, ngroups=2)
hist(statList$pval, breaks=seq(0,1,0.025), xlab="p-value", ylab="Frequency", main="")
set.seed(1234)
YANG <- runSigPathway(yy, 20, 500, Y, p, nsim=100, weightType="constant", ngroup=2, npath=25, verbose=FALSE, allpathway=FALSE, alwaysUseRandomPerm=FALSE)
write.table(YANG$df.pathways[1:25,], quote=FALSE, sep="\t", file="chr1_sig.txt")
YANG$list.gPS[[1]]
save.image("chr1_sig")