How To Carry Out DFR Sliding Window Analysis?
13.9 years ago

Hi, dear members. I want to use DFR sliding window analysis to compute total T scores and differential scores per million base pairs from neighboring genes. A similar sliding window analysis was used in the article "Coupled analysis of gene expression and chromosomal location". I asked the author for the Perl script used in the article, but unfortunately I have had no reply. Would anyone be kind enough to give me some suggestions about how to carry out DFR sliding window analysis? Thanks very much!


Please define DFR.


The tool is available at this URL: http://geneexplorer.mc.vanderbilt.edu/digmap It is not a Perl script; it is a Java program.


DFR refers to differential flag regions, identified by visual inspection or by a computational method (DFR mapping). Yes, DIGMAP is a Java program, but the article also mentions a Perl script. Because the software can only analyze data from human and mouse, it cannot help with my microarray data, which come from the plant pathogen Gibberella zeae. So I have to write a Perl script myself. Would you please have a look at my new post below and give me some suggestions? Thank you very much!

13.9 years ago
Neilfws 49k

I'm unsurprised that you have not heard from the author. Assuming that you are referring to this reference, that work is 5 years old. They probably have no idea what happened to their perl scripts.

There are a number of other problems with the paper (aside from it being behind a paywall and so only accessible from my workplace).

First, their sliding window spans "5 genes". Well, there is no such thing as a gene: there are only related transcripts. It's unclear how they chose the start and end for their "genes", unless they use some sort of locus coordinate. Bear in mind too that there have been at least 2 major builds of the human genome since 2005, so transcript and probe mappings have altered.

Second, they make a big deal about enabling the "coupled analysis of microarray data with genome location", which I really do not understand. All microarray data relates to genome location, since probesets are mapped to the genome.

It seems that their major goal was to analyse copy number. There are now several tools in the Bioconductor suite which will do that and generate plots very similar to the figures from this paper. Take a look in particular at aCGH, crlmm, VanillaICE and DNAcopy - and browse the list of Bioconductor packages for others.

If you want a sliding window for purposes other than copy number estimation, let us know - it's a relatively simple algorithm and has been discussed here before.
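For reference, a generic sliding window over consecutive genes can be sketched in R as below. This is only an illustration of the general idea, not the paper's DFR method: the t-scores are made-up values assumed to be already sorted by chromosomal position, and the function name sliding_sum and the window width of 5 are assumptions chosen for the example.

```r
# Hypothetical per-gene t-scores, assumed to be ordered by chromosomal position.
t_scores <- c(0.5, -1.2, 2.0, 1.5, 3.1, -0.4, 0.0, 2.2)

# Sum the scores in every window of `width` consecutive genes,
# moving the window one gene at a time.
sliding_sum <- function(x, width) {
  n <- length(x)
  if (width > n) return(numeric(0))   # no complete window fits
  starts <- seq_len(n - width + 1)    # first index of each window
  vapply(starts, function(s) sum(x[s:(s + width - 1)]), numeric(1))
}

window_totals <- sliding_sum(t_scores, 5)
```

Each element of window_totals corresponds to one window start position, so it can be plotted against the position of the first (or middle) gene in the window.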


Thanks very much for your kind reply. Would you please help me again? Please see my new post below. Thanks very much!


Would you please read my new post below? Thanks very much!

13.9 years ago

Right, the problem is caused by my microarray data. The author of sigPathway (R package) pointed out that there are duplicated values in my data. The following is what he sent me:

The problem is with the microarray data you are analyzing with sigPathway. The reason is that some of the microarray values are repeated across your data set:

tmp = as.matrix(Y)
nV = apply(tmp, 1, function(x) {length(unique(x))})
table(nV)
nV
2 3 4 
3 143 13242 
indV = which(nV == 2)
print(indV)
FGSG_11284 FGSG_11470 FGSG_13453 
10615 10783 12751 
print(Y[indV,])
TF134_1_3DAK TF134_2_3DAK WT1_3DAK WT2_3DAK
FGSG_11284 3.630 3.630 3.657 3.657
FGSG_11470 12.445 12.445 12.445 11.673
FGSG_13453 3.433 3.433 3.292 3.292

For the above probes, it is impossible to calculate their corresponding t-statistic (because the denominator, which contains an estimate of the standard deviation, becomes zero). If we were to exclude the above probes, runSigPathway() will run to completion.
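The exclusion step described in the message could be sketched as follows. This is a minimal sketch, assuming the same criterion as the session above (probes with only two distinct values across the four arrays). The toy matrix reuses the three printed probes plus one hypothetical probe, FGSG_example, and the name Y_filtered is illustrative.

```r
# Toy matrix: the three probes printed above plus one hypothetical
# probe (FGSG_example) that has four distinct values.
Y <- rbind(
  FGSG_11284   = c(3.630, 3.630, 3.657, 3.657),
  FGSG_11470   = c(12.445, 12.445, 12.445, 11.673),
  FGSG_13453   = c(3.433, 3.433, 3.292, 3.292),
  FGSG_example = c(5.10, 5.32, 6.01, 6.25)
)
colnames(Y) <- c("TF134_1_3DAK", "TF134_2_3DAK", "WT1_3DAK", "WT2_3DAK")

# Number of distinct values per probe (row), as in the session above.
nV <- apply(Y, 1, function(x) length(unique(x)))

# Keep only probes with more than two distinct values.
Y_filtered <- Y[nV > 2, , drop = FALSE]
```

The filtered matrix, rather than Y, would then be passed to calcTStatFast() and runSigPathway().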

13.9 years ago

Thanks very much for your kind reply.

Actually, I want a sliding window for mining chromosomal expression patterns. My research organism is a plant pathogen, Gibberella zeae. I first used SAS to divide the loci on each fungal chromosome into groups of 10, 20, 30, or 40 according to their location. I want to see whether any run of 10, 20, 30, or 40 consecutive loci shows an expression pattern that differs from the rest of the genes. I know that sigPathway (an R package for pathway analysis of microarray data) can do this kind of job, so after using SAS to subset the loci into groups of an arbitrary size (10, 20, 30, 40, and so on), I hope to use sigPathway to analyze whether the chromosomal location of these genes affects their expression.
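The grouping step described above could also be done directly in R. The sketch below is an assumption-laden illustration: the locus IDs are invented, make_blocks is a hypothetical helper, and the src/title/probes list layout simply mirrors the annotation list built in the code further down.

```r
# Hypothetical locus IDs, assumed to be ordered by position on one chromosome.
loci <- sprintf("FGSG_%05d", 1:95)

# Cut the ordered loci into consecutive, non-overlapping blocks of k loci
# and wrap each block in a list with src, title and probes fields.
make_blocks <- function(ids, k, src = "MT_lab") {
  block_id <- ceiling(seq_along(ids) / k)   # 1,1,...,2,2,... per block
  pieces <- split(ids, block_id)
  lapply(seq_along(pieces), function(i) {
    list(src = src,
         title = paste0("block", k, "_", i),
         probes = as.character(pieces[[i]]))
  })
}

blocks10 <- make_blocks(loci, 10)
```

The last block is shorter when the number of loci is not a multiple of k. Calling make_blocks(loci, 20), make_blocks(loci, 30) and so on gives the other block sizes.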

When I use sigPathway to analyze my microarray data, my computer runs out of memory. I have tried the following R code on several computers, but the result is always the same: even after computing for more than one day, it does not produce any results. Would you please point out my problem and give me some suggestions? Thank you very much.

I have attached my microarray data and R code, and I hope you can have a look.

#the following code is for annotation list initiation.

setwd("C:/analysis data and codes")
x <- read.table("chr1.txt",header=FALSE,sep="\t")
attach(x)
x$group <- paste(V2,V3,sep="_")
group <- x$group
y <- data.frame(group,V2,V3,V4)
xx <- as.list(group)
xx <- xx[!is.na(xx)]
xx <- unlist(xx)
xxUnique <- unique(xx)
yy <- vector("list",length(xxUnique))
for(i in 1:length(yy))
{
    MT <- "MT_lab"
    yy[[i]] <- list(src=MT,title=xxUnique[i],probes=as.character(y[group==xxUnique[i],]$V4))
}

#the following code is for sigpathway analysis.

library(sigPathway)
YANG <- read.table("All microarray MT_LAB.txt",header=T,sep="\t")
attach(YANG)
Y <- data.frame(TF134_1_3DAK,TF134_2_3DAK,WT1_3DAK,WT2_3DAK,row.names=locus_no)
p <- c("1_trt","1_trt","0_norm","0_norm")
statList <- calcTStatFast(Y,p,ngroups=2)
hist(statList$pval,breaks=seq(0,1,0.025),xlab="p-value",ylab="Frequency",main="")
set.seed(1234)
YANG <- runSigPathway(yy,20,500,Y,p,nsim=100,weightType="constant",ngroup=2,npath=25,verbose=F,allpathway=F,alwaysUseRandomPerm=F)
write.table(YANG$df.pathways[1:25,],quote=F,sep="\t",file="chr1_sig.txt")
YANG$list.gPS[[1]] 
save.image("chr1_sig")
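One part of the code above that may become slow on a large chr1.txt is the for loop, which rescans the whole table once per group. A possible alternative, sketched below, builds the same kind of annotation list in a single pass with split(). Whether this loop is the actual cause of the memory problem is not known. The small data frame is a hypothetical stand-in for chr1.txt, assuming V2 and V3 define the group and V4 holds the probe ID as in the code above.

```r
# Hypothetical stand-in for chr1.txt (the real file layout is an assumption).
x <- data.frame(
  V1 = 1:6,
  V2 = rep("chr1", 6),
  V3 = c(10, 10, 10, 20, 20, 20),
  V4 = sprintf("FGSG_%05d", 1:6),
  stringsAsFactors = FALSE
)

# Group label per row, as in the original code.
group <- paste(x$V2, x$V3, sep = "_")

# Split the probe IDs by group in one pass; the explicit factor levels
# keep the groups in order of first appearance, like unique() does.
probes_by_group <- split(as.character(x$V4),
                         factor(group, levels = unique(group)))

# Wrap each group in the src/title/probes layout used above.
yy <- lapply(names(probes_by_group), function(g) {
  list(src = "MT_lab", title = g, probes = probes_by_group[[g]])
})
```

This avoids attach() and the repeated subsetting of y inside the loop, while producing one list element per group as before.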

It's difficult to diagnose this code, since I don't know what the data look like (chr1.txt, All microarray MT_LAB.txt) and I'm not familiar with the sigPathway library. I assume the memory issue is with runSigPathway() - I don't see anything else obvious that would cause a problem.

How much RAM do you have? Is there any documentation as to what the system requirements are for the R package? Do you have access to other more powerful machines? This seems to be a Windows machine, which never helps with use of resources in my (biased) experience.


Thanks again. Yes, it is difficult to diagnose from my code. In fact, when chr1.txt is not too big, the code works well, but when it grows to more than 100000 lines, R crashes on both my XP system and another Vista system. The two computers I used each have about 2 GB of RAM and a 200 GB disk. Can you leave your email address here? I will send you my microarray data file and annotation list.


Probably best if you upload sample files to a web location (e.g. a public Dropbox folder, or Google Docs) and share the URL here, so anyone can take a look.
