Brand new to going through RNA-seq data, and learning on the go. I have been trying to make a volcano plot of some example data from GEO, to determine up-regulated genes in the sample. I am having problems with getting reasonable outputs for my fold-change values, with insanely high values over 5000 and below -10000. I am not sure if this is a result of just the data or something I have been doing myself. The reason for the volcano plot is so I can then determine the upregulated genes, usually done by looking at the genes over expressed (>=1).
I have been getting the average of the expression data for the "diabetic" and "non-diabetic". To determine the fold change I have then just subtracted the average of the diabetics from the average of the health. This is based off an example with a different dataset.
avg.t2d <- apply(exdat[,t2d.index==" diabetic"], 1, mean)
avg.healthy <- apply(exdat[,t2d.index==" non-diabetic"], 1, mean)
foldch <- avg.t2d - avg.healthy
logt2dp <- -log(t2dpvals, base=10)
plot(foldch, logt2dp, pch=1, xlab="Log2 Fold Change (L/S)",
ylab="-Log10(P-val)", main="Volcano Plot", col = ifelse(adj_pvals<0.05, 'green', 'black'))
Am I missing something big here or am I over thinking the problem?