Breast cancer TCGA data - DGE analysis
8.9 years ago
David_emir ▴ 490

Hello All,

I am applying voom normalization to raw RNA-Seq count data obtained from TCGA. I have constructed a matrix of ~20000 rows and 341 columns, with the first column holding the Gene_id.

I am using the voom() function to normalise the data, with the following code.

## Libraries
library(limma)
library(edgeR)

## Matrix file: first column holds the gene IDs, the other 340 columns hold raw counts
raw.data <- read.delim("Combined_matrix_340.txt")
names(raw.data)
d <- raw.data[, 2:341]
rownames(d) <- raw.data[, 1]

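## Pheno data file (rows must be in the same order as the columns of d)
pheno <- read.table("pheno_data_BRCA.txt", header=TRUE, sep="\t")

## Design matrix
Group <- factor(pheno$Status)
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)  # drop the "Group" prefix so the names match the contrasts below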

## Normalisation
y <- voom(d, design, plot=TRUE)

colnames(design)

## Fit one linear model per gene
fit <- lmFit(y, design)

## Contrast matrix for the group comparisons
cont.wt <- makeContrasts("Metastatic-Normal_Control",
                         "ERPositive-Normal_Control",
                         "PRPositive-Normal_Control",
                         "HER2Positive-Normal_Control",
                         "ER_PR_HER2_Neg-Normal_Control",
                         levels=design)

fit2 <- contrasts.fit(fit, cont.wt)
fit3 <- eBayes(fit2)

## coef=2 selects the second contrast, ERPositive-Normal_Control
DE <- topTable(fit3, coef=2)

After this, the output is as follows:

Gene_ID          logFC  AveExpr        t       P.Value     adj.P.Val        B
ACTB|60       12.59366 12.54151 202.8138  0.000000e+00  0.000000e+00 806.7855
EEF1A1|1915   12.06986 12.51399 187.5779  0.000000e+00  0.000000e+00 781.7838
ACTG1|71      11.93940 12.03115 179.5847  0.000000e+00  0.000000e+00 767.7521
UBC|7316      10.71139 11.15274 176.8877  0.000000e+00  0.000000e+00 761.7751
TPT1|7178     10.99882 11.58788 159.5321  0.000000e+00  0.000000e+00 728.9007
HSP90AB1|3326 11.00446 11.12925 157.1734 9.881313e-323 3.381237e-319 724.0502
FTH1|2495     10.98239 11.26717 153.0514 8.557019e-319 2.509774e-315 715.3888
EEF2|1938     10.82150 11.46502 151.3403 3.886332e-317 9.973786e-314 711.5412
PSAP|5660     10.71044 11.06326 147.8964 9.572234e-314 2.183639e-310 703.8942
HSP90AA1|3320 10.74747 10.94257 144.5401 2.294330e-310 4.710489e-307 696.4700

My question: I am getting a list of only 10 genes and am not able to pull the full list. I would also like someone to validate my code and the method I followed. Please bear in mind that I am a novice in coding and bioinformatics. Please let me know whether my code is correct or whether I should modify it.

Thanks a lot for your help.

-Ateeq Khaliq

RNA-Seq Voom Raw_Count TCGA • 2.9k views
DE = topTable(fit3, coef = 2, number = 'all')

This returns all genes; by default, topTable() outputs only the top ten.
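As a small illustration of the point above, here is a minimal sketch on simulated data (the matrix, sample names, and group labels are made up for the example and are not the TCGA data). It uses `number = Inf`, which is the value commonly used with topTable() to request every row, and compares the result with the default call.

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 100 hypothetical genes x 6 samples
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
group <- factor(rep(c("Normal", "Tumor"), each = 3))

design <- model.matrix(~0 + group)
colnames(design) <- levels(group)   # "Normal", "Tumor"

fit  <- lmFit(expr, design)
cont <- makeContrasts(Tumor - Normal, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

top10     <- topTable(fit2, coef = 1)                # default: top 10 rows only
all_genes <- topTable(fit2, coef = 1, number = Inf)  # every gene, sorted by evidence

nrow(top10)
nrow(all_genes)
```

The full table can then be filtered, for example on `adj.P.Val`, or written to disk with write.table().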

Thanks a lot, poisonAlien. Could you please validate my code?
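For comparison, here is a hedged sketch of a commonly used voom workflow, run on simulated counts rather than the TCGA matrix (the gene names, sample names, and the two group labels are assumptions made for the example). It adds two steps that the posted code does not have, low-count filtering with filterByExpr() and TMM scaling with calcNormFactors(), and it renames the design columns so that they match the names used in makeContrasts(). One thing that may be worth checking in the posted output: logFC is close to AveExpr for every gene, which is what one would expect if the tested coefficient were a group mean rather than a difference between two groups.

```r
library(limma)
library(edgeR)

set.seed(42)
# Simulated raw counts: 500 hypothetical genes x 8 samples (not TCGA data)
counts <- matrix(rnbinom(500 * 8, mu = 100, size = 5), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:8)))
status <- factor(rep(c("Normal_Control", "ERPositive"), each = 4),
                 levels = c("Normal_Control", "ERPositive"))

design <- model.matrix(~0 + status)
colnames(design) <- levels(status)   # names must match those used in makeContrasts

dge  <- DGEList(counts = counts, group = status)
keep <- filterByExpr(dge, design)            # flag genes with enough counts to keep
dge  <- dge[keep, , keep.lib.sizes = FALSE]
dge  <- calcNormFactors(dge)                 # TMM normalisation factors

v    <- voom(dge, design, plot = FALSE)      # log-CPM values with precision weights
fit  <- lmFit(v, design)
cont <- makeContrasts(ERPositive - Normal_Control, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

res <- topTable(fit2, coef = 1, number = Inf)
head(res)
```

With a real dataset, the order of the samples in the phenotype table has to match the order of the count columns before the design matrix is built.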

David, could you please tell me how you constructed the matrix?
