Question

Quantile Normalization in R

5

6.2 years ago

KVC_bioinfo ▴ 590

Hello All,

I have read counts from RNA-seq data arranged in rows and columns, and I want to quantile-normalize them in R. The following code gives me the normalized values. However, the output is a matrix without row or column names. I want the output to keep its row names and column names so that I can perform PCA on it.

data <- read.csv("data.csv",header=T)
head(data)
data_mat <- as.matrix(data[,-1]) 
head(data_mat)
data_norm <- normalize.quantiles(data_mat, copy = TRUE)

Could someone help me to get that? Thank you in advance.

normalization quantile R Bioconductor • 17k views

ADD COMMENT • link updated 6.2 years ago by Biostar 20 • written 6.2 years ago by KVC_bioinfo ▴ 590

0

Are you implying that your data_norm object has no row or column names after you perform quantile normalisation? What about your data.csv file?

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

0

Yes, exactly. The data_norm object has no row or column names after I perform quantile normalization. However, data.csv has them.

ADD REPLY • link 6.2 years ago by KVC_bioinfo ▴ 590

score 6 · Accepted Answer · 2018-02-06

6

6.2 years ago

Kevin Blighe 87k

Try this (note the extra line; also use data.matrix, not as.matrix):

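library(preprocessCore)  # provides normalize.quantiles()
data <- read.csv("data.csv", header = TRUE)
head(data)
rownames(data) <- data[, 1]  # first column holds the gene identifiers
data_mat <- data.matrix(data[, -1])
head(data_mat)
data_norm <- normalize.quantiles(data_mat, copy = TRUE)
# normalize.quantiles() may return a matrix without dimnames; copy them back
dimnames(data_norm) <- dimnames(data_mat)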
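For the PCA step mentioned in the question, here is a minimal sketch in base R. It assumes a named matrix with genes as rows and samples as columns; the toy data_norm below is filled with random values purely for illustration and is not the poster's data.

```r
# Toy stand-in for a named, normalized matrix (genes as rows, samples as columns).
# The values are random and purely illustrative.
set.seed(1)
data_norm <- matrix(rnorm(24), nrow = 6,
                    dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# prcomp() treats rows as observations, so transpose to put samples in rows.
pca <- prcomp(t(data_norm), center = TRUE, scale. = FALSE)

# Sample names carry through to the scores because the matrix has dimnames.
head(pca$x[, 1:2])
summary(pca)
```

Because the input carries dimnames, rownames(pca$x) are the sample names and rownames(pca$rotation) are the gene names, which makes the PCA output easy to label in plots.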

ADD COMMENT • link 6.2 years ago by Kevin Blighe 87k

2

It works. Thank you very much.

ADD REPLY • link 6.2 years ago by KVC_bioinfo ▴ 590

1

You're the best.

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

1

Hi Kevin,

I am having this problem too.

data=read.csv("bk.txt", sep="\t", header=T)
head(data)
           X adult.endothelial.progenitor.cell alternatively.activated.macrophage
1      ABCG4                              1.17                               1.00
2 AP003391.1                              1.00                               1.00
3      ATP5L                            170.36                             200.45
4      BCL9L                             17.52                               1.74
5  BMPR1APS2                              1.04                               1.05
6     C2CD2L                              4.44                              11.20
rownames(data) <- data[,1]
data_mat <- data.matrix(data[,-1]) 
head(data_mat)
           adult.endothelial.progenitor.cell alternatively.activated.macrophage
ABCG4                                   1.17                               1.00
AP003391.1                              1.00                               1.00
ATP5L                                 170.36                             200.45
BCL9L                                  17.52                               1.74
BMPR1APS2                               1.04                               1.05
C2CD2L                                  4.44                              11.20
data_norm <- normalize.quantiles(data_mat, copy = TRUE)
head(data_norm)
           [,1]       [,2]      [,3]       [,4]       [,5]      [,6]       [,7]       [,8]
[1,]   1.316610   1.002034  1.002034   1.006864   1.201017  1.000169   1.316610   1.001017
[2,]   1.003051   1.002034  1.002034   1.006864   1.002034  5.781186   1.002034   1.001017
[3,] 219.738136 219.738136 87.607966 219.738136 219.738136 87.607966 219.738136 219.738136
[4,]  12.947627   1.983136  5.781186   1.201017   4.649492 19.805254   2.767627   5.781186
[5,]   1.201017   1.133051  1.316610   1.006864   1.002034  1.092881   1.002034   1.001017
[6,]   2.767627  25.918475 16.030169   4.649492  25.918475  2.150000  16.030169   2.767627

There are no row or column names in the output. Can you figure out what is wrong with this? I appreciate your help.

ADD REPLY • link 5.6 years ago by bk11 ★ 2.3k

1

I see that you have also posted here: Quantile Normalization in R and output data

The colnames and rownames of data_norm are the same as those of data_mat, so they can be copied across from it.
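As a rough illustration of that point, the sketch below copies the names back onto a normalized matrix. To stay self-contained it uses a hypothetical base-R helper, quantile_normalize_plain, as a stand-in for a normalizer that drops names; it is not preprocessCore's implementation (for one thing, it breaks ties by order of appearance), and the toy values are made up.

```r
# Toy matrix standing in for data_mat (values are made up for illustration).
data_mat <- matrix(c(5, 2, 3, 4, 1, 4, 3, 4, 6), nrow = 3,
                   dimnames = list(c("geneA", "geneB", "geneC"),
                                   c("s1", "s2", "s3")))

# Hypothetical stand-in for a quantile normalizer that returns a bare matrix.
quantile_normalize_plain <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each column
  sorted <- apply(x, 2, sort)                         # sort each column
  ref    <- rowMeans(sorted)                          # mean of each quantile
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks to reference values
  dimnames(out) <- NULL                               # mimic the loss of names
  out
}

data_norm <- quantile_normalize_plain(data_mat)

# Rows and columns stay in the same order, so the names can be copied directly.
dimnames(data_norm) <- dimnames(data_mat)
data_norm
```

The same last line applies unchanged when data_norm comes from normalize.quantiles(), since that function also keeps the row and column order of its input.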

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k

0

Hi Kevin, I have RNA-seq data for 3 samples of the same tissue, with read counts for every gene from featureCounts, HTSeq, and Cufflinks. My question is: what should my data.csv file contain (only the counts, or the gene list plus the counts)? Thanks in advance.

ADD REPLY • link 6.2 years ago by k.kathirvel93 ▴ 300

0

featureCounts and HTSeq produce raw counts; Cufflinks would have produced normalised values, most likely FPKM.

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

1

My question is: what should my input data.csv file contain for quantile normalization (only the counts, or the gene list plus the counts)? Thanks in advance.

My data.csv looks like :

sample1  sample2  sample3  sample4  sample5
1000     250000   352      5425     5985
1533     54896    5482     6549     6464

ADD REPLY • link 6.2 years ago by k.kathirvel93 ▴ 300

1

It can be any numerical data, usually with samples as columns and genes/probes as rows. However, if you're attempting to normalise RNA-seq counts with a standard quantile normalisation function, then I would not do that. You should use one of the published methods, such as edgeR or DESeq2, to perform the normalisation.
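To make the layout concrete, here is a small sketch that reads a CSV with gene IDs in the first column and one column per sample, then applies a simplified median-of-ratios scaling, which is the idea behind DESeq2's size factors. The CSV text, the counts, and the helper estimate_size_factors_sketch are all made up for illustration; for real analyses, use the packages' own functions (for example DESeq2's estimateSizeFactors) rather than this sketch.

```r
# Hypothetical CSV layout: gene IDs in the first column, one column per sample.
# The counts are made up for illustration.
csv_text <- "gene,sample1,sample2,sample3
geneA,10,20,5
geneB,20,40,10
geneC,30,60,15
geneD,40,80,20"

# row.names = 1 turns the gene column into row names, leaving only numbers.
counts <- as.matrix(read.csv(text = csv_text, row.names = 1))

# Simplified median-of-ratios size factors (illustrative, not DESeq2's code).
estimate_size_factors_sketch <- function(counts) {
  log_geo_means <- rowMeans(log(counts))  # per-gene log geometric mean
  keep <- is.finite(log_geo_means)        # skip genes with a zero in any sample
  apply(counts, 2, function(sample_counts) {
    exp(median(log(sample_counts[keep]) - log_geo_means[keep]))
  })
}

size_factors <- estimate_size_factors_sketch(counts)
norm_counts  <- sweep(counts, 2, size_factors, "/")  # divide each column by its factor
norm_counts
```

Reading the file with row.names = 1 also keeps the gene names attached to the matrix from the start, so the file should contain the gene list plus the counts rather than the counts alone.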

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k