Recently, CCLE released miRNA expression data. I looked for the normalization method used for this data but couldn't find it. When I sum all miRNA expression values for each sample (cell line), the totals vary widely between cell lines, as they would in unnormalized data. Does this data need further normalization, and do you have any suggestions?
Hi,
I am new to CCLE data, and it is very valuable to me at present. I have been struggling for a week to make sense of the CCLE miRNA data.
Could you please point me to anything that would help me understand and process it?
I also don't understand the data type. What kind of values are in that file (for example, raw read counts)?
Please help if you see this message.
Thank you very much.
My email is venugopal887@gmail.com; feel free to contact me any time.
According to Ghandi et al. (2019), Nature, the miRNAs were measured with NanoString and normalized using the nSolver software. They don't go into much detail, but the Methods section states:
Samples were divided into 14 batches, and two replicates of the K-562 cell line were included in each batch as a control. Internal positive and negative controls were used for normalization as recommended by NanoString using NanoString nSolver software. We excluded samples that failed NanoString nSolver quality control as well as one sample based on low positive control signal (normalization coefficient >6) and another sample based on high background signal (with second ranked negative control value >80). To estimate the background signal, we sorted the values for the negative controls within each sample and picked the second highest value as the background estimate. The median background estimate across all cell lines was 26.1. We used log(50 + N), in which N is the nSolver normalized value to reduce the effect of the background signal in the downstream analyses.
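To make the two steps in that passage concrete, here is a minimal Python sketch of the background estimate (second-highest negative control within a sample) and the log(50 + N) transform. This is not the authors' code: the function names and the example values are made up for illustration, and the log base is an assumption, since the quoted text does not state one (base 2 is used as the default here only because it is common for expression data).

```python
import numpy as np

def background_estimate(negative_controls):
    """Return the second-highest negative-control value for one sample."""
    values = np.sort(np.asarray(negative_controls, dtype=float))
    if values.size < 2:
        raise ValueError("need at least two negative-control values")
    return values[-2]

def shifted_log(normalized, offset=50.0, base=2.0):
    """Compute log(offset + N) for nSolver-normalized values N.

    The base is an assumption; the quoted Methods text only says log(50 + N).
    """
    normalized = np.asarray(normalized, dtype=float)
    return np.log(offset + normalized) / np.log(base)

# Hypothetical negative-control counts for a single sample.
neg_controls = [12.0, 31.0, 8.0, 26.0, 19.0, 15.0]
bg = background_estimate(neg_controls)

# Hypothetical normalized miRNA values for the same sample.
transformed = shifted_log([0.0, 14.0, 974.0])
```

The offset of 50 sits above the reported median background of 26.1, so values near background are compressed toward log(50) rather than spread out by the logarithm.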
Shawn, thank you for the reply. However, I want to ask this: when I sum all miRNA expression values for each cell line, I observe a 9-fold difference between some cell lines. Do you think this is normal?
That's definitely a bit of a red flag, so I started to dig into the data a bit. In my hands, the extreme data ranges seem to be outliers. I reformatted and read in the miRNA data (each column is a cell line, each row is a miRNA), and I found:
So, while there's a large range of expression, there's only a 3.4-fold difference between the 10% and 90% quantiles. Additionally, if you plot the log of these data as plot(density(log10(cellLines))) it'll generate an approximately normal curve. So it does appear that the large variance is occurring at the extreme ends of the spectrum.
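The check described above can be sketched in Python as follows (the original analysis used R, so this is an equivalent rather than the code that was run). It assumes a matrix with miRNAs as rows and cell lines as columns. The function name `summarize_totals` is hypothetical, and the data are simulated with one deliberately inflated column, so the numbers it produces say nothing about the real CCLE file.

```python
import numpy as np

def summarize_totals(expr):
    """Summarize per-cell-line totals for a (miRNA x cell line) matrix."""
    totals = np.asarray(expr, dtype=float).sum(axis=0)  # one sum per column
    q10, q90 = np.quantile(totals, [0.10, 0.90])
    return {
        "min_max_fold": totals.max() / totals.min(),  # driven by extremes
        "q10_q90_fold": q90 / q10,                    # robust to outliers
        "log10_totals": np.log10(totals),             # input for a density plot
    }

# Simulated data only: 700 miRNAs x 200 cell lines, plus one outlier column.
rng = np.random.default_rng(0)
expr = rng.lognormal(mean=4.0, sigma=1.0, size=(700, 200))
expr[:, 0] *= 8  # inflate one cell line to mimic an extreme sample

summary = summarize_totals(expr)
print(f"min/max fold: {summary['min_max_fold']:.2f}")
print(f"10%/90% fold: {summary['q10_q90_fold']:.2f}")
```

Comparing the two ratios shows whether a large overall range comes from a few extreme cell lines or from the bulk of the samples.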
The paper also specifies that they normalized to the NanoString positive and negative controls. If the miRNA panel is like the gene expression panels I've analyzed, then it includes a set of spike-in and endogenous standards to normalize for RNA input. Everything seems above board to me.