CCLE, TCGA, and GTEx all provide RNA-seq data, but there is very little information available on whether batch-effect correction is needed when data from these different datasets are combined or compared.
I found only one publication that addresses batch effects, as if they were not a significant issue.
Q1. Across TCGA tumor types, is it necessary to normalize all the data before comparing the expression of a gene of interest?
Q2. How big an issue are batch effects between the different databases, especially since they appear to have been processed with slightly different pipelines (in addition to other technical factors in sample procurement and handling)?
Any suggestions or references regarding the issue (or non-issue) of batch effects among large datasets contributed by multiple institutions would be appreciated.