I am studying differential transcript expression between cancer and adjacent tissues. My samples are organized as follows:
           sample type
Barcode_01 01     NT
Barcode_03 02     GC
Barcode_04 02     AD
Barcode_05 03     GC
Barcode_06 03     AD
Barcode_07 04     AD
Barcode_08 04     GC
Barcode_09 05     AD
Barcode_10 05     GC
where GC is gastric cancer tissue and AD is the adjacent tissue. I also have one non-cancerous sample (NT) that I wish to compare. Thus I need to make three comparisons:
AD x GC (where I need to account for within-sample variation, since both tissues come from the same sample)
NT x AD
NT x GC
However, when I load my data into DESeq2, it returns the following error:
raw <- DESeqDataSetFromMatrix(count, sample.data, ~ type + sample)
Error in DESeqDataSet(se, design = design, ignoreRank) :
the model matrix is not full rank, so the model cannot be fit as specified.
one or more variables or interaction terms in the design formula
are linear combinations of the others and must be removed
Is there some way to organize my data so that the comparison accounts for within-sample variation?
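For reference, the rank problem can be reproduced without DESeq2 at all. The following is a minimal base R sketch that rebuilds the sample table above by hand (the names `mm`, `n_cols`, `mm_rank` and `nt_is_redundant` are only illustrative) and checks the rank of the model matrix for `~ type + sample`. It suggests that the NT column is redundant, because NT occurs only in sample 01 and sample 01 contains nothing else.

```r
# Sample table as listed above; row names are the barcodes
sample.data <- data.frame(
  sample = factor(c("01", "02", "02", "03", "03", "04", "04", "05", "05")),
  type   = factor(c("NT", "GC", "AD", "GC", "AD", "AD", "GC", "AD", "GC")),
  row.names = paste0("Barcode_",
                     c("01", "03", "04", "05", "06", "07", "08", "09", "10"))
)

# Model matrix for the design that triggers the error
mm <- model.matrix(~ type + sample, data = sample.data)

n_cols  <- ncol(mm)      # number of coefficients the design asks for
mm_rank <- qr(mm)$rank   # number of coefficients that can actually be estimated

# The NT indicator equals the intercept minus all sample indicators,
# i.e. it is a linear combination of the other columns
sample_cols <- grep("^sample", colnames(mm))
nt_is_redundant <- all(
  mm[, "typeNT"] == mm[, "(Intercept)"] - rowSums(mm[, sample_cols])
)

print(colnames(mm))
print(c(columns = n_cols, rank = mm_rank))
print(nt_is_redundant)
```

If the rank is lower than the number of columns, the design cannot be fit as written, which matches the message DESeq2 prints.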
I've tried to perform this analysis in two steps.
First, I loaded the data as usual with the design
~ type
and performed the AD x NT and GC x NT comparisons.
Then I loaded the data excluding the first sample (the NT one), using the design
~ type + sample
and compared AD x GC. Is this wrong, or should I use your trick?
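To check that the second step is at least well defined, here is a small base R sketch, assuming the same sample table as above rebuilt by hand (the names `paired` and `mm_paired` are only illustrative). It drops the NT row, removes the unused factor levels, and confirms that the paired design is full rank once NT is gone.

```r
# Sample table as listed above; row names are the barcodes
sample.data <- data.frame(
  sample = factor(c("01", "02", "02", "03", "03", "04", "04", "05", "05")),
  type   = factor(c("NT", "GC", "AD", "GC", "AD", "AD", "GC", "AD", "GC")),
  row.names = paste0("Barcode_",
                     c("01", "03", "04", "05", "06", "07", "08", "09", "10"))
)

# Keep only the paired samples and drop the now-empty levels ("NT", "01")
paired <- droplevels(sample.data[sample.data$type != "NT", ])

# Paired design: sample as a blocking factor, type as the effect of interest
mm_paired <- model.matrix(~ sample + type, data = paired)

print(colnames(mm_paired))
print(c(columns = ncol(mm_paired), rank = qr(mm_paired)$rank))
```

The same `paired` table, together with the matching columns of `count`, could then be passed to `DESeqDataSetFromMatrix` with `design = ~ sample + type`, and the GC versus AD comparison extracted with `results(dds, contrast = c("type", "GC", "AD"))`. Putting `type` last in the formula is a convention rather than a requirement.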