How to filter out a list of specific genes from the DESeq object in R - bulk RNA-seq differential expression
3.0 years ago
msimmer92 ▴ 300

I have the following object in a basic DESeq2 bulk RNA-seq differential expression pipeline (human data). The pipeline filters out genes with low counts, but on top of that I would like to remove a few genes that I know have issues in my dataset, to see how my analysis looks without them. I have the list of such genes in a vector named "genesToRemove", encoded as gene symbols (I could convert them to Ensembl IDs if needed).

genesToRemove <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")

dds <- DESeqDataSetFromHTSeqCount(sampleTable = mytable, directory = directory, design = ~ condition)

dds

class: DESeqDataSet 
dim: 60725 326 
metadata(1): version
assays(1): counts
rownames(60725): ENSG00000278625.1 ... ENSG00000277374.1
rowData names(0):
colnames(326): 9275 9351 ... 10146 10199
colData names(5): Condition Age ...

genes_to_keep <- rowSums(counts(dds)) >= 50
dds2 <- dds[genes_to_keep,]

I would like to remove them at this point, right after this code, so that the rest of the analysis runs without them. The problem is that I am not sure how to access the part of the dds2 object that holds the genes in order to filter them out. Any thoughts? Thank you.

R bulk object DESeq2 filter • 5.5k views

The answer below lists several possible strategies. In general, a DESeqDataSet is a subclass of SummarizedExperiment (SE), so all the SE subsetting options apply. You can index the rows with a character vector of gene names (rownames) to keep, or with a numeric or logical vector, as you would for most other R data objects. Does that make sense?

https://www.bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html#subsetting
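
To make the three indexing styles concrete, here is a minimal sketch on a toy base R matrix standing in for the count data. The gene names and the `counts` matrix are made up for illustration; the assumption is that the same row indices work inside `dds[ , ]`, as the vignette linked above describes.

```r
# Toy count matrix standing in for counts(dds); gene names are made up.
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("s1", "s2", "s3")))
genesToRemove <- c("geneB", "geneD")

# 1. Character vector: rownames to keep
by_name <- counts[setdiff(rownames(counts), genesToRemove), ]

# 2. Logical vector: TRUE for rows to keep
by_logical <- counts[!rownames(counts) %in% genesToRemove, ]

# 3. Numeric vector: positions of rows to keep
by_index <- counts[which(!rownames(counts) %in% genesToRemove), ]

# All three select the same rows
identical(by_name, by_logical) && identical(by_logical, by_index)
```

With a real object you would put the same index inside `dds2[ , ]` instead of `counts[ , ]`.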

3.0 years ago

Hi!

In this case you could try this one:

# Logical index: TRUE for the genes to keep (those not in genesToRemove)
genesToKeep <- !rownames(dds) %in% genesToRemove

# Subset the DESeq object to the genes you want to keep
dds <- dds[genesToKeep, ]

# Verify that the undesired genes are gone from the DESeq object
genesToRemove %in% rownames(dds)

And the result must be FALSE for every undesired gene.
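
One caveat: the rownames shown in the question are versioned Ensembl IDs (e.g. ENSG00000278625.1), so a vector of gene symbols would match nothing and every gene would be kept. The symbols need to be converted to Ensembl IDs first, as the question suggests. If the converted IDs come without the version suffix, a rough sketch of the matching could look like this (the IDs below are placeholders, and `ids` stands in for `rownames(dds2)`):

```r
# Placeholder versioned IDs, in the style of rownames(dds2)
ids <- c("ENSG00000000001.1", "ENSG00000000002.12", "ENSG00000000004.4")

# Placeholder unversioned Ensembl IDs to remove
genesToRemove <- c("ENSG00000000002")

# Strip the trailing ".<version>" before comparing
ids_noversion <- sub("\\.[0-9]+$", "", ids)

# TRUE for genes to keep
keep <- !ids_noversion %in% genesToRemove
ids[keep]

# With the real object this would be: dds2 <- dds2[keep, ]
```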

Best regards!


Or alternatively use setdiff:

dds <- dds[setdiff(rownames(dds), genesToRemove), ]

Yes, this is what I had in mind but couldn't get right. Thanks to both! I also didn't know about setdiff, which is good to know. I like that it is more compact.
