Extracting certain gene sets from RNA data
8.7 years ago
kevluv93 ▴ 170

I have a list of genes from a transcriptome, along with their GO IDs and annotations. I want to pull out all the putative defense-related genes from this dataset and plot them in a separate graph for a figure. Does anyone know a good bioinformatics way of extracting genes with certain functions from a dataset without prior knowledge of every gene involved in defense? If I were to select genes based on GO IDs, which GO IDs would I need in order to capture genes related to immune defense? How do you usually approach this problem?

So far I've been pulling out defense-related genes by searching for GO:0006952 (defense response). Is that enough?

Thanks for the help!

RNA-seq ontology • 2.6k views
8.7 years ago
mark.ziemann ★ 1.9k

GSEA is commonly used to analyse gene sets and detect enrichment at the up- or down-regulated ends of a ranked gene list. It outputs neat enrichment plots.

The gene set database used by GSEA is completely customisable. If you're working with human data, you can download the MSigDB sets in GMT format, extract all sets with "defense" in the name, and make a custom GMT with only those few (six) gene sets. If not, you can extract all genes with the GO IDs of interest and create your own GMT file. Just be sure that the gene names in the GMT exactly match those in your RNA data, including capitalisation.
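
For the non-human case, building your own GMT line from GO annotations could look roughly like the sketch below. It assumes a hypothetical annotation table with two tab-separated columns (gene ID, then GO IDs joined by `;`); the function name `make_gmt_line` and the toy data are made up for illustration, so adjust the separators to whatever your annotation file actually uses. A GMT line is simply the set name, a description, and then the member genes, all tab-separated.

```shell
#!/bin/bash
# Assumed (hypothetical) input: gene ID <TAB> GO IDs joined by ';'

# make_gmt_line ANNOTATION_FILE GO_ID SET_NAME
# Prints one GMT line: set name, description (here the GO ID), then
# every gene directly annotated with GO_ID.
make_gmt_line() {
    local annot=$1 go_id=$2 set_name=$3
    awk -F'\t' -v go="$go_id" -v name="$set_name" '
        {
            n = split($2, ids, ";")
            for (i = 1; i <= n; i++)
                if (ids[i] == go) { genes = genes "\t" $1; break }
        }
        END { if (genes != "") print name "\t" go genes }
    ' "$annot"
}

# Toy annotation table, for illustration only
annot=$(mktemp)
printf 'geneA\tGO:0006952;GO:0008150\n' >  "$annot"
printf 'geneB\tGO:0005575\n'            >> "$annot"
printf 'geneC\tGO:0006952\n'            >> "$annot"

make_gmt_line "$annot" GO:0006952 DEFENSE_RESPONSE
```

Redirect the output to a file (for example `> custom.gmt`) to use it with GSEA. Note that this only picks up genes annotated directly with the given ID; genes annotated only to more specific child terms are missed unless your annotations have already been propagated up the ontology.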

If you only want to retrieve the lines of a spreadsheet corresponding to genes that belong to a gene set of interest, then you could try this bash hack, which searches a GMT file for gene sets related to "defense" and grabs the rows for their genes from the spreadsheet:

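#!/bin/bash
# Print the spreadsheet rows for genes in every gene set whose name contains KEYWORD.

XLS=RNAseq_DGE.xls            # DGE table; must be plain text (e.g. tab-delimited) for grep to read it
GMT=msigdb.v4.0.symbols.gmt   # GMT columns: set name, description, then genes
KEYWORD=DEFENSE

# Match the keyword against the set name (column 1) only, not the gene columns
for GS in $(cut -f1 "$GMT" | grep -i "$KEYWORD") ; do
    echo "$GS"
    # Take this set's genes (column 3 onward), one per line, drop empty entries,
    # and use them as fixed-string, whole-word patterns against the spreadsheet
    awk -F'\t' -v gs="$GS" '$1 == gs' "$GMT" | cut -f3- | tr '\t' '\n' | grep -v '^$' | grep -wFf - "$XLS"
done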
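
One caveat with the hack above: `grep -w` matches a gene symbol anywhere on the line, so a row can be pulled in because the symbol turns up in a description column or as part of a hyphenated name. If that matters, a stricter variant is sketched below. It assumes the gene symbol sits in the first tab-separated column of the table; the function name `rows_for_set` and the toy files are hypothetical.

```shell
#!/bin/bash
# rows_for_set GMT_FILE SET_NAME TABLE_FILE
# Prints table rows whose first column is exactly a member of SET_NAME.
rows_for_set() {
    local gmt=$1 set_name=$2 table=$3
    awk -F'\t' -v gs="$set_name" '
        # First file (GMT): remember the genes of the requested set
        NR == FNR { if ($1 == gs) for (i = 3; i <= NF; i++) genes[$i] = 1; next }
        # Second file (table): keep rows whose first column is one of them
        $1 in genes
    ' "$gmt" "$table"
}

# Toy files, for illustration only
gmt=$(mktemp); table=$(mktemp)
printf 'DEFENSE_RESPONSE\tna\tTLR4\tIL6\n' > "$gmt"
printf 'TLR4\t2.1\treceptor\n'              >  "$table"
printf 'ACTB\t0.0\tinteracts with TLR4\n'   >> "$table"
printf 'tlr4\t1.0\tlowercase duplicate\n'   >> "$table"

rows_for_set "$gmt" DEFENSE_RESPONSE "$table"
```

On the toy table only the `TLR4` row is printed: the `ACTB` row mentions TLR4 only in its description, and `tlr4` differs in capitalisation, which is the same exact-match requirement GSEA has.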