Filter out data for coding genes
5.7 years ago

Hello all

I have obtained RNA-seq data from TCGA. The data contains about 56,000 genes (gene symbols and Ensembl IDs), including pseudogenes, uncharacterized, non-coding and coding genes. I am looking for an efficient way to filter the data down to only the coding genes. Can anyone suggest an approach?

Regards, Shivangi Agarwal

coding RNA-Seq • 2.1k views

In what format do you have the data?


I have patient-wise data containing an expression value (FPKM) for each gene in each patient (FPKM.txt files).


Then I would get a human GTF file for the correct assembly (hg19 or hg38; I don't know which one TCGA used), e.g. from GENCODE, and filter for genes that are coding, i.e. those with CDS features in the GTF. From these, extract the gene names and use them to subset your data. All of this can be done with Unix tools like awk.
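As a rough illustration of that approach, here is a minimal Python sketch rather than a tested pipeline. It assumes a GENCODE-style GTF (tab-separated, feature type in column 3, `gene_id` and `gene_name` in the attribute column) and an expression table whose first column is either an Ensembl gene ID or a gene symbol. The function names `coding_genes` and `filter_rows` and the toy records are made up for this example. Version suffixes on Ensembl IDs (the `.4` in `ENSG00000186092.4`) are stripped on both sides, on the assumption that the annotation and the FPKM files may carry different versions.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def coding_genes(gtf_lines):
    """Return (ids, names) of genes that have at least one CDS feature."""
    ids, names = set(), set()
    for line in gtf_lines:
        if line.startswith("#"):          # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue                      # keep only CDS records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs:
            ids.add(attrs["gene_id"].split(".")[0])   # drop version suffix
        if "gene_name" in attrs:
            names.add(attrs["gene_name"])
    return ids, names


def filter_rows(rows, ids, names):
    """Keep rows whose first column is a coding gene ID or gene symbol."""
    kept = []
    for row in rows:
        if not row:
            continue
        key = row[0]
        if key.split(".")[0] in ids or key in names:
            kept.append(row)
    return kept


# Toy data for illustration only (coordinates and values are not real).
gtf = [
    "##description: toy annotation\n",
    'chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG00000186092.6"; gene_type "protein_coding"; gene_name "OR4F5";\n',
    'chr1\tTOY\tCDS\t150\t300\t.\t+\t0\tgene_id "ENSG00000186092.6"; gene_type "protein_coding"; gene_name "OR4F5";\n',
    'chr1\tTOY\tgene\t1000\t2000\t.\t+\t.\tgene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1";\n',
]
fpkm = [["ENSG00000186092.4", "0.0"], ["ENSG00000223972.5", "0.12"]]

ids, names = coding_genes(gtf)
print(filter_rows(fpkm, ids, names))
```

For real files, `coding_genes` can be given an open file handle for the GTF, and `filter_rows` can take the rows from `csv.reader(handle, delimiter="\t")` for each FPKM.txt file. An alternative to requiring a CDS feature would be to select on the `gene_type "protein_coding"` attribute, which should give a very similar gene set.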
