Duplicate gene symbols converted from UCSC gene id
8.2 years ago
gndy90 • 0

Hi,

I am a beginner in RNA-Seq. I am running Cufflinks for a human cell transcriptome analysis. In the end, cummeRbund gave me the UCSC gene ids of the differentially expressed genes, not the gene symbols, so I converted the gene ids to gene symbols with the UCSC Genome Browser. My questions are:

  1. I submitted 2900 gene ids, but it returned about 3000 gene_id-gene_symbol pairs, so about 100 new gene ids were added. What is the reason for this?

  2. My downstream analysis does not allow duplicate gene symbols. What should I do about the duplicates? I searched Biostars and found that the different gene ids corresponding to one common gene symbol are different haplotypes of the gene. Should I just add up the expression values with the same gene symbol?

Thanks.

RNA-Seq gene expression cufflinks ucsc • 3.6k views

Would it be possible to post an example gene_id with multiple gene_symbols?


Sure. But I think it should be 'gene symbol with multiple gene IDs'.

Here are two examples:

gene id       gene symbol
uc001ajr.3    TNFRSF14
uc001ajt.1    TNFRSF14
uc001aju.3    FAM213B
uc001ajw.2    FAM213B
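
To list every gene symbol that maps to more than one id, something like the following minimal Python sketch could be used. It assumes the conversion result is available as (gene id, gene symbol) pairs; the function name `symbols_with_multiple_ids` is only illustrative, and the four pairs are the ones shown above.

```python
from collections import defaultdict


def symbols_with_multiple_ids(pairs):
    """Group UCSC ids by gene symbol and keep symbols that have more than one id."""
    ids_by_symbol = defaultdict(list)
    for ucsc_id, symbol in pairs:
        ids_by_symbol[symbol].append(ucsc_id)
    # Keep only the symbols that appear with two or more different ids.
    return {s: ids for s, ids in ids_by_symbol.items() if len(ids) > 1}


# The example pairs from the table above (hypothetical in-memory form).
pairs = [
    ("uc001ajr.3", "TNFRSF14"),
    ("uc001ajt.1", "TNFRSF14"),
    ("uc001aju.3", "FAM213B"),
    ("uc001ajw.2", "FAM213B"),
]

duplicates = symbols_with_multiple_ids(pairs)
for symbol, ids in duplicates.items():
    print(symbol, ids)
```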

Could they not be transcript variants (isoforms) of the same gene?


But I did gene level differential expression analysis in cuffdiff and cummeRbund.

Here is my hg19 GTF file format:

chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 12646 12697 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";

There is no gene symbol in it. Is that correct?

8.2 years ago

For this particular example, the gene_id is the same as the transcript_id. The gene_id attribute is mandatory in the GTF format, hence UCSC just added the transcript_id as the gene_id. So they are different transcripts of one gene.
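
A quick way to check whether a GTF file has this problem is to compare the two attributes on every record. The Python sketch below is only an illustration: the function name `gene_id_equals_transcript_id` and the regular expression are assumptions made for this example, and the sample lines are copied from the GTF excerpt posted above.

```python
import re

# Matches attribute pairs such as: gene_id "uc001aaa.3"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_id_equals_transcript_id(gtf_lines):
    """Return True if every record has a gene_id identical to its transcript_id."""
    checked = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        attrs = dict(ATTR_RE.findall(line))
        if "gene_id" not in attrs or "transcript_id" not in attrs:
            continue  # skip records that lack either attribute
        checked += 1
        if attrs["gene_id"] != attrs["transcript_id"]:
            return False
    # Only report True when at least one record was actually compared.
    return checked > 0


sample = [
    'chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";',
    'chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";',
]
print(gene_id_equals_transcript_id(sample))
```

If this returns True for a whole file, the gene_id values are really transcript ids, so a "gene level" analysis run against that file is effectively working per transcript.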


Solved! I had used the incomplete GTF file exported from the Table Browser. Thank you!
