How are tss_id GTF attributes determined? How do they relate to TSS annotations?
7.4 years ago

A GTF file I downloaded from iGenomes contains a "tss_id" attribute.

I suspect it is used to group annotated features whose transcripts originate from the same transcription start site (TSS). Am I correct?

How is this attribute determined? Where does it come from? Is there a TSS database somewhere in which these tss_id values are used?

I found data about TSS position in C. elegans from the following source: https://elifesciences.org/lookup/doi/10.7554/eLife.00808.005

Is there an easy way to use this information to improve the GTF annotations I obtained from iGenomes (for instance, by extending the UTR coordinates)?

RNA-Seq • 5.9k views
7.4 years ago
Malcolm.Cook ★ 1.5k

You are correct.

tss_id (and p_id, the protein id) are required by the cuffdiff program to perform its differential splicing and coding contrasts. I have never seen them used anywhere else.

It is possible to assign tss_id values to a GTF file yourself. All transcripts of a given gene whose first exon has the same start position should be assigned the same tss_id.

Similarly, all transcripts having the same coding sequence (though different UTR) should be assigned the same p_id.
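
The two grouping rules above can be sketched in Python. This is only a minimal illustration, not the Rscript mentioned below and not Illumina's method: the input layout (a list of dicts with `gene_id`, `transcript_id`, `strand`, `exons`, `cds`), the function names, and the `TSS1`/`P1` label format are all assumptions made for the example. It also assumes that on the minus strand the "start of the first exon" means the highest exon end coordinate.

```python
def first_exon_start(strand, exons):
    """Return the 5' coordinate of the first exon in transcript orientation.

    Assumption: on the '-' strand this is the highest exon end coordinate.
    """
    if strand == "+":
        return min(start for start, _ in exons)
    return max(end for _, end in exons)


def assign_ids(transcripts):
    """Map each transcript_id to a tss_id and (if coding) a p_id.

    Transcripts of one gene sharing a first-exon start share a tss_id;
    transcripts of one gene sharing identical CDS intervals share a p_id.
    """
    tss_labels, p_labels = {}, {}  # grouping key -> label (hypothetical format)
    tss_ids, p_ids = {}, {}        # transcript_id -> label
    for tx in transcripts:
        tss_key = (tx["gene_id"], tx["strand"],
                   first_exon_start(tx["strand"], tx["exons"]))
        tss_ids[tx["transcript_id"]] = tss_labels.setdefault(
            tss_key, f"TSS{len(tss_labels) + 1}")

        cds = tuple(sorted(tx.get("cds", [])))
        if cds:  # non-coding transcripts get no p_id
            p_key = (tx["gene_id"], tx["strand"], cds)
            p_ids[tx["transcript_id"]] = p_labels.setdefault(
                p_key, f"P{len(p_labels) + 1}")
    return tss_ids, p_ids


# Made-up example: t1 and t2 differ only in their 3' UTR,
# t3 starts further downstream but encodes the same CDS.
example = [
    {"gene_id": "g1", "transcript_id": "t1", "strand": "+",
     "exons": [(100, 200), (300, 400)], "cds": [(150, 200), (300, 350)]},
    {"gene_id": "g1", "transcript_id": "t2", "strand": "+",
     "exons": [(100, 200), (300, 500)], "cds": [(150, 200), (300, 350)]},
    {"gene_id": "g1", "transcript_id": "t3", "strand": "+",
     "exons": [(120, 200), (300, 400)], "cds": [(150, 200), (300, 350)]},
    {"gene_id": "g2", "transcript_id": "t4", "strand": "-",
     "exons": [(1000, 1100), (1200, 1300)], "cds": [(1050, 1100), (1200, 1250)]},
]
tss_ids, p_ids = assign_ids(example)
```

In this toy data t1 and t2 end up with the same tss_id and t3 gets its own, while all three g1 transcripts share one p_id. Parsing a real GTF into this structure and writing the attributes back is left out of the sketch.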

GTF files from iGenomes already have tss_id and p_id assigned according to these rules. There is no external source of additional information. I asked Illumina how they assign them and they declined to answer.

For your pleasure, I have developed an Rscript, cuffdiff_gtf_attributes, which does this, and have tested it with an Ensembl GTF. The modified GTF files allow me to perform differential isoform and protein detection with cuffdiff.


Thanks for clarifying what tss_id and p_id correspond to and for the potentially useful script.

You say that a given tss_id refers to a shared start position of the first exon. In the particular case of C. elegans, many genes are subject to trans-splicing: the 5' end of the transcript is replaced by an RNA coming from somewhere else (see http://wormbook.org/chapters/www_transsplicingoperons/transsplicingoperons.html). The transcription start site can therefore be upstream of the first exon. I don't know whether the GTF files from Illumina use the real TSS (when it is known) or the start position of the first exon when they assign a tss_id.


I do not think the GTF format provides any specific way to represent trans-splicing.

7.4 years ago

This attribute comes from, and is only needed by, the Cufflinks suite of programs. You can take any GTF file that lacks these attributes and add them with cuffcompare. If you don't need to use Cufflinks, you can ignore tss_id completely.
