Retrieve the relative position of the CDS start for every isoform
5.0 years ago

For every isoform in C. elegans, I would like to get a table that looks like this:

transcript          CDS    start
ZK993.1b.1     ZK993.1b       40
ZK993.1b.2     ZK993.1b       20

I tried parsing a GTF file, extracting the start position of each transcript and the position of each start_codon, and then measuring the distance between the two. But when there is an intron in the 5' UTR, this returns the wrong value: the positions are genomic coordinates, so the intron length is counted as part of the distance.
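
A minimal sketch of a strand-aware fix for the GTF approach: instead of subtracting genomic coordinates, walk the transcript's exons in transcription order and count only exonic bases before the first base of the start codon. It assumes a GTF with `exon` and `start_codon` rows keyed by a `transcript_id` attribute (as in Ensembl-style GTFs); the function names are hypothetical. The result is a 0-based offset (number of transcript bases 5' of the ATG); add 1 for a 1-based position.

```python
"""Transcript-relative CDS start from GTF exon and start_codon rows (sketch)."""
from collections import defaultdict


def parse_attributes(field):
    """Parse the GTF attribute column, e.g. 'gene_id "G"; transcript_id "T";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def cds_offsets(gtf_lines):
    """Return {transcript_id: 0-based offset of the CDS start in the spliced transcript}."""
    exons = defaultdict(list)   # transcript_id -> [(start, end)], 1-based inclusive
    starts = defaultdict(list)  # transcript_id -> start_codon pieces
    strands = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        feature, start, end, strand = cols[2], int(cols[3]), int(cols[4]), cols[6]
        tid = parse_attributes(cols[8]).get("transcript_id")
        if tid is None:
            continue
        strands[tid] = strand
        if feature == "exon":
            exons[tid].append((start, end))
        elif feature == "start_codon":
            # A start codon can be split across an intron; keep every piece
            starts[tid].append((start, end))
    offsets = {}
    for tid, pieces in starts.items():
        strand = strands[tid]
        # First base of the ATG: lowest coordinate on '+', highest on '-'
        atg = min(s for s, _ in pieces) if strand == "+" else max(e for _, e in pieces)
        # Exons in transcription order: ascending on '+', descending on '-'
        offset = 0
        for s, e in sorted(exons[tid], reverse=(strand == "-")):
            if s <= atg <= e:
                offset += (atg - s) if strand == "+" else (e - atg)
                offsets[tid] = offset
                break
            offset += e - s + 1  # whole exon lies 5' of the ATG
    return offsets
```

Because only exon lengths are summed, a 5' UTR intron no longer inflates the distance, and minus-strand transcripts are counted from their highest coordinate.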

I also tried to get the table with BioMart, but there is no way to get a column with the CDS name, only the transcript names. WormMine doesn't seem to work at all either (I get a 400 error or a server error every time I try a query).

Lastly, in a desperate attempt, I tried mapping the CDS sequences directly onto the transcript sequences with minimap2, extracting the alignment start, transcript name and CDS name with pysam, removing 'wrong' entries coming from suboptimal alignments, etc. In the end I am still missing 400 records, so it's not ideal. I think it's because some ORFs are so short that the mapping failed, but I haven't looked into it in detail, so I might be wrong.
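
Since a CDS sequence is normally an exact, already-spliced substring of its transcript, an alignment-free alternative to minimap2 is a plain substring search (`str.find`), which does not depend on how short the ORF is. A minimal sketch, assuming transcript and CDS sequences in two FASTA files whose header first words are the WormBase names. The pairing rule `cds_name_for` is hypothetical, inferred only from the example table (ZK993.1b.1 becomes ZK993.1b by dropping the trailing `.N`); real names may not always follow it.

```python
"""Locate each CDS inside its transcript by exact substring search (sketch)."""
import re


def read_fasta(lines):
    """Minimal FASTA reader: {first word of header: uppercase sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks).upper()
    return seqs


def cds_name_for(transcript_id):
    """Hypothetical pairing rule: drop a trailing '.<number>' from the transcript name."""
    return re.sub(r"\.\d+$", "", transcript_id)


def locate_cds(transcripts, cdss):
    """Yield (transcript, cds, 0-based start), with None when the CDS is not found."""
    for tid, tseq in transcripts.items():
        cid = cds_name_for(tid)
        cseq = cdss.get(cid)
        if cseq is None:
            continue  # no CDS with the inferred name
        pos = tseq.find(cseq)  # first exact occurrence only
        yield tid, cid, (pos if pos >= 0 else None)
```

Usage would be something like `list(locate_cds(read_fasta(open("transcripts.fa")), read_fasta(open("cds.fa"))))`, with hypothetical file names. Rows that come back as `None` point at transcripts whose sequence genuinely differs from the CDS, or at names the pairing rule got wrong, rather than at alignment sensitivity.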

Anyway, does anyone know of a way I could 'easily' get that table? Maybe it already exists somewhere and I'm just not aware of it.

Thanks a lot!

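One approach could be to convert the GTF to BED12. There, blockStarts (exon starts) are relative to chromStart (the transcription start on the + strand) and thickStart/thickEnd mark the CDS, so the distance needed is the number of exonic bases between the transcription start and the CDS start. Strand has to be handled properly: on the - strand, transcription starts at chromEnd and the CDS starts at thickEnd.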
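
A minimal sketch of that computation for one BED12 line, assuming standard 0-based, half-open BED coordinates (for example as produced by UCSC's gtfToGenePred followed by genePredToBed). Because blockStarts are offsets in genomic space (introns included), the function sums only the exonic overlap with the region 5' of the CDS rather than reading a blockStart directly. The function name is hypothetical.

```python
def cds_offset_from_bed12(line):
    """Return (name, offset): offset is the 0-based position of the CDS start
    in the spliced transcript, or None for a non-coding entry."""
    f = line.rstrip("\n").split("\t")
    chrom_start, chrom_end = int(f[1]), int(f[2])
    name, strand = f[3], f[5]
    thick_start, thick_end = int(f[6]), int(f[7])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    rel_starts = [int(x) for x in f[11].rstrip(",").split(",")]
    if thick_start == thick_end:
        return name, None  # no CDS annotated
    # Genomic region (half-open) lying 5' of the CDS in transcript orientation
    if strand == "+":
        lo, hi = chrom_start, thick_start
    else:
        lo, hi = thick_end, chrom_end
    offset = 0
    for size, rel in zip(sizes, rel_starts):
        b_start = chrom_start + rel  # blockStarts are relative to chromStart
        b_end = b_start + size
        # Count only exonic bases that fall inside the 5' region
        offset += max(0, min(b_end, hi) - max(b_start, lo))
    return name, offset
```

For example, a + strand transcript with exons covering [99, 149) and [199, 299) and thickStart 209 gives 50 + 10 = 60 transcript bases upstream of the CDS, whereas subtracting chromStart from thickStart would give 110.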
