Retrieve the sequence based on the start and end position in the cuffmerged.gtf
19 months ago
Chao.wang2 • 40
Canada

Hi guys,

Does anyone know how to retrieve a gene sequence based on the start and end positions in the cuffmerged.gtf file? For some genes, only tracking IDs and start and end positions are available. I want to retrieve these sequences and annotate them. I would really appreciate your help.

Thanks a lot

RNA-Seq • 1.5k views

Thanks very much

Sounds helpful.

I will try it tomorrow.

20 months ago
igor 7.7k
United States

A one-line solution:

bedtools getfasta -fi genome.fa -bed cuffmerged.gtf -fo out.fa

Yes, the -bed parameter accepts BED, GFF (including GTF), or VCF files. See the bedtools getfasta documentation for details.


Yes, this is brilliant.


Hi igor,

Thanks for your solution. However, I want to extract a single sequence for each Cufflinks tracking ID, and bedtools getfasta returns several exon sequences per tracking ID. Do you think there is a way around that? I also checked the Cufflinks website, where there is a gffread utility designed to handle Cufflinks output. However, it extracts transcript sequences based on the transcript ID in cuffmerged.gtf, not the gene ID. Do you think there is a way to change that?

Thanks very much
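
One possible way around the per-exon output, offered only as a sketch: collapse the exon records into a single interval per gene_id first, then hand that interval file to bedtools getfasta. The function name gtf_to_gene_bed and the file names genes.bed and genes.fa are made up for illustration. The sketch assumes the attribute column of cuffmerged.gtf carries gene_id "..." on every exon line and that each gene sits on one chromosome and one strand. Note that the result is the genomic span from the first exon start to the last exon end, introns included, not a spliced transcript.

```shell
# Hypothetical helper: read a GTF on stdin, print one BED6 line per gene_id.
gtf_to_gene_bed() {
  awk -F'\t' '
    $3 == "exon" && match($9, /gene_id "[^"]+"/) {
      # Strip the leading `gene_id "` (9 chars) and the trailing quote.
      id = substr($9, RSTART + 9, RLENGTH - 10)
      s = $4 + 0; e = $5 + 0              # force numeric comparison
      if (!(id in lo) || s < lo[id]) lo[id] = s
      if (!(id in hi) || e > hi[id]) hi[id] = e
      chrom[id] = $1; strand[id] = $7     # assumes one chrom/strand per gene
    }
    END {
      # GTF is 1-based inclusive; BED is 0-based half-open, so shift the start.
      for (id in lo)
        printf "%s\t%d\t%d\t%s\t0\t%s\n", chrom[id], lo[id] - 1, hi[id], id, strand[id]
    }
  ' | LC_ALL=C sort -k1,1 -k2,2n
}

# Intended usage (not run here; needs bedtools and your own files):
#   gtf_to_gene_bed < cuffmerged.gtf > genes.bed
#   bedtools getfasta -fi genome.fa -bed genes.bed -name -s -fo genes.fa
```

Here -name labels each FASTA record with the BED name column (the gene_id) and -s reverse-complements minus-strand intervals. The exact header format produced by -name differs between bedtools versions, so check the output before relying on it downstream.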
