genome features and sequence parsing
7.0 years ago
lessismore ★ 1.3k

Hello everyone, I need to extract the CDS portion of a genome from a GFF3 annotation file and a genome FASTA file. Do you know a good parser for that? At the moment I am using:

cat mygenome.gff3 | awk -v FS="\t" -v OFS="\t" '$3 == "CDS" {print $1, $4-1 ,$5, $1":"$4"-"$5":"$9}' | bedtools getfasta -name -fi mygenome.fasta -bed - -fo cds.fa

Please note: in this annotation, exons are identified as cds1, cds2, cds3, etc.

At this point I only get each CDS segment separately for every transcript. My aim is to obtain, for each transcript, the complete CDS sequence by joining cds1, cds2, cds3, etc., taking strand orientation into account.

In summary, I want a table like this: chrom | CDS coordinates | sequence (CDS)

Do you have any suggestions? Thanks.


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar; see the image below:

[Image: 101010 button]


Thanks. Can you also tell me how you added the picture of this page that you just posted?


Directly next to the button for code markup is a button for inserting images. You need to put the picture online somewhere; I use tinypic, but there are many alternatives.

7.0 years ago

You're looking for getAnnoFasta.pl from the Augustus programs:

http://bioinf.uni-greifswald.de/augustus/binaries/scripts/getAnnoFasta.pl
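
If you would rather do the joining yourself, here is a rough Python sketch of the idea from the question: group the CDS segments per transcript, concatenate them in genomic order, and reverse-complement the result for minus-strand transcripts. It is only a sketch under some assumptions: the function names (`read_fasta`, `group_cds`, `joined_cds`) are made up for illustration, each CDS line is assumed to carry a `Parent=` attribute naming its transcript (your cds1/cds2/cds3 IDs may need different handling), and the whole genome is assumed to fit in memory.

```python
from collections import defaultdict

# Translation table for complementing DNA (other IUPAC codes are left unchanged).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence_name: sequence} from an iterable of FASTA lines."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def group_cds(gff_lines):
    """Group CDS segments by Parent attribute (assumed to be the transcript ID)."""
    by_tx = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")
        if parent is None:
            continue  # assumption: CDS lines without a Parent are skipped
        by_tx[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return by_tx


def joined_cds(genome, by_tx):
    """Yield (transcript, chrom, coords, sequence) with segments joined 5'->3'."""
    for tx, segs in by_tx.items():
        segs = sorted(segs, key=lambda s: s[1])  # ascending genomic start
        chrom, strand = segs[0][0], segs[0][3]
        # GFF3 coordinates are 1-based and inclusive, hence start - 1.
        seq = "".join(genome[chrom][start - 1:end] for _, start, end, _ in segs)
        if strand == "-":
            # Reverse-complement the joined sequence for minus-strand transcripts.
            seq = seq.translate(COMPLEMENT)[::-1]
        coords = ",".join(f"{start}-{end}" for _, start, end, _ in segs)
        yield tx, chrom, coords, seq
```

To use it, open `mygenome.fasta` and `mygenome.gff3`, pass the file handles to `read_fasta` and `group_cds`, and print the tuples from `joined_cds` tab-separated to get the chrom | coordinates | sequence table. Joining in ascending order first and reverse-complementing once at the end gives the same result as reverse-complementing each segment and joining them in descending order, with less bookkeeping.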
