How to get sequence for a particular locus from .gff3 or .fa file?
7.8 years ago
michael.nagle ▴ 100

I have genomic data from Phytozome in .fa and .gff3 format. My understanding is that this is not the right format for IGV, and I can't figure out how else to view genomic data on a Mac. I can run Linux or Windows if I need to.

I want to get the sequence for a particular locus from 3 versions of the Eucalyptus grandis genome. Once I get it, I'll have no problem running MUSCLE to show the differences. How do I get the sequence out from these files, ideally on a Mac? Thanks for the help.

genome • 1.8k views
7.8 years ago
Shicheng Guo ★ 9.4k
bedtools getfasta -fi hg19.fa -bed targetRegion.bed -fo output.fa
7.8 years ago

GFF3 (Generic Feature Format version 3) is an annotation format, not a genomic sequence file: it records where features such as genes sit on the genome, but not the bases themselves. You'll need the reference genome (.fa file) that corresponds to your GFF3. Then you can use the bedtools getfasta command (from the Mac Terminal) to extract the sequence of your desired locus.
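
The step that connects the GFF3 to bedtools getfasta is turning the locus annotation into a BED interval. Below is a minimal sketch of that step, not a definitive recipe. It assumes the locus is annotated as a `gene` feature whose attribute column (column 9) carries either `ID=<locus>` or `Name=<locus>`; check your own file, since attribute naming differs between genome releases. The function name `gff3_locus_to_bed` is hypothetical. GFF3 coordinates are 1-based and inclusive while BED starts are 0-based, so the start is decremented by one.

```shell
# Hypothetical helper: print one BED line for a gene matched by ID= or Name=.
# Usage: gff3_locus_to_bed LOCUS_ID < annotation.gff3
gff3_locus_to_bed() {
    awk -F'\t' -v id="$1" '
        BEGIN { OFS = "\t" }
        /^#/ { next }                          # skip comments and directives
        $3 == "gene" {
            n = split($9, attrs, ";")          # attributes are ;-separated
            for (i = 1; i <= n; i++) {
                if (attrs[i] == "ID=" id || attrs[i] == "Name=" id) {
                    # chrom, 0-based start, end, name, placeholder score, strand
                    print $1, $4 - 1, $5, id, 0, $7
                    break
                }
            }
        }
    '
}
```

With placeholder file names, `gff3_locus_to_bed Eucgr.A00001 < Egrandis.gff3 > locus.bed` followed by `bedtools getfasta -s -fi Egrandis.fa -bed locus.bed -fo locus.fa` would write the locus sequence to `locus.fa`; the `-s` option makes getfasta reverse-complement features on the minus strand. Repeating this for each of the three genome versions, each with its own matching .fa and .gff3, gives the sequences to align.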
