I will have to make a fasta file, my assignment is to plot the GC content of chr20 found from UCSC genome browser. I have made a GRange object, but I do not know how to proceed to get the fasta file. My GRange object looks like -
You could use a helper script like the following Perl script to query the indexed FASTA files you made in step 2, against the BED file you exported in step 1:
#!/usr/bin/env perl
use strict;
use warnings;
# fastaDir contains per-chromosome UCSC FASTA (.fa) files and samtools-indexed index (.fai) files
my $fastaDir = "/foo/bar/baz";
while (<STDIN>) {
chomp;
my ($chr, $start, $stop, $id, $score, $strand) = split("\t", $_);
my $queryKey = "$chr:$start-$stop";
my $queryResult = `samtools faidx $fastaDir/$chr.fa $queryKey`; chomp $queryResult;
my @lines = split("\n", $queryResult);
my @seqs = @lines[1..(scalar @lines - 1)];
my $seq = join("", @seqs);
if ($strand eq "-") { $seq = revdnacomp($seq); }
my $header = ">".join(":",($chr, $start, $stop, $id, $score, $strand));
print STDOUT $header."\n".uc($seq)."\n";
}
sub revdnacomp {
my $dna = shift @_;
my $revcomp = reverse($dna);
$revcomp =~ tr/ACGTacgt/TGCAtgca/;
return $revcomp;
}
This converts six-column BED to FASTA, but it should work with the three-column BED data coming out of your GRange object.
You might run this something like so:
$ bed2fasta.pl < granges.bed > granges.fa
The file granges.fa is the FASTA file for genomic intervals defined in your GRanges object. You can then use Perl or Python (or any text parser) to analyze GC content.