Calculating GC content around a specific genomic location
2
0
Entering edit mode
3.4 years ago
hkarakurt ▴ 180

Hello everyone, I am looking for a method that calculates GC content of a specific genomic region. I could not find an automated Python (or R) library that calculates GC content around a location.

For example; I want the GC contect of + and - 20bp around Chr3 16500 location. Is it possible to do it?

Thank you in advance.

WGS WES GC genome SNP • 1.8k views
ADD COMMENT
0
Entering edit mode

Hi,

You can use Biopython to import a sequence, to subset a sequence to the region of interest and then just apply the function GC(), like appears explained in the documentation:

>>> from Bio.Seq import Seq
>>> from Bio.SeqUtils import GC
>>> my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")
>>> GC(my_seq)
46.875

I hope this helps,

António

ADD REPLY
2
Entering edit mode
3.4 years ago
samtools faidx indexed_ref.fasta `echo "chr3:16500" | awk -F: '{P=int($2);X=20;printf("%s:%d-%d",$1,(P-X<1?1:P-X),P+X);}'` | tail -n+2 | awk '{T+=length($0);gsub(/[AaTtWwNn]/,"");N+=length($0);} END{print N/T;}'
ADD COMMENT
2
Entering edit mode
3.4 years ago

R version:

library(Biostrings)
library(GenomicFeatures)

## load fasta
dna <- readDNAStringSet('your_genome.fasta')

## find your relevant chromosome
grep('chr3',names(z.dna),value=T)

## if it is present subset by that chr and rename for convenience
dna.chr3 <- dna[grep('chr3',names(dna))]
names(dna.chr3) <- 'chr3'

## setup your genomic coordinates
gr <- GRanges('chr3:16500')

## flank your location
flank.gr <- flank(gr,20,both=TRUE)

## extract sequence
flank.seq <- getSeq(dna.chr3, flank.gr)
## if you wanted to get ONLY the 20 bp flanking sites - not including the internal section
## flank.gr <- setdiffflank.gr,gr)

## get AGCTN+ frequency
flank.alphafreq <- alphabetFrequency(flank.seq)

## get G/C content as a fraction
sum(flank.alphafreq[,c('G','C')])/sum(flank.alphafreq[,c('A','G','C','T')])
ADD COMMENT

Login before adding your answer.

Traffic: 2583 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6