Distance Between Genes Of Interest
1
0
Entering edit mode
10.6 years ago
anuragm ▴ 130

I have the UCSC RefSeq track with txstart sites for all the genes. I want to find the distance between consecutive pairs of genes in order to determine which consecutive genes lie within a particular range of each other. How can I do this using Python or R ?

gene ucsc • 4.6k views
ADD COMMENT
3
Entering edit mode
10.6 years ago

You could probably do this relatively easily in R with GenomicRanges. Read in the RefSeq track, convert the transcripts to GRanges and then apply a function that computes the distance between a range and the output from nearest().

Edit: Here's an example:

library("GenomicRanges")
refGene <- read.delim("~/Downloads/refGene.txt", header=T)
gr <- GRanges(seqnames=Rle(refGene$chrom), 
    ranges=IRanges(start=refGene$txStart, end=refGene$txEnd, names=refGene$name),
    strand=Rle(strand(refGene$strand))) 
neighbors <- nearest(gr) #This can return NA
REMOVE <- whichis.na(neighbors))
neighbors <- neighbors[-REMOVE]
neighbor <- gr[neighbors]
gr <- gr[-REMOVE]
distances <- distance(gr, neighbor)

I just tried this on my laptop and it seems to work fine. If this isn't exactly what you want, you should be able to easily modify it.

ADD COMMENT

Login before adding your answer.

Traffic: 2515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6