Calculating genomic coverage / base overlap in R
17 days ago
Xbox_27 • 0

I have a list of regions for different genes. I want to know how many base pairs of each region overlap the exons of the corresponding gene. I want to do this in R. How can I do it?

genomics • 350 views
17 days ago
marco.barr ▴ 140

You can use the findOverlaps function in the GenomicRanges package, or use the GeneOverlap package directly. Check the documentation of these packages for the details.
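As a minimal sketch of the findOverlaps approach, assuming toy coordinates invented for illustration (the object names `regions`, `exons_gr`, and `bp_per_region` are hypothetical), the overlapping base pairs can be obtained by intersecting each overlapping region/exon pair with pintersect and summing the widths per region. The exons are merged with reduce first so that bases shared by overlapping exons are not counted twice.

```r
library(GenomicRanges)

# Toy regions and exons on one chromosome (hypothetical coordinates)
regions <- GRanges("chr1", IRanges(start = c(100, 500, 900), end = c(200, 600, 950)))
exons_gr <- GRanges("chr1", IRanges(start = c(150, 550, 560), end = c(250, 570, 700)))

# Merge overlapping exons so shared bases are counted once
exons_merged <- reduce(exons_gr)

# Pair up every region with every merged exon it overlaps
hits <- findOverlaps(regions, exons_merged)

# Width of the intersection for each overlapping pair
pair_bp <- width(pintersect(regions[queryHits(hits)],
                            exons_merged[subjectHits(hits)]))

# Sum the overlap per region; regions without any hit stay at 0
bp_per_region <- numeric(length(regions))
totals <- tapply(pair_bp, queryHits(hits), sum)
bp_per_region[as.integer(names(totals))] <- totals

bp_per_region                    # 51 51 0
bp_per_region / width(regions)   # fraction of each region covered by exons
```

Dividing by `width(regions)` gives the covered fraction of each region; dividing by exon widths instead would give the covered fraction of the exons, so the denominator depends on which question is being asked.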

# Install and load necessary packages
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("TxDb.Hsapiens.UCSC.hg19.knownGene")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(GenomicRanges)
library(readxl)

# Load the gene annotation database
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Read gene information and regions of interest from Excel file
gene_data <- read_excel("coverage.xlsx")

# Convert gene information and regions of interest data to GRanges object
regions_gr <- GRanges(
  seqnames = gene_data$chromosome,
  ranges = IRanges(start = gene_data$region_start, end = gene_data$region_end),
  strand = "*",
  region_id = gene_data$region_id,  # Replace with actual column name for region ID
  gene_id = gene_data$gene_id  # Replace with actual column name for gene ID
)

# Extract exon information
exon_info <- exons(txdb)

# Find overlaps between regions and exons
overlap <- findOverlaps(regions_gr, exon_info)

# Extract the overlapping exons
overlapping_exons <- subjectHits(overlap)

# Calculate the width of overlapping exons
exon_widths <- width(exon_info[overlapping_exons])

# Calculate the coverage for each region
coverage <- width(coverage(overlap, weight = TRUE)) / exon_widths

This is my code, but it reports that no overlaps were found.


The code seems correct at first glance... How did you generate the data in the Excel file? Have you checked the exon annotations? You could check if the exons in the database are of significant length to allow meaningful overlaps with the regions of interest.
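One possible cause of findOverlaps returning no hits, offered here as an assumption since the Excel contents are not shown, is a chromosome naming mismatch: the UCSC-based TxDb names chromosomes "chr1", "chr2", and so on, while spreadsheets often store them as "1", "2". The sketch below uses hypothetical one-range stand-ins for `regions_gr` and `exon_info` to show how to check for shared sequence names and rename them. It is also worth confirming by hand that the coordinates in the file use the same genome build (hg19) as the annotation.

```r
library(GenomicRanges)

# Hypothetical stand-ins for regions_gr and exon_info (toy coordinates)
regions_gr <- GRanges("1", IRanges(start = 100, end = 200))
exon_info <- GRanges("chr1", IRanges(start = 150, end = 250))

# If the two objects share no sequence names, no overlap can be reported
shared <- intersect(seqlevels(regions_gr), seqlevels(exon_info))
length(shared)

# Rename the region chromosomes to the "chr"-prefixed style
if (length(shared) == 0) {
  seqlevels(regions_gr) <- paste0("chr", seqlevels(regions_gr))
}

# With matching names the overlap is found
hits <- findOverlaps(regions_gr, exon_info)
length(hits)
```

Comparing `seqlevels()` of both objects before calling findOverlaps is a cheap first check whenever the result is unexpectedly empty.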


Hi, this is the list (Excel file) I am uploading (coverage.xlsx).

[Attached image not preserved]


You can post the output of dput(head(df)) for both data frames so that others can test possible solutions.
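For example, assuming a data frame with the column names used in the code above (the values here are invented placeholders, not the real contents of coverage.xlsx), the call would look like this:

```r
# Hypothetical data frame standing in for the table read from coverage.xlsx
df <- data.frame(
  chromosome = c("chr1", "chr1"),
  region_start = c(100L, 500L),
  region_end = c(200L, 600L),
  region_id = c("r1", "r2"),
  gene_id = c("g1", "g2"),
  stringsAsFactors = FALSE
)

# Prints an R expression that recreates the first rows when pasted into a session
dput(head(df))
```

Pasting the printed text into a reply lets others rebuild the same data frame without needing the Excel file.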
