How to snps map gene in R (without R packages)
2
1
Entering edit mode
5.1 years ago
mmh7272 ▴ 60

Hi I am a beginner bioinformatic i am searching "how to snp mapping gene?" there are so many R packages for example (i.e, biomaRT, SNPlist .. ) but i have a snplist and gene list I wanna SNPs mapping gene list using R so i wanna get result file

Snp list:

Chr POS 
1 2
1 3
2 4
2 7
3 5
3 13
4 13

-gene list-

Chr start end gene id
1 2 3 a
2 3 5 b
2 6 10 c
3 4 10 d
3 11 17 e
4 14 17 f

I wanna get like below result

Chr POS gene id
1 2 a
1 3 a
2 4 b
2 7 c
3 5 d
3 13 e
4 13 Na
R SNP gene • 2.6k views
ADD COMMENT
4
Entering edit mode
5.1 years ago

Hey, on forums like this, it is usually a good idea to provide code that can produce your input data. Then, people will have less time to spend in tackling your problem.

To do what you want, you should use GenomicRanges:

create your data

SNPs <- data.frame(
  Chr = c(1,1,2,2,3,3,4),
  POS = c(2,3,4,7,5,13,13))
SNPs
  Chr POS
1   1   2
2   1   3
3   2   4
4   2   7
5   3   5
6   3  13
7   4  13

genes <- data.frame(
  Chr = c(1,2,2,3,3,4),
  start = c(2,3,6,4,11,14),
  end = c(3,5,10,10,17,17),
  id = c('a','b','c','d','e','f'))
genes
  Chr start end id
1   1     2   3  a
2   2     3   5  b
3   2     6  10  c
4   3     4  10  d
5   3    11  17  e
6   4    14  17  f

convert the data to GRanges objects

require(GenomicRanges)
grSNPs <- makeGRangesFromDataFrame(
  SNPs,
  seqnames.field = 'Chr',
  start.field = 'POS',
  end.field = 'POS',
  keep.extra.columns = TRUE)
grgenes <- makeGRangesFromDataFrame(
  genes,
  seqnames.field = 'Chr',
  start.field = 'start',
  end.field = 'end',
  keep.extra.columns = TRUE)

find overlaps and save the indices that overlapped

qHits <- queryHits(findOverlaps(query = grSNPs, subject = grgenes, type = 'within'))
subHits <- subjectHits(findOverlaps(query = grSNPs, subject = grgenes, type = 'within'))

use indices to produce the original data for the overlaps

overlaps <- data.frame(SNPs[qHits,], genes[subHits,'id'])
colnames(overlaps) <- c('Chr','POS','id')

now find those that didn't overlap

missing <- SNPs[-qHits,]
nmissing <- nrow(missing)
missing <- data.frame(missing, rep(NA, nmissing))
colnames(missing) <- c('Chr','POS','id')

rbind overlapping regions and non-overlapping

finalresult <- rbind(overlaps, missing)
finalresult
  Chr POS   id
1   1   2    a
2   1   3    a
3   2   4    b
4   2   7    c
5   3   5    d
6   3  13    e
7   4  13 <NA>
ADD COMMENT
3
Entering edit mode
5.1 years ago
zx8754 11k

Using data.table:

# example data (partly stolen from Kevin's answer)
SNPs <- data.table(
  Chr = c(1,1,2,2,3,3,4),
  start = c(2,3,4,7,5,13,13),
  end = c(2,3,4,7,5,13,13),
  key = c("Chr", "start", "end"))

genes <- data.table(
  Chr = c(1,2,2,3,3,4),
  start = c(2,3,6,4,11,14),
  end = c(3,5,10,10,17,17),
  id = c('a','b','c','d','e','f'),
  key = c("Chr", "start", "end"))

library(data.table)

foverlaps(SNPs, genes)
#    Chr start end   id i.start i.end
# 1:   1     2   3    a       2     2
# 2:   1     2   3    a       3     3
# 3:   2     3   5    b       4     4
# 4:   2     6  10    c       7     7
# 5:   3     4  10    d       5     5
# 6:   3    11  17    e      13    13
# 7:   4    NA  NA <NA>      13    13
ADD COMMENT
0
Entering edit mode

thank you very~ much for ur answer!

ADD REPLY
1
Entering edit mode

Please up-vote any answers that have helped.

ADD REPLY

Login before adding your answer.

Traffic: 2038 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6