Finding out if certain SNP positions fall into certain gene regions
5.6 years ago
SGMS ▴ 130

Hi all,

I currently have a certain list of SNPs (chromosome and position). I need to check whether these SNPs fall into another list of gene regions (chr:start_position-end_position) that I have.

I was previously able to find overlaps between positions using findOverlaps in R and I also saw we can do something similar using bedtools intersect. But it seems like those tools are all for finding chr:start-end overlaps, whereas I want to see whether my SNP positions fall into my gene regions of interest.

Any suggestions would be greatly appreciated.

Thank you!

snps gene region R unix • 2.8k views

It seems there was already a discussion about this: A: How To Intersect A Range With Single Positions

Thanks, my search didn't turn that one up. It seems I will go with Pierre's suggestion: turn each SNP position into a SNP region whose start and end are basically the same coordinate, and then do the overlap.

Good description of the data. It would help if you could post some input data and the expected output, SGMS.

"But it seems like those tools are all for finding chr:start-end overlaps, ..."

"... I want to see whether my SNP positions fall into my gene regions of interest."

This is not clear. What is the difference?

For the SNPs I only have a single position, whereas for the regions I have chr:start-end. Do you think findOverlaps would work in this case too?

Why don't you convert your positions into 1-base intervals?

I thought so. You mean for example:

1:169549811-169549811

right?

It would be 1:169549810-169549811, because the BED format uses 0-based, half-open intervals: the start is 0-based and the end is not included.
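
As a minimal sketch of that conversion, assuming the SNPs sit in a tab-separated file with the chromosome and the 1-based position (the file names snps.txt and snps.bed, and the second SNP, are made up for illustration), awk can write the BED intervals:

```shell
# Hypothetical input: one SNP per line, "chromosome<TAB>1-based position".
printf '1\t169549811\n2\t5000\n' > snps.txt

# BED intervals are 0-based and half-open, so a 1-based position p
# becomes the interval [p-1, p).
awk 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $2 }' snps.txt > snps.bed

cat snps.bed
```

The resulting file could then be passed to bedtools intersect together with a BED file of the gene regions.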

Thanks. If I want to do that in R, though, the positions will remain the same.

Yes, the overlap functions from IRanges/GenomicRanges assume 1-based coordinates.
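
For comparison, here is a rough sketch of the same check done directly on 1-based, inclusive coordinates, where a SNP at position p is simply the interval p-p. It is plain awk rather than findOverlaps, and the file names and the example regions are invented for illustration. It also assumes that chromosome names contain no ':' or '-'.

```shell
# Hypothetical inputs.
# regions.txt: one region per line as "chr:start-end" (1-based, inclusive).
printf '1:169549000-169550000\n2:100-200\n' > regions.txt
# snps_1based.txt: "chromosome<TAB>position" (1-based).
printf '1\t169549811\n2\t5000\n2\t200\n' > snps_1based.txt

# First file: store every region. Second file: report each SNP together
# with every region on the same chromosome where start <= pos <= end.
awk '
  NR == FNR {
    split($1, a, "[-:]")
    chr[FNR] = a[1] ""          # force string comparison of chromosome names
    s[FNR] = a[2] + 0
    e[FNR] = a[3] + 0
    reg[FNR] = $1
    n = FNR
    next
  }
  {
    for (i = 1; i <= n; i++)
      if ($1 == chr[i] && $2 + 0 >= s[i] && $2 + 0 <= e[i])
        print $1 "\t" $2 "\t" reg[i]
  }
' regions.txt snps_1based.txt > hits.txt

cat hits.txt
```

Every region is compared with every SNP, so this is only sensible for small lists; for anything larger, the interval tools mentioned in this thread are the better choice.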

5.6 years ago
$ vcf2bed < snps.vcf > snps.bed
$ gff2bed < genes.gff > genes.bed
$ bedmap --echo --echo-map-id genes.bed snps.bed > answer.bed

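The convert2bed binary (wrapped by vcf2bed and gff2bed) takes care of the indexing, that is, it converts the 1-based VCF and GFF coordinates to 0-based BED coordinates. bedmap then prints each gene in genes.bed together with the IDs of the SNPs in snps.bed that overlap it. You can replace gff2bed with gtf2bed if you have GTF-formatted input.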
