Find gene names for a specific genomic location
6.7 years ago
hasani.iut6 ▴ 60

Hi. I'm trying to find the hits of a specific pattern in the human genome, and then I want to know whether the hits that were found fall within any genes.

To do that, I want to use a BSgenome package to find the hits and a TxDb package as the annotation source for the genes.

How can I do that?

Thanks, all.


What have you tried so far? Any error messages?


Actually, I don't know what I should do. I think I need two GRanges objects so that I can find the overlaps between them.


"Hi. I'm trying to find the hits of a specific pattern in the human genome, and then I want to know whether the hits that were found fall within any genes."

It's not clear to me what you are doing, which data you have, and what exactly you aim to achieve. Please elaborate.


I want to find the hits of 'ACGGTAACGTACGTAGTCAT' in the human genome using the vmatchPattern function of the Biostrings package. Suppose that one of the hits is at chr5:124323-124344. I want to know whether this hit falls within any gene.


non-R solution:

 seqkit locate -i -p ccttctctgggccttgatttcccctcctgc ../reference/chr12/chr12.fa --bed | bedtools intersect -a - -b ../reference/chr12/genes_chr12.gtf -wb

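Note: The command above searches for the sequence "ccttctctgggccttgatttcccctcctgc" in the chr12 sequence (FASTA format) and intersects the hits with the gene list for chr12 (GTF format, downloaded from UCSC). The -i flag makes the match case-insensitive, and --bed makes seqkit write its hits in BED format so they can be piped to bedtools.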

output:

chr12   186551  186581  ccttctctgggccttgatttcccctcctgc  0   +   chr12   unknown exon    186542  186878  .   +   .   gene_id "IQSEC3"; gene_name "IQSEC3"; p_id "P13619"; transcript_id "NM_015232"; tss_id "TSS12565";

Download seqkit from http://bioinf.shenwei.me/seqkit/download/. Bedtools can be installed from the Synaptic/apt repositories on Ubuntu.
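For anyone wondering why bedtools reports the output line above: the hit and the exon use different coordinate conventions. Here is a minimal Python sketch of the check, assuming the seqkit hit follows the standard BED convention (0-based, half-open) and the exon follows the GTF convention (1-based, inclusive). The helper names are made up for illustration; this is not how bedtools is implemented internally.

```python
def gtf_to_half_open(start, end):
    """Convert a 1-based inclusive GTF interval to 0-based half-open (BED style)."""
    return start - 1, end


def overlap_length(a, b):
    """Number of bases shared by two 0-based half-open intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start)


# Coordinates copied from the output line above (both on chr12, + strand).
hit = (186551, 186581)                   # BED columns 2-3 of the seqkit hit
exon = gtf_to_half_open(186542, 186878)  # GTF columns 4-5 of the IQSEC3 exon

print(overlap_length(hit, exon))  # 30, the full length of the pattern
```

All 30 bases of the pattern fall inside the exon, which is why the hit is paired with the IQSEC3 record.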

6.7 years ago
James Ashmore ★ 3.4k

Here is an example using the mouse genome and a random pattern:

# Load relevant packages
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

# Get occurrences of the motif in the genome
motifRanges <- vmatchPattern("ACGGTAACGT", BSgenome.Mmusculus.UCSC.mm10)

# Get gene ranges
geneRanges <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)

# Annotate gene ranges with symbol
geneRanges$symbol <- mapIds(org.Mm.eg.db,
                            keys = geneRanges$gene_id,
                            column = "SYMBOL",
                            keytype = "ENTREZID",
                            multiVals = "first")

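# Find motif hits that lie entirely within a gene range (type = "within").
# findOverlaps() is strand-aware by default; pass ignore.strand = TRUE to also match genes on the opposite strand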
hitsObject <- findOverlaps(motifRanges, geneRanges, type = "within")

# Extract the indexes of overlapping pattern and gene
motifHits <- queryHits(hitsObject)
geneHits <- subjectHits(hitsObject)

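# Annotate motifs with overlapping gene_id and symbol (NA where no gene contains the motif).
# If a motif lies within several genes, only the last matching gene is kept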
motifRanges$gene_id <- NA
motifRanges$symbol <- NA
motifRanges$gene_id[motifHits] <- geneRanges$gene_id[geneHits]
motifRanges$symbol[motifHits] <- geneRanges$symbol[geneHits]
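
To make the overlap step easier to reason about, here is a small plain-Python sketch of what a "within" query does, using toy data. The gene names and coordinates are hypothetical, intervals are 1-based and inclusive as in GRanges, and the functions only mimic the idea behind findOverlaps(type = "within"); they are not the Bioconductor implementation and do not handle the "*" strand.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str


def is_within(query, subject, ignore_strand=False):
    """True if query lies entirely inside subject on the same chromosome."""
    same_strand = ignore_strand or query.strand == subject.strand
    return (query.chrom == subject.chrom
            and same_strand
            and subject.start <= query.start
            and query.end <= subject.end)


def annotate(motifs, genes, ignore_strand=False):
    """For each motif, list every gene that fully contains it."""
    return [[name for name, gene in genes.items()
             if is_within(motif, gene, ignore_strand)]
            for motif in motifs]


# Hypothetical genes and motif hits, for illustration only.
genes = {
    "geneA": Interval("chr1", 100, 500, "+"),
    "geneB": Interval("chr1", 400, 900, "-"),
}
motifs = [
    Interval("chr1", 150, 159, "+"),  # inside geneA
    Interval("chr1", 450, 459, "-"),  # inside both genes, same strand as geneB
    Interval("chr1", 495, 504, "+"),  # runs past the end of geneA
    Interval("chr2", 150, 159, "+"),  # different chromosome
]

print(annotate(motifs, genes))
print(annotate(motifs, genes, ignore_strand=True))
```

Unlike the R code above, this sketch keeps every containing gene per motif instead of a single one, which shows how the result changes once strand is ignored.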