Find patterns in DNA sequence
4.1 years ago
kindlychung • 40
Netherlands

I want to be able to count the number of occurrences of a given sequence (for example ACTTTAG) in the GRCh38 reference genome. Is there an existing tool for doing this? Thanks!

sequence dna • 1.3k views
12 months ago
dago ♦ 2.5k
Germany

You can use the Biostrings package from Bioconductor.

The function countPattern should do the job. Just check whether it counts overlapping occurrences (a window sliding one base at a time) or only non-overlapping ones.
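The sliding-window question matters because the two ways of counting can give different answers for repetitive patterns. The following is a minimal Python sketch of that difference only; it is not Biostrings code, and the function names are made up for illustration.

```python
import re


def count_non_overlapping(seq: str, pattern: str) -> int:
    # str.count resumes searching after the end of each match,
    # so overlapping occurrences are not counted.
    return seq.count(pattern)


def count_overlapping(seq: str, pattern: str) -> int:
    # A zero-width lookahead is tested at every start position,
    # which is equivalent to sliding a window one base at a time.
    return len(re.findall(f"(?={re.escape(pattern)})", seq))


print(count_non_overlapping("AAAA", "AA"))  # 2
print(count_overlapping("AAAA", "AA"))      # 3
```

For a pattern like ACTTTAG, which cannot overlap with itself, both approaches return the same count.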

12 months ago
5heikki 8.4k
Finland

Jellyfish is pretty nice for k-mer counting.

12 months ago
WCIP | Glasgow | UK

I wrote a simple script for finding patterns (regular expressions, in fact) in FASTA files: fastaRegexFinder.py. I also mention it in the post "Quadruplex sequence batch prediction".

If you just want to count the number of occurrences, you can do:

fastaRegexFinder.py -f genome.fa -r 'ACTTTAG' | wc -l
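If you would rather not depend on a script, the same kind of count can be sketched in a few lines of plain Python. This is a rough illustration and not how fastaRegexFinder.py works internally; the helper names (read_fasta, reverse_complement, count_motif) are made up here. It assumes a plain A/C/G/T motif rather than a regular expression, counts overlapping hits on both strands, and holds one whole sequence in memory at a time.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    # Complement each base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    # Yield (name, sequence) pairs. Lines of one record are joined so
    # that matches spanning a line break are not missed.
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            # Upper-case so soft-masked (lower-case) regions are searched too.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_motif(lines: Iterable[str], motif: str) -> Dict[str, int]:
    motif = motif.upper()
    # A set avoids double counting when the motif is its own reverse complement.
    motifs = {motif, reverse_complement(motif)}
    # Zero-width lookahead so overlapping occurrences are all counted.
    patterns = [re.compile(f"(?={re.escape(m)})") for m in motifs]
    counts = {}
    for name, seq in read_fasta(lines):
        counts[name] = sum(sum(1 for _ in p.finditer(seq)) for p in patterns)
    return counts


if __name__ == "__main__":
    demo = [">chr_demo", "ACTTTAGcc", "CTAAAGTA"]
    print(count_motif(demo, "ACTTTAG"))  # {'chr_demo': 2}
```

For a real genome you would pass an open file handle instead of the demo list, e.g. `with open("genome.fa") as fh: print(count_motif(fh, "ACTTTAG"))`, and sum the per-sequence counts if you want a single total.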
12 months ago
Spain. Universidad de Córdoba

You can also use bowtie1.

It is especially good at finding (mapping) short sequences.
