Extracting related sequences from a FASTA file

5.4 years ago

ATCG ▴ 380

How can I:

Compare long genomic sequences (e.g. 1-15 kb) and group them into families
Look for a specific k-mer within these sequences
Find the most frequently shared k-mers

Thank you!

Sequence comparison • Data mining • kmer • 1.1k views

updated 5.2 years ago by Biostar 20 • written 5.4 years ago by ATCG ▴ 380

You can use CD-HIT to cluster related sequences (based on sequence identity). Identify the clusters, extract the sequences belonging to each cluster, and then run motif-finding tools on each cluster.
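As a rough illustration of the "identify the sequences for each cluster" step, here is a minimal Python sketch. It assumes the usual CD-HIT `.clstr` layout (a `>Cluster N` header followed by member lines such as `0	1500nt, >seq1... *`) and uses a plain shared k-mer count as a simple stand-in for a real motif-finding tool. The function names `parse_clstr` and `shared_kmers` are hypothetical helpers, not part of CD-HIT.

```python
import re
from collections import Counter, defaultdict


def parse_clstr(text):
    """Map cluster number -> list of sequence IDs from CD-HIT .clstr text.

    Assumes member lines of the form '0\t1500nt, >seq1... *'.
    """
    clusters = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
        elif line and current is not None:
            # The sequence ID sits between '>' and the trailing '...'
            match = re.search(r">(.+?)\.\.\.", line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def shared_kmers(seqs, k):
    """Count, for each k-mer, how many of the given sequences contain it."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # Use a set so a k-mer is counted once per sequence
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


if __name__ == "__main__":
    # Toy data, made up for illustration only
    clstr = (
        ">Cluster 0\n"
        "0\t12nt, >a... *\n"
        "1\t12nt, >b... at +/100.00%\n"
        ">Cluster 1\n"
        "0\t8nt, >c... *\n"
    )
    seqs = {"a": "ACGTACGTTTGA", "b": "TTACGTACGGCA", "c": "GGGGCCCC"}
    for cluster_id, members in parse_clstr(clstr).items():
        counts = shared_kmers([seqs[m] for m in members], k=6)
        print(cluster_id, members, counts.most_common(3))
```

Checking for one specific k-mer needs nothing more than `kmer in seq.upper()` per sequence. Note that this sketch ignores reverse complements, which matters for genomic DNA.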

5.2 years ago by cpad0112 21k

You might consider using Mash distances and defining a similarity cutoff for grouping.

Mash distances are inherently based on shared k-mers, I believe (Mash compares MinHash sketches of each sequence's k-mer set), so you'd go a long way towards addressing all these points at once with that approach.
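To make the idea concrete, here is a small Python sketch of a Mash-style distance plus cutoff-based grouping. It is not Mash itself: it computes the exact Jaccard index over full k-mer sets instead of estimating it from MinHash sketches, and it ignores reverse complements. The distance formula D = -(1/k) ln(2j / (1 + j)) is the one Mash uses. The function names, the single-linkage grouping rule, and the cutoff in the test data are assumptions chosen for illustration.

```python
import math


def kmer_set(seq, k):
    """Return the set of all k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_like_distance(seq_a, seq_b, k=21):
    """Mash-style distance computed from the exact Jaccard index j."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 1.0
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: maximum distance
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def group_by_cutoff(seqs, cutoff, k=21):
    """Single-linkage grouping: link any pair with distance <= cutoff."""
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if mash_like_distance(seqs[a], seqs[b], k) <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The all-against-all loop is quadratic in the number of sequences, which is fine for a few thousand sequences of 1-15 kb but is exactly the cost that Mash's fixed-size sketches are designed to reduce.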

5.2 years ago by Joe 21k
