Identify informative markers from two in silico digests?
9.2 years ago
J.R. • 0

Hi all,

I'm trying to analyze fragments generated from in silico digests of two draft genomes to determine how many of the fragments differ between the two species. For input, I have two unordered lists (one from each species), each containing about 5,000 fragments of roughly 300 bp. I would like to determine how many fragments are identical, how many differ by 1 base, 2, 3, etc., and how many don't have a corresponding fragment in the other genome.

Is this possible? What approach should I take? I can't seem to figure out if pairwise aligners will handle unordered lists like this, or if multiple alignment is what I need, or if I just want to map both sets back to one of the reference genomes. I'm not very experienced at this but I'd like to learn.

Thanks,

Joanna

RAD-seq in silico alignment SNP DNA • 1.9k views
9.2 years ago

I would try clustering them as a first quick shot. First, make sure that the header names are distinct between the two groups, then use, for example, CD-HIT (http://weizhongli-lab.org/cd-hit/) and parse the resulting clusters (the proportion of sequences from group 1 versus group 2 in each cluster). The sizes of the clusters would also be interesting to look at. (I guess that if the clustering worked well, I would sort the clusters by length and then plot the proportions as very narrow bar plots, with two colors representing the groups. But that's a very personal suggestion :D)
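The parsing step could look roughly like the sketch below. It is only an illustration under several assumptions: that the two FASTA files were concatenated and clustered with the nucleotide version of CD-HIT, that the resulting `.clstr` file uses the usual layout (`>Cluster N` lines followed by member lines such as `0	300nt, >name... *`), and that the headers were given the hypothetical prefixes `sp1_` and `sp2_` to mark the species. The function names `parse_clstr` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Captures the sequence name from a member line such as:
#   "1\t298nt, >sp2_frag7... at +/99.67%"
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines, prefixes=("sp1_", "sp2_")):
    """Return one Counter per cluster, counting members by header prefix.

    The prefixes are hypothetical labels assumed to have been added to the
    FASTA headers before clustering.
    """
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster starts here.
            clusters.append(Counter())
            continue
        match = MEMBER_RE.search(line)
        if match is None or not clusters:
            continue
        name = match.group(1)
        # Assign the member to the first matching prefix, else "other".
        group = next((p for p in prefixes if name.startswith(p)), "other")
        clusters[-1][group] += 1
    return clusters


def summarize(clusters, prefixes=("sp1_", "sp2_")):
    """Count clusters containing both groups or only one of them."""
    first, second = prefixes
    return {
        "shared": sum(1 for c in clusters if c[first] and c[second]),
        "only_first": sum(1 for c in clusters if c[first] and not c[second]),
        "only_second": sum(1 for c in clusters if c[second] and not c[first]),
    }


# Tiny made-up example in the assumed .clstr layout (not real data).
EXAMPLE = """>Cluster 0
0\t300nt, >sp1_frag1... *
1\t300nt, >sp2_frag7... at +/99.67%
>Cluster 1
0\t301nt, >sp1_frag2... *
>Cluster 2
0\t299nt, >sp2_frag9... *
""".splitlines()

print(summarize(parse_clstr(EXAMPLE)))
```

For a real run, the lines would come from the cluster file instead, e.g. `parse_clstr(open("fragments.clstr"))`, where the file name is a placeholder. Putting the species label at the very start of each header is the safer choice, since names can end up shortened in the cluster file.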


Thank you! That ended up working really well, and the output was super-easy to parse. Thanks so much!


Glad to hear that! Please remember to upvote my (and any other) answers that you find helpful ;-)


Thanks! Always good to learn the etiquette of new online communities :D
