I'm currently using integer spans (Set::IntSpan::Fast) in Perl as the fastest way I can think of to store the start and end coordinates of genomic features in a hash keyed by gene ID, for whole genomes. Each span set then represents the individual ranges, and thus the sizes, of multiple features (e.g. exons) within a gene.
I do a similar thing for any repetitive sequence ranges that fall within the original features, but instead remove them from the integer spans for each gene (hash entry), in order to get the unique feature lengths minus the repetitive sequence (e.g. TEs).
However, this currently creates a bias towards more, smaller gene features, even though the overall feature sequence length is correct. If a repeat is removed from the middle of a feature, the span returns two smaller segments, whereas what I really want is to still treat it as one segment but use its smaller length, to build a new frequency distribution of unique sizes.
E.g.
# add three features of 20, 30 & 40bp respectively
$hash{$gene_id}->add_range(1, 20, 51, 80, 111, 150);
# remove 3 repetitive elements of 5, 10 and 20bp
$hash{$gene_id}->remove_range(11, 15, 61, 70, 121, 140);
This results in 6 ranges: 1-10, 16-20, 51-60, 71-80, 111-120 and 141-150. Although the overall resulting sequence length is correct, the number of features must not increase; it should stay the same (or decrease if a feature is entirely repetitive). So I need a way to join the fragments back up relative to the original ranges.
Can you think of the most efficient way to do this? Perhaps I could check the integer span boundaries before removing the sequence and adjust the removal to the boundary of the feature instead. (I don't need the start or end coordinates afterwards, just the new unique size and frequency of the original features minus the repeat sequence.)
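To make the idea concrete, here is a minimal sketch in plain Perl (no Set::IntSpan::Fast) of what I mean: keep each original feature as one unit and subtract only the repeat bases that overlap it. The helper name unique_lengths and the [start, end] array-ref layout are made up for illustration, and it assumes 1-based inclusive coordinates and that features within a gene don't overlap each other.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one unique length per original feature.
# $features and $repeats are array refs of [start, end] pairs
# (1-based, inclusive). Features left with no unique bases are dropped.
sub unique_lengths {
    my ($features, $repeats) = @_;

    # Merge overlapping or adjacent repeats so no base is subtracted twice.
    my @merged;
    for my $r (sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @$repeats) {
        if (@merged && $r->[0] <= $merged[-1][1] + 1) {
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$r];
        }
    }

    my @lengths;
    for my $f (@$features) {
        my ($fs, $fe) = @$f;
        my $len = $fe - $fs + 1;
        for my $r (@merged) {
            # Overlap between this feature and this repeat, if any.
            my $os = $r->[0] > $fs ? $r->[0] : $fs;
            my $oe = $r->[1] < $fe ? $r->[1] : $fe;
            $len -= $oe - $os + 1 if $os <= $oe;
        }
        # A fully repetitive feature contributes nothing to the distribution.
        push @lengths, $len if $len > 0;
    }
    return @lengths;
}

# The example from above: features of 20, 30 and 40bp,
# repeats of 5, 10 and 20bp.
my @features = ([1, 20], [51, 80], [111, 150]);
my @repeats  = ([11, 15], [61, 70], [121, 140]);
print join(",", unique_lengths(\@features, \@repeats)), "\n";   # 15,20,20
```

This gives three lengths for three features rather than six fragments. It compares every feature against every repeat, so for whole genomes it would probably need a sorted sweep or a per-gene lookup to be fast enough.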
Thanks brentp :-) I'll give this a whirl when I get back in the lab on Monday!
It seems to be truncating the code for some reason, although it may be an issue with mobile Safari?
looks ok to me. last line should be: map { print $_->[0] . "-" . $_->[1] . "\n" } @$del_gene;