Filtering for high-confidence Human-Chimp alignments only
5.2 years ago
@t.c.a.smith10021

Looking at the Human-Chimp alignments downloaded from UCSC (goldenPath/hg19/vsPanTro4/), I see many very short alignments (an extract from chromosome 1 is below), some so short that I am not sure how they can be considered alignments with any confidence (see alignment 296 below). Each alignment comes with a BLASTZ score in its header row. Alignment 296 has a score of 182, yet from what I can tell from the associated README file, only alignments scoring over 4500 were kept, so an alignment scoring 182 should presumably be ignored.

My main question: when using these alignments to investigate divergence, we need to retain only high-confidence alignments. What would be a good score threshold for filtering out low-confidence alignments? Is 4500 sufficient, or is a higher score more commonly used? If anyone has experience with these alignments, your insights would be appreciated.

    294 chr1 757277 757286 chr7 159261555 159261564 + 964
    CTACCTGCCT
    CTACCTGCCT

    295 chr1 757287 757712 chr7 159264222 159264647 + 36557
    CTCCAGAAGATCCACCCTGTCTATACTACCTGCCTATCCAGCATATCTACCCTGTCTACACTAC
    TTCCAGCAGATCCACCCTGTCTATACTACCTGCCTAACCAGCATATCTACCCTGTCTACACTAC

    296 chr1 757713 757714 chr7 159265592 159265593 + 182
    TA
    TA
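
For concreteness, here is a rough Python sketch of the kind of filter I have in mind. It assumes each block has the layout shown in the extract above: a header of nine whitespace-separated fields (alignment number, human chromosome, start, end, chimp chromosome, start, end, strand, score) followed by two sequence lines, with 1-based inclusive coordinates. The names `AxtBlock`, `read_axt`, `keep_confident`, `min_score` and `min_length` are my own, and the 4500 default is only the figure I read in the README, not a recommended cutoff.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class AxtBlock(NamedTuple):
    """One alignment block: header fields plus the two aligned sequences."""
    number: int
    t_name: str
    t_start: int
    t_end: int
    q_name: str
    q_start: int
    q_end: int
    strand: str
    score: int
    t_seq: str
    q_seq: str


def read_axt(handle: TextIO) -> Iterator[AxtBlock]:
    """Yield one AxtBlock per header line and the two sequence lines after it."""
    lines = (ln.strip() for ln in handle)
    for line in lines:
        # Skip blank separator lines and comment lines.
        if not line or line.startswith("#"):
            continue
        f = line.split()
        if len(f) != 9:
            raise ValueError(f"unexpected header line: {line!r}")
        # The two lines following a header are the aligned sequences.
        t_seq = next(lines, "")
        q_seq = next(lines, "")
        yield AxtBlock(int(f[0]), f[1], int(f[2]), int(f[3]),
                       f[4], int(f[5]), int(f[6]), f[7], int(f[8]),
                       t_seq, q_seq)


def keep_confident(blocks: Iterable[AxtBlock],
                   min_score: int = 4500,   # assumption: figure taken from the README
                   min_length: int = 0) -> Iterator[AxtBlock]:
    """Keep blocks whose score and human-side length meet both thresholds."""
    for b in blocks:
        length = b.t_end - b.t_start + 1  # assumes 1-based, inclusive coordinates
        if b.score >= min_score and length >= min_length:
            yield b


# The three blocks from the extract above (block 295 sequence as shown there).
sample = """\
294 chr1 757277 757286 chr7 159261555 159261564 + 964
CTACCTGCCT
CTACCTGCCT

295 chr1 757287 757712 chr7 159264222 159264647 + 36557
CTCCAGAAGATCCACCCTGTCTATACTACCTGCCTATCCAGCATATCTACCCTGTCTACACTAC
TTCCAGCAGATCCACCCTGTCTATACTACCTGCCTAACCAGCATATCTACCCTGTCTACACTAC

296 chr1 757713 757714 chr7 159265592 159265593 + 182
TA
TA
"""

kept = list(keep_confident(read_axt(io.StringIO(sample))))
print([b.number for b in kept])  # [295]
```

With a 4500 cutoff only block 295 survives from this extract; blocks 294 (score 964) and 296 (score 182) are both dropped. The optional `min_length` argument is there because a length cutoff might be a sensible second filter, but I do not know whether that is standard practice either.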

