Hi all,
I want to discover simple sequence repeat (SSR) markers in human samples (a healthy group and a disease group) from HiSeq mRNA-seq data. I mapped the reads to the genome and extracted consensus sequences to search for SSRs. However, the consensus sequences contain many "N"s, i.e. positions where no reads mapped to the genome. Since the type of a given SSR motif depends on its sequence context and the neighboring nucleotides, I don't think simply removing the Ns from the consensus sequences is right. What do you think? Also, is there any tool that finds SSRs directly from a BAM/SAM file?
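To illustrate why deleting Ns is risky, here is a minimal sketch (not an existing tool) that splits the consensus at runs of N and searches each covered block separately, so that sequence on either side of an uncovered gap is never fused into one artificial repeat. It assumes perfect (uninterrupted) repeats of 1-6 bp motifs. The function names and the `MIN_REPEATS` thresholds are hypothetical choices made for illustration only.

```python
import re

# Hypothetical minimum number of motif copies per motif length (1-6 bp).
# These thresholds are an assumption; adjust them to your own SSR definition.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq.

    Only A/C/G/T can form a repeat, so any other character (N or an IUPAC
    ambiguity code) interrupts a run.
    """
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-bp motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA", "ACAC"); those runs belong to the shorter motif length.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def ssrs_per_block(consensus):
    """Search each N-free block separately; coordinates refer to the full consensus."""
    results = []
    for block in re.finditer(r"[^N]+", consensus.upper()):
        for start, motif, copies in find_ssrs(block.group(0)):
            results.append((block.start() + start, motif, copies))
    return sorted(results)


if __name__ == "__main__":
    # Three CA units on each side of an uncovered (all-N) gap.
    consensus = "GG" + "CA" * 3 + "N" * 50 + "CA" * 3 + "TT"
    print(ssrs_per_block(consensus))              # [] : no block holds 6 CA units
    print(find_ssrs(consensus.replace("N", "")))  # [(2, 'CA', 6)] : spurious (CA)6
```

Deleting the Ns joins the two flanks into a (CA)6 repeat that does not exist in any observed read, whereas the per-block search reports nothing. Treating N runs as hard boundaries is conservative: a real SSR that spans a coverage gap is missed rather than invented.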