align reads to regions with similar sequences

6.0 years ago

qwzhang0601 ▴ 80

When we align NGS data (e.g., RNA-seq, ChIP-seq) to the genome, we usually allow a certain number of mismatches.

Suppose we allow 2 mismatches (and also accept reads that map to multiple loci), and a read matches locus A of the genome with 0 mismatches, locus B with 1 mismatch, and locus C with 2 mismatches. What should we expect to get from the aligner (e.g., STAR, bowtie, tophat2)? Will only the best-matching locus be reported, or will all three loci be reported in the SAM file?

Thanks

alignment • 1.2k views

updated 3.6 years ago by Biostar 20 • written 6.0 years ago by qwzhang0601 ▴ 80

If these reads have a good mean quality (Phred score above 25-30), it may mean that they correspond to genuinely repetitive locations, which I think is not a common situation in RNA-seq or ChIP-seq. However, at least in Bowtie2 and HISAT2 you can decide what to do with multi-hit reads. Read the manual.
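Whatever settings you end up with, one way to see what the aligner actually reported is to count the SAM records per read and look at the `NM` tag of each one. Below is a minimal sketch in plain Python. It assumes the aligner writes the optional `NM:i:` tag (defined in the SAM specification as edit distance, so it counts indels as well as mismatches). The helper names `parse_nm` and `reported_loci` and the example records are made up for illustration and are not output from any real run.

```python
from collections import defaultdict

def parse_nm(optional_fields):
    """Return the NM (edit distance) value from optional SAM fields, or None if absent."""
    for field in optional_fields:
        if field.startswith("NM:i:"):
            return int(field[5:])
    return None

def reported_loci(sam_lines):
    """Map each read name to a list of (reference, position, NM) for its mapped records."""
    hits = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        cols = line.rstrip("\n").split("\t")
        flag = int(cols[1])
        if flag & 0x4:  # 0x4 = read unmapped
            continue
        hits[cols[0]].append((cols[2], int(cols[3]), parse_nm(cols[11:])))
    return dict(hits)

# Hypothetical records mimicking the scenario in the question:
# one read reported at three loci with 0, 1 and 2 differences.
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*\tNM:i:0",
    "read1\t256\tchr2\t200\t1\t50M\t*\t0\t0\t*\t*\tNM:i:1",
    "read1\t256\tchr3\t300\t1\t50M\t*\t0\t0\t*\t*\tNM:i:2",
]

for read, loci in reported_loci(example).items():
    print(read, len(loci), loci)
```

If a read shows up with a single record, only one locus was reported for it; several records (the extra ones usually flagged as secondary, 0x100) mean the aligner reported multiple loci. For a real file you could pass an open file handle instead of the `example` list.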

6.0 years ago by Buffo ★ 2.4k
