As part of plant genotyping research and CRISPR-Cas9 mutation screening, I need to sequence and align many reads in order to capture low-frequency variants. Therefore, I'm using NGS (rather than Sanger).
I'm trying to align paired-end reads from FASTQ files to a short amplicon reference of about 200 bp.
I tried the Bowtie2 aligner. First, I built the index with default options: bowtie2-build ref.fa ref
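A malformed reference is a cheap thing to rule out before (re)building the index. The Python sketch below parses ref.fa and reports each record's length and any characters other than A/C/G/T/N. It assumes a plain, uncompressed FASTA file, and the helper names (read_fasta, check_reference) are hypothetical.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_name: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # record name = first word of the header line
            name = line[1:].split()[0] if len(line) > 1 else ""
            records[name] = ""
        elif name is None:
            raise ValueError("sequence line found before any '>' header")
        else:
            records[name] += line.upper()
    return records


def check_reference(text: str) -> Dict[str, dict]:
    """Report length and unexpected (non-ACGTN) characters per record."""
    report = {}
    for name, seq in read_fasta(text).items():
        bad = sorted(set(seq) - set("ACGTN"))
        report[name] = {"length": len(seq), "non_ACGTN": bad}
    return report


if __name__ == "__main__":
    example = ">amplicon\nACGTACGTNN\nacgt\n"
    print(check_reference(example))
```

For the real file, check_reference(open("ref.fa").read()) should report a single record of roughly 200 nt with an empty non_ACGTN list; anything else (several records, a much shorter sequence, stray characters) is worth fixing before running bowtie2-build again.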
Then I ran the alignment: bowtie2 -x ref -1 R1.fastq -2 R2.fastq -X 200 --fr -S output.sam. It executed with no errors, but not a single read was aligned.
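Besides the alignment summary bowtie2 prints to standard error, the primary SAM records can be tallied by their FLAG bits to see what happened to each read: 0x4 marks an unmapped read, and 0x2 marks a properly paired read, which bowtie2 sets for concordant alignments. Secondary (0x100) and supplementary (0x800) records are skipped so each read is counted once. samtools flagstat output.sam gives a similar summary; the Python sketch below, with a hypothetical tally_flags helper, only makes the logic explicit.

```python
from collections import Counter
from typing import Iterable

# FLAG bit values from the SAM specification
FLAG_PROPER_PAIR = 0x2       # properly paired (bowtie2: concordant)
FLAG_UNMAPPED = 0x4          # this read is unmapped
FLAG_SECONDARY = 0x100       # secondary alignment
FLAG_SUPPLEMENTARY = 0x800   # supplementary alignment


def tally_flags(sam_lines: Iterable[str]) -> Counter:
    """Count primary records as unmapped, concordant, or otherwise aligned."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read only once (primary record)
        counts["total"] += 1
        if flag & FLAG_UNMAPPED:
            counts["unmapped"] += 1
        elif flag & FLAG_PROPER_PAIR:
            counts["concordant"] += 1
        else:
            counts["aligned_not_concordant"] += 1
    return counts


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.0\tSO:unsorted",
        "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",   # mate 1, unmapped
        "r1\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",  # mate 2, unmapped
    ]
    print(dict(tally_flags(demo)))
```

On the real output, with open("output.sam") as handle: print(dict(tally_flags(handle))) shows whether reads are entirely unmapped or aligned but not concordant, which are two different problems.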
Can anyone please explain what's going on? Or, even better, how to perform this kind of analysis?
It is possible, and even likely depending on the library preparation, that many fragments (read pairs) span more than 200 nt; such pairs are unlikely to map concordantly to a 200 nt reference, especially with -X 200, which caps the accepted fragment length at 200. A more common strategy is to map to the full genome, then extract the genomic region of interest (the one targeted for mutagenesis). This strategy is also more robust against spurious mapping, because reads originating elsewhere in the genome can align to their true locus instead of being forced onto the amplicon.
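With real data, the extraction step is usually done on a coordinate-sorted, indexed BAM, for example samtools view -b sorted.bam chr1:1000-1199 > region.bam. The Python sketch below only illustrates the underlying overlap test on SAM text: the reference span of each alignment is derived from its CIGAR string using the reference-consuming operations of the SAM specification (M, D, N, =, X). The chromosome name and coordinates are hypothetical placeholders for the locus targeted by the guide RNA, and the helper names are illustrative.

```python
import re
from typing import Iterable, Iterator, Tuple

# CIGAR operations that consume reference bases (SAM specification)
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos: int, cigar: str) -> Tuple[int, int]:
    """1-based inclusive (start, end) of an alignment on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def reads_in_region(sam_lines: Iterable[str], chrom: str,
                    start: int, end: int) -> Iterator[str]:
    """Yield mapped SAM records overlapping chrom:start-end (1-based, inclusive)."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue  # unmapped, on another sequence, or no CIGAR
        aln_start, aln_end = reference_span(pos, cigar)
        if aln_start <= end and aln_end >= start:
            yield line


if __name__ == "__main__":
    # Hypothetical target locus: chr1:1000-1199 (~200 bp amplicon)
    sam = [
        "@SQ\tSN:chr1\tLN:5000",
        "a\t0\tchr1\t950\t60\t100M\t*\t0\t0\t*\t*",   # spans 950-1049: overlaps
        "b\t0\tchr1\t1300\t60\t100M\t*\t0\t0\t*\t*",  # spans 1300-1399: outside
        "c\t0\tchr2\t1000\t60\t100M\t*\t0\t0\t*\t*",  # different chromosome
    ]
    for record in reads_in_region(sam, "chr1", 1000, 1199):
        print(record.split("\t")[0])  # prints: a
```

Soft clips (S) and insertions (I) do not move the alignment end on the reference, which is why they are excluded from the span; deletions (D) do, which matters for CRISPR-induced indels.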
Before going further, however, I would double-check the quality of the reads (with FastQC, for instance) and the reference itself (running bowtie2-inspect on the index would be a good start).
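FastQC remains the right tool for read quality. As a complement, the Python sketch below computes the mean per-read quality and, more to the point for this problem, the fraction of reads sharing at least one k-mer with the amplicon on either strand. If that fraction is near zero, the reads and ref.fa probably do not correspond (wrong file or wrong amplicon sequence). Assumptions: uncompressed, well-formed 4-line FASTQ records, Phred+33 quality encoding, and k = 15, which is an arbitrary choice; all helper names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality); assumes well-formed 4-line records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it).strip().upper()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def summarize(fastq_lines: Iterable[str], reference: str, k: int = 15) -> dict:
    """Mean read quality and fraction of reads sharing a k-mer with the
    reference or its reverse complement (mates can come from either strand)."""
    ref = reference.upper()
    ref_kmers = kmers(ref, k) | kmers(ref.translate(COMPLEMENT)[::-1], k)
    n = hits = 0
    quality_total = 0.0
    for _name, seq, qual in read_fastq(fastq_lines):
        n += 1
        quality_total += mean_phred(qual)
        if kmers(seq, k) & ref_kmers:
            hits += 1
    return {
        "reads": n,
        "mean_quality": quality_total / n if n else 0.0,
        "fraction_matching_reference": hits / n if n else 0.0,
    }


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCAGGTCA"  # toy stand-in for the ~200 bp amplicon
    fastq = [
        "@r1", "ACGTACGTTT", "+", "IIIIIIIIII",  # forward strand, Q40
        "@r2", "TGACCTGGTC", "+", "!!!!!!!!!!",  # reverse strand, Q0
        "@r3", "GGGGGGGGGG", "+", "IIIIIIIIII",  # unrelated sequence
    ]
    print(summarize(fastq, reference, k=5))  # 2 of the 3 reads match
```

On real data, summarize(open("R1.fastq"), amplicon_sequence) (and the same for R2.fastq) gives a quick indication of whether the reads come from the amplicon at all before spending time on aligner parameters.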