blast+ with a degenerate nucleotide
9.3 years ago
kaaschr ▴ 10

I would like to find the places in the genome where my CRISPR guide RNA might cause off-target cleavage.

Example target: AGGGGTCCTTTCTGAAGTCCAGG. The guide usually binds to sequences that are identical in the 13 bp immediately upstream of the PAM, i.e. NNNNNNNCTTTCTGAAGTCCNGG.

So I would like to search my genome for CTTTCTGAAGTCCNGG with blast+ (on UNIX).

The FASTA file that I use as the query:

>AGGGGTCCTTTCTGAAGTCCAGG
CTTTCTGAAGTCCNGG

The command for blasting the sequence against my genome:

~/software/ncbi-blast-2.2.30+/bin/blastn  -db ~/Cgenomes/Crispr/libs/Cgriseus -query candidate -outfmt 6 -task blastn-short -out temp2 -evalue 100000 -word_size 11

The best results:

AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1  100.00  13      0       0       1       13      53760   53748      59   26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1  100.00  13      0       0       1       13      77566   77554      59   26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886135.1  100.00  13      0       0       1       13      478726  478714     59   26.3

So I only get alignments of up to 13 bp, i.e. nothing to the right of the degenerate N is used by blast.

From NCBI (http://www.ncbi.nlm.nih.gov/blast/Why.shtml)

Although this alphabet [the degenerate nucleotides] is accepted by BLAST, the BLAST program treats such ambiguities as mismatches in alignment. In short queries, such as primer sequences, these ambiguous bases may prevent BLAST from finding any matches in the database that are as large as the word size.

So shouldn't I be able to get 16 bp hits with 1 mismatch? Which parameter would it make sense to tinker with so that blast+ finds the 16 bp sites in the genome that have 1, 2 or 3 mismatches compared to CTTTCTGAAGTCCNGG?
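One possible reading of the result, sketched in Python. It assumes the N is scored as an ordinary mismatch (as the NCBI text quoted above says) and that blastn-short uses a match reward of +1 and a mismatch penalty of -3; neither value is set on the command line above, so both are assumptions. The helper `ungapped_score` is made up for illustration and is not part of BLAST.

```python
# Assumed blastn-short scoring: +1 per match, -3 per mismatch.
REWARD, PENALTY = 1, -3


def ungapped_score(query, subject):
    """Raw score of an ungapped alignment; an N in the query never equals
    A/C/G/T here, so it is counted as a plain mismatch (assumption)."""
    return sum(REWARD if q == s else PENALTY for q, s in zip(query, subject))


query = "CTTTCTGAAGTCCNGG"
site = "CTTTCTGAAGTCCAGG"  # a perfect genomic site, with A at the N position

score_13 = ungapped_score(query[:13], site[:13])  # stop just before the N
score_16 = ungapped_score(query, site)            # extend through N and GG
print(score_13, score_16)
```

Under these assumptions, extending through the N costs 3 and the two following G's only win back 2, so the 13 bp alignment scores higher than the 16 bp one and a local aligner would report the shorter hit.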

alignment blast • 5.3k views

The search doesn't require a gapped alignment, does it? You just need a string comparison with a degenerate alphabet?
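A minimal sketch of such a string comparison in Python, assuming the sequence of one scaffold has already been read into a string. The function names (`find_hits`, `reverse_complement`) are invented for this example, and a plain Python loop like this is only meant to show the logic; it would be very slow on a whole 2.5 Gbp genome.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement, keeping IUPAC ambiguity codes consistent."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hits(pattern, sequence, max_mismatches=2):
    """Yield (start, strand, mismatches, site) for every ungapped match.

    start is 0-based on the given sequence, and site is the sequence as it
    appears there (not reverse-complemented for '-' hits). An N in the
    genome counts as a mismatch against every pattern position.
    """
    sequence = sequence.upper()
    for strand, pat in (("+", pattern), ("-", reverse_complement(pattern))):
        allowed = [IUPAC[base] for base in pat]
        for start in range(len(sequence) - len(pat) + 1):
            mismatches = 0
            for offset, bases in enumerate(allowed):
                if sequence[start + offset] not in bases:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # too many mismatches, try the next window
            else:
                yield start, strand, mismatches, sequence[start:start + len(pat)]


# Toy example: one perfect site flanked by three bases on each side.
toy = "AAA" + "CTTTCTGAAGTCCAGG" + "TTT"
for hit in find_hits("CTTTCTGAAGTCCNGG", toy, max_mismatches=0):
    print(hit)
```

Because the N is looked up in a set of allowed bases, it matches any nucleotide instead of being counted as a mismatch, and `max_mismatches` can be set to 0, 1, 2 or 3 as needed.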


My goal is to find where in the 2.5 Gbp genome CTTTCTGAAGTCCNGG can align with 0, 1 or 2 mismatches.

But would you not say that blastn is the most user-friendly string-comparison tool out there?

Just to clarify: I am pretty new to bioinformatics, so the solution is probably very straightforward.


Hey, I have the same question. Were you able to solve this problem?
