Doubt regarding Reference genome alignment
5.8 years ago
bic • 0

Hello all,

We are working with whole-genome sequence data from a Pseudomonas fluorescens strain. We performed a de novo assembly with ABySS and submitted the resulting contigs to the RAST server for annotation. RAST identified the species most closely related to this strain. When we aligned our strain against the related strain using Bowtie2, it reported a 71% overall alignment rate. Is this alignment rate good or not?

I feel the alignment rate should be somewhat higher between two strains of the same species. Apologies if this is a naive question; we are new to NGS data analysis.

Also, could someone suggest a tool for finding a reference genome, other than BLAST?

Thanks in advance

Regards

Ravisankar

next-gen alignment assembly Bowtie2 • 1.3k views
5.8 years ago
pbpanigrahi ▴ 420

I want to know whether this alignment rate is good or not.

It depends on how much the two strains differ. If they are genuinely different, you should expect a low alignment rate. You can check this by allowing 1 mismatch during the seed alignment step and seeing whether the alignment rate improves.

From the manual

-N <int> Sets the number of mismatches to allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

If the alignment rate increases significantly with -N 1, you can infer that the two strains differ at many single-nucleotide positions.
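One way to make this comparison concrete is to run the same alignment with -N 0 and -N 1 and compare the "overall alignment rate" line in Bowtie2's summary. The following is only a minimal Python sketch under stated assumptions: the index name `ref_index`, the read file `reads.fq`, the output names, and both helper functions are made up for illustration, and it assumes unpaired reads passed with -U.

```python
import re

def bowtie2_command(index, reads, out_sam, seed_mismatches=0):
    """Build a Bowtie2 argument list for unpaired reads (hypothetical file names)."""
    if seed_mismatches not in (0, 1):
        raise ValueError("-N can only be set to 0 or 1")
    return ["bowtie2", "-N", str(seed_mismatches),
            "-x", index, "-U", reads, "-S", out_sam]

def overall_alignment_rate(summary):
    """Return the overall alignment rate (in percent) from Bowtie2's summary text."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))

# Show the two commands that would be compared (nothing is executed here).
for n in (0, 1):
    print(" ".join(bowtie2_command("ref_index", "reads.fq", f"N{n}.sam", n)))

# Parse a summary line like the one reported in the question.
print(overall_alignment_rate("71.00% overall alignment rate"))
```

To actually run the alignments, each argument list could be passed to `subprocess.run(..., capture_output=True, text=True)`; Bowtie2 writes its summary to stderr, so that is the text to hand to `overall_alignment_rate`.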


Thanks for the quick reply. Will try it soon.

Also, can someone help me with the second question: is there a tool for finding a reference genome other than BLAST?


I repeated the alignment with the -N 1 option and it now gives a 77% overall alignment rate. Can we infer from this result that the two strains are quite different from each other?

Thanks


To check, you can extract the ~30% of reads that don't map, BLAST them, and see whether contamination is a possibility. You can also explore Fastq_Screen and DeconSeq for contamination detection. UCSC BLAT is an alternative to BLAST.
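As a rough illustration of pulling out the unmapped reads for BLAST, here is a minimal Python sketch that reads SAM records and keeps those whose FLAG has the 0x4 (segment unmapped) bit set. The function name and the three example records are invented for the demonstration. Bowtie2's `--un` option can also write unaligned unpaired reads to a file directly, which may be simpler in practice.

```python
def unmapped_reads_to_fasta(sam_lines):
    """Yield FASTA records for SAM reads whose FLAG has the 0x4 (unmapped) bit set."""
    for line in sam_lines:
        if line.startswith("@"):           # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:               # skip malformed records
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:
            yield f">{name}\n{seq}"

# Made-up records: read1 is aligned (FLAG 0), read2 is unmapped (FLAG 4).
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tcontig_1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
]

print("\n".join(unmapped_reads_to_fasta(example_sam)))
```

With a real file, the lines could come from `open("N1.sam")` (a hypothetical file name), and a random subset of the resulting FASTA is usually enough to BLAST.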

Another thing to try is a different aligner, such as bwa-mem, to see whether the alignment rate improves.

Also, are you getting any multi-hits, i.e. one read mapping to multiple locations? Can you post the alignment statistics you are getting?

If everything seems fine, you could align the reads to your own assembled genome and see what percentage of reads align, then compare your assembly with the reference genome you are using and check whether the two are indeed different.
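For the multi-hit question, the Bowtie2 summary already splits reads into "aligned 0 times", "aligned exactly 1 time" and "aligned >1 times". A minimal Python sketch for pulling those counts out is shown here; the function name is hypothetical, the numbers in the example summary are invented and are not the poster's results, and the patterns assume the unpaired-read form of the summary (paired-end runs use different wording).

```python
import re

def alignment_breakdown(summary):
    """Parse read counts from a Bowtie2 unpaired-read alignment summary."""
    patterns = {
        "unaligned": r"(\d+) \([\d.]+%\) aligned 0 times",
        "unique": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
        "multi": r"(\d+) \([\d.]+%\) aligned >1 times",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, summary)
        counts[key] = int(match.group(1)) if match else 0
    return counts

# Invented example summary in Bowtie2's unpaired-read layout.
example_summary = """10000 reads; of these:
  10000 (100.00%) were unpaired; of these:
    2900 (29.00%) aligned 0 times
    6500 (65.00%) aligned exactly 1 time
    600 (6.00%) aligned >1 times
71.00% overall alignment rate
"""

counts = alignment_breakdown(example_summary)
aligned = counts["unique"] + counts["multi"]
print(counts)
print(f"multi-mapped share of aligned reads: {counts['multi'] / aligned:.1%}")
```

A large multi-mapped share would point toward repeats or duplicated regions rather than strain divergence as the reason reads are hard to place.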
