Question

PCR-cDNA reads from MinION not aligned

7 weeks ago

marco.barr ▴ 80

Hello everyone. In my initial experiment, I aligned my cDNA reads from an ONT MinION run with the following command:

./minimap2 -ax map-ont /home/my_reference_PCR.fasta /home/input_pcr_PI4K.fastq > output_PI4K_aligned.sam

Then I built the BAM file. I noticed that the reads aligned only to one specific region of the reference, with about 10X coverage. To examine other regions, I extracted the reads of maximum length from the FASTQ file with the following command:

seqkit seq -m $(seqkit stats -T /home/input_pcr_PI4K.fastq | tail -n1 | cut -f 8) /home/input_pcr_PI4K.fastq -o /home/output_PI4K.fastq
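A note on this extraction step: in seqkit, seq -m (--min-len) is a minimum-length filter, and column 8 of the tabular seqkit stats -T output is max_len. The command therefore keeps only reads whose length equals that of the single longest read, which is often just one read rather than a set of long reads. A minimal sketch for looking at the length distribution before choosing a cut-off, assuming standard unwrapped 4-line FASTQ records; fastq_lengths, count_at_least and longest_lengths are illustrative helper names, not part of seqkit:

```shell
#!/bin/sh
# Sketch: inspect read lengths before choosing a length cut-off.
# Assumes unwrapped 4-line FASTQ records (header, sequence, '+', qualities).
# Function names are illustrative, not part of seqkit or any other tool.

# Print the length of every read, one per line (line 2 of each record).
fastq_lengths() {
    awk 'NR % 4 == 2 { print length($0) }' "$@"
}

# Count reads whose length is >= $1; remaining args are FASTQ files (or stdin).
count_at_least() {
    min=$1
    shift
    fastq_lengths "$@" | awk -v min="$min" '$1 + 0 >= min + 0 { n++ } END { print n + 0 }'
}

# Show the five largest lengths, to see how many reads share the maximum.
longest_lengths() {
    fastq_lengths "$@" | sort -n -r | head -n 5
}
```

For example, count_at_least 5000 /home/input_pcr_PI4K.fastq reports how many reads are at least 5,000 nt long (5000 is an arbitrary example value). A cut-off chosen this way can then be passed to seqkit directly, e.g. seqkit seq -m 5000 /home/input_pcr_PI4K.fastq -o /home/output_PI4K.fastq.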

However, when I align this new file with either the -ax map-ont or the -x splice preset (since it's cDNA), samtools flagstat reports 0% mapped. I can't figure out why. Is there something wrong with the extraction, or do I need to adjust the alignment parameters further? I hope you can help me. Thanks a lot.
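Two things may be worth checking here. First, minimap2 writes PAF by default and writes SAM only with -a, so the spliced run needs -ax splice rather than -x splice before samtools can read its output. Second, the mapped count from flagstat can be cross-checked directly against the SAM file. A minimal POSIX awk sketch, assuming a plain-text SAM file (not BAM); sam_mapping_summary is an illustrative name, and it counts primary records only, skipping secondary (0x100) and supplementary (0x800) alignments:

```shell
#!/bin/sh
# Sketch: count primary SAM records that are mapped vs unmapped, as a
# cross-check of samtools flagstat. Function name is illustrative.
# FLAG bits used: 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary.

sam_mapping_summary() {
    awk -F '\t' '
        /^@/ { next }                                   # skip header lines
        {
            flag = $2 + 0
            # POSIX awk has no bitwise AND, so test single bits arithmetically.
            if (int(flag / 256) % 2 || int(flag / 2048) % 2) next   # not primary
            total++
            if (int(flag / 4) % 2) unmapped++
            else mapped++
        }
        END { printf "primary=%d mapped=%d unmapped=%d\n", total, mapped, unmapped }
    ' "$@"
}
```

For example, sam_mapping_summary output_PI4K_aligned.sam prints one line with the primary, mapped and unmapped counts for that file.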

Tags: minimap2, alignment, cDNA

Answer 1 · 2024-03-05

7 weeks ago

Michael 54k

It is possible that the longest reads do not align to the reference properly. Perhaps they are noisier than the rest of the data, or they come from repetitive regions. You may need to tune the options, since the alignment can also be sensitive to the error rate. If you want to know what these reads are, you could extract the first few as FASTA sequences and BLAST or Exonerate them. If you still don't get a match, they probably come from contaminants or symbionts.
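A minimal sketch of that first step, turning the first few reads into FASTA for BLAST or Exonerate. It assumes unwrapped 4-line FASTQ records and keeps only the read ID from each header; fastq_head_to_fasta is an illustrative name. With seqkit installed, seqkit head -n 5 reads.fastq | seqkit fq2fa does the same job.

```shell
#!/bin/sh
# Sketch: convert the first N reads of a FASTQ file to FASTA, e.g. for BLAST.
# Assumes unwrapped 4-line FASTQ records; function name is illustrative.

fastq_head_to_fasta() {
    n=$1
    shift
    awk -v n="$n" '
        NR > 4 * n { exit }                        # stop after N records
        NR % 4 == 1 { print ">" substr($1, 2) }    # header: drop "@", keep read ID only
        NR % 4 == 2 { print }                      # sequence line
    ' "$@"
}
```

For example, fastq_head_to_fasta 5 /home/output_PI4K.fastq > first_reads.fasta writes the first five reads in a form that can be pasted into the BLAST web form.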

I checked, and BLAST finds no matches. I keep changing various parameters but get the same result. What about a de novo assembly, using Canu since I work with long reads?

7 weeks ago by marco.barr ▴ 80

Did you also check BLASTN against GenBank (nt)? It would help to give a few more details about the species and samples involved. Which flowcell and basecaller versions were used? Were there spike-in controls? Have adapters been trimmed? Until now, I was convinced that sequencers don't simply make up sequences out of thin air. So yes, a de novo assembly may be worth trying.
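On the adapter question, one quick check is whether the unmapped reads still carry a PCR primer or adapter sequence on either strand. A minimal sketch under stated assumptions: the motif is supplied by you (the primer sequences are not given in this thread), matching is exact, so with nanopore error rates it undercounts and a short motif works better, and records are unwrapped 4-line FASTQ. count_motif_reads is an illustrative name:

```shell
#!/bin/sh
# Sketch: count reads containing a user-supplied motif (e.g. a primer) or its
# reverse complement. Exact matching only; assumes unwrapped 4-line FASTQ.
# Function name is illustrative; it knows nothing about real primer sequences.

count_motif_reads() {
    motif=$1
    shift
    awk -v m="$motif" '
        function revcomp(s,    i, c, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                if (c == "A") c = "T"
                else if (c == "T") c = "A"
                else if (c == "C") c = "G"
                else if (c == "G") c = "C"
                out = out c                  # other symbols (e.g. N) kept as-is
            }
            return out
        }
        BEGIN { m = toupper(m); rc = revcomp(m) }
        NR % 4 == 2 {
            s = toupper($0)
            if (index(s, m) || index(s, rc)) hits++
            total++
        }
        END { printf "%d of %d reads contain the motif or its reverse complement\n", hits, total }
    ' "$@"
}
```

For example, count_motif_reads ACGTACGTACGT /home/output_PI4K.fastq, where ACGTACGTACGT is only a placeholder for the real primer sequence.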

7 weeks ago by Michael 54k

I checked both, and there is no match. The reads were already cleaned by the MinKNOW settings, but here are the flowcell and basecalling model:

Flowcell_id: ALJ911_R9.4.1
basecall_model_version_id: 2021_05_17_dna_r9.4.1_minion_384_d37a2ab9

I also think it's a contaminant; the idea of a de novo assembly came to mind because I don't know what else to think. Perhaps the only remaining option is to repeat the PCR.

7 weeks ago by marco.barr ▴ 80

That is odd, but it could be a basecalling artifact. If you don't mind, could you post the "ghost" sequences, or at least a few examples?

7 weeks ago by Michael 54k

Here is the original FASTQ: https://drive.google.com/file/d/1LO_rw7SvsqQWZPPXQ7S49L-w1vKHKx39/view?usp=drive_link

Here is the FASTQ with the extracted maximum-length reads: https://drive.google.com/file/d/1QQrxu0vBk_Uf-ac3T1tuKSKm1uejksrC/view?usp=drive_link