Prokka and plasmidspades unexpectedly reveal no plasmid
4.4 years ago
claping ▴ 20

I have two similar genomes (one "known", one "unknown"). The "known" genome has already been determined to contain a plasmid. Unexpectedly, neither Prokka nor plasmidspades reported a plasmid in either genome. In the case of plasmidspades, I ran the following commands:

plasmidspades.py -k 9,21,33,55,77,127 --careful -1 read1.fastq -2 read2.fastq
plasmidspades.py -k 9,21,33,55,77,127 -1 read1.fastq -2 read2.fastq

In both cases, there was no contigs.fasta in the main output directory, the assembly_graph.fastg in the main output directory was empty, the warnings.log in the main output directory said "No putative plasmid contigs found!", and inside the directories (such as K127), the final_contigs.fasta was empty. Unless I should check elsewhere in the Prokka outputs, I interpret this to mean no plasmids were found.

So, I then downloaded from GenBank the FASTA file of the plasmid sequence that had already been discovered for the "known" genome. Using BWA, I aligned the raw reads from the genomes to this plasmid reference. In the "unknown" genome, I see 180 reads mapping to the plasmid reference. I believe this is evidence that the "unknown" genome contains a plasmid (even if undetected by Prokka and plasmidspades).

My question is:

What approach should I take to solidify the evidence for the presence/absence of plasmid(s) in this "unknown" genome? If there is a plasmid present, how can I determine its sequence with confidence? (Side note: For the "unknown" genome, I have the raw reads, the assembled contigs from running SPAdes, and the reordered contigs from running MAUVE against the "known" genome as a reference).

prokka plasmidspades spades plasmid • 2.0k views

I see 180 reads mapping to the plasmid reference fasta file. I believe this is evidence that the "unknown" genome contains a plasmid (even if undetected by Prokka and plasmidspades).

I would not consider 180 reads solid evidence for the presence of a plasmid (what kind of data do you have BTW, short or long reads?). How many total reads are in this dataset? Have you checked whether the alignments look solid (not many mismatches) and which region of the known plasmid the reads are aligning to?


Total reads are about 480,610, and reads are about 150 base pairs long. I do not have much experience with this subject and am unsure how to examine which region of the known plasmid the reads are aligning to. I simply ran code I found online, as follows:

bwa index plasmid.fasta
bwa mem plasmid.fasta mySample_r1.fastq mySample_r2.fastq > mySample.sam
samtools view -c mySample.sam      # total number of reads
samtools view -cF4 mySample.sam    # total number of reads in my sample that mapped to the plasmid

This is where I am stuck: trying to visualize how these 180 reads align to the plasmid.
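A quick way to see which region of the plasmid the mapped reads fall in, before opening a viewer, is to bin their start positions along the reference. The following is a minimal sketch that only reads the standard SAM columns (FLAG, POS, MAPQ) with awk. The function name `bin_mapped_starts`, the 1,000 bp default window, and the MAPQ cutoff argument are illustrative assumptions, not something used in the thread.

```shell
# Hypothetical helper: count mapped reads per window along the reference.
# Usage: bin_mapped_starts <sam_file> [window_bp] [min_mapq]
# Prints: window_start <TAB> window_end <TAB> read_count
bin_mapped_starts() {
    awk -F '\t' -v win="${2:-1000}" -v minq="${3:-0}" '
        /^@/ { next }                          # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }          # FLAG bit 0x4 set: read is unmapped
        int($2 / 256) % 2 == 1 { next }        # FLAG bit 0x100: secondary alignment
        int($2 / 2048) % 2 == 1 { next }       # FLAG bit 0x800: supplementary alignment
        $5 < minq { next }                     # drop alignments below the MAPQ cutoff
        { bins[int(($4 - 1) / win)]++ }        # POS (column 4) is the 1-based leftmost position
        END {
            for (b in bins)
                printf "%d\t%d\t%d\n", b * win + 1, (b + 1) * win, bins[b]
        }
    ' "$1" | sort -n
}
```

For example, `bin_mapped_starts mySample.sam 1000 30` would list 1 kb windows and the number of confidently mapped reads starting in each. Reads piled into one or two windows would suggest a short shared element rather than the whole plasmid, whereas reads spread across most windows would be more consistent with a plasmid present at low depth.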


You can use IGV to visualize your aligned data. You will need to create a custom genome with your plasmid reference. Follow the directions for custom genomes here. You will need to sort and index your alignment using samtools.
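The sort and index step mentioned above can be sketched as follows. This assumes the SAM file produced by `bwa mem` earlier in the thread; the wrapper name `prepare_for_igv` and the output file name are hypothetical.

```shell
# Hypothetical wrapper around the two samtools steps needed before loading into IGV.
# Usage: prepare_for_igv mySample.sam mySample.sorted.bam
prepare_for_igv() {
    # Sort alignments by reference coordinate and write the result as BAM.
    samtools sort -o "$2" "$1" || return 1
    # Build the index file (.bai) next to the sorted BAM so IGV can load it.
    samtools index "$2"
}
```

After this, the plasmid FASTA would be loaded in IGV as the genome and the sorted BAM as a track.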


Only 180 reads map to the plasmid? That's next to nothing (I assume Illumina reads since you used SPAdes); you won't be able to assemble the plasmid from them. How is the genome assembly? Can you compare it to the known genome? If it contains the same plasmid, you will be able to find it in the assembly regardless of the assembly method.
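To put the 180 reads in perspective, a back-of-the-envelope calculation helps. The sketch below uses the read counts and read length given in the thread; the plasmid length is not stated anywhere in the thread, so the 50,000 bp value is purely a hypothetical placeholder, and the function name `estimate_depth` is likewise an assumption.

```shell
# Hypothetical helper: mapped fraction and expected mean depth on the plasmid.
# Usage: estimate_depth <mapped_reads> <total_reads> <read_len_bp> <plasmid_len_bp>
estimate_depth() {
    awk -v m="$1" -v t="$2" -v l="$3" -v p="$4" 'BEGIN {
        # Share of all reads that mapped to the plasmid, as a percentage.
        printf "mapped fraction: %.3f%%\n", 100 * m / t
        # Mean depth = total mapped bases / plasmid length.
        printf "expected mean depth: %.2fx\n", m * l / p
    }'
}

# 180, 480610 and 150 come from the thread; 50000 is an assumed plasmid length.
estimate_depth 180 480610 150 50000
# mapped fraction: 0.037%
# expected mean depth: 0.54x
```

Under that assumed length, the average position on the plasmid would be covered by less than one read, which is consistent with the point that an assembly from these reads is not realistic.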


I compared the genome assembly to the "known" genome using MAUVE. They were quite similar (only about 20 LCBs at the default settings). In terms of your last sentence, how can I find the plasmid in the assembly? What tools should I use? This may be a simple question but I am new to this type of analysis.


I meant: when you compare the assembly with the reference genome, do you see a match between the assembly and the plasmid?
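One possible way to check for such a match directly is to align the known plasmid sequence against the assembled contigs with `blastn` and summarize the hits per contig. This is a sketch rather than the approach the replies above used: it assumes BLAST+ is installed, and the file names, the helper name `summarize_plasmid_hits`, and the identity and length cutoffs are all illustrative assumptions.

```shell
# Hypothetical helper: summarize BLAST tabular hits (outfmt 6) per contig.
# Reads lines on stdin with the default columns: qseqid sseqid pident length ...
# Prints: contig <TAB> number_of_hits <TAB> summed_alignment_length
# Note: overlapping hits are summed as-is, so the total can overcount.
# Usage: summarize_plasmid_hits [min_percent_identity] [min_alignment_length]
summarize_plasmid_hits() {
    awk -F '\t' -v min_id="${1:-90}" -v min_len="${2:-500}" '
        $3 >= min_id && $4 >= min_len { total[$2] += $4; hits[$2]++ }
        END { for (c in total) printf "%s\t%d\t%d\n", c, hits[c], total[c] }
    ' | sort -k3,3nr
}
```

A typical call would be `blastn -query plasmid.fasta -subject contigs.fasta -outfmt 6 | summarize_plasmid_hits 90 500`. Contigs that collect long, high-identity hits covering most of the plasmid would be candidates for plasmid-derived sequence; no such contigs would agree with the low read count.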

  1. Align the reads to the reference genome
  2. Store the unaligned/unmapped reads in a separate file
  3. Look at the stats and see whether a good chunk of the reads remains unaligned
  4. Map the unaligned reads to the plasmid sequence

or

  1. Index the plasmid sequence with bowtie2 or bwa
  2. Index the reference genome with bowtie2 or bwa
  3. Use fastqscreen to check how many reads align to the reference and how many to the plasmid

    By default, fastqscreen screens a subset of 100,000 reads from each input read file. @ clapin
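The first list of steps (align to the reference, keep the unmapped reads, check the stats, map the leftovers to the plasmid) could be sketched with `bwa` and `samtools` as below. The function name, argument order, and output file names are hypothetical. It also assumes that the reference FASTA contains only the chromosome; if the plasmid record is included in it, plasmid reads would already map in step 1.

```shell
# Hypothetical helper covering steps 1-4 of the first list.
# Usage: screen_unmapped_for_plasmid ref.fasta plasmid.fasta r1.fastq r2.fastq out_prefix
screen_unmapped_for_plasmid() {
    ref="$1"; plasmid="$2"; r1="$3"; r2="$4"; prefix="$5"

    # Step 1: align the read pairs to the reference genome.
    bwa index "$ref" || return 1
    bwa mem "$ref" "$r1" "$r2" > "$prefix.ref.sam" || return 1

    # Step 2: write pairs in which both mates are unmapped (FLAG bits 0x4 + 0x8 = 12)
    # to separate FASTQ files; bwa mem output is already grouped by read name.
    samtools fastq -f 12 -n -0 /dev/null -s /dev/null \
        -1 "$prefix.unmapped_1.fastq" -2 "$prefix.unmapped_2.fastq" \
        "$prefix.ref.sam" || return 1

    # Step 3: save mapping statistics to judge how many reads stayed unmapped.
    samtools flagstat "$prefix.ref.sam" > "$prefix.ref.flagstat.txt" || return 1

    # Step 4: map the unmapped pairs to the plasmid and count mapped records.
    bwa index "$plasmid" || return 1
    bwa mem "$plasmid" "$prefix.unmapped_1.fastq" "$prefix.unmapped_2.fastq" \
        > "$prefix.plasmid.sam" || return 1
    samtools view -c -F 4 "$prefix.plasmid.sam"
}
```

The final count can then be compared with the 180 reads obtained by mapping all reads directly to the plasmid; reads that also align well to the chromosome would drop out in this version.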


How was the DNA for the sequencing run prepped?

I agree with the others, 180 reads is not that much (since no doubt at least a few of those reads will be junk) and certainly not enough to get an assembly.

3.5 years ago
appiahv ▴ 20

I hope I am not late. I use RagTag (https://github.com/malonge/RagTag) to extract the plasmid sequence.
