Extracting contaminated reads from the sequenced data
12 months ago

I have sequencing data from a single organism, but it contains three 16S rRNA sequences that belong to three different organisms, so I suspect the sample is contaminated. How can I extract the contigs belonging to each organism present in the data?

Hello,

bbduk might help you. From the web manual:

bbduk.sh in=reads.fq out=unmatched.fq outm=matched.fq ref=phix.fa k=31 hdist=1 stats=stats.txt

This will remove all reads that have a 31-mer match to PhiX (a common Illumina spike-in, which is included in /bbmap/resources/), allowing one mismatch. The “outm” stream will catch reads that matched a reference k-mer. This allows you to split a set of reads based on the presence of something. “stats” will produce a report of which contaminant sequences were seen, and how many reads had them.

The ref parameter accepts more than one reference, given as a comma-separated list of files. Have a look at bbduk.sh --help for more options.
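To make the matched/unmatched split described above more concrete, here is a minimal Python sketch of the idea. It is a toy model, not how bbduk is actually implemented: it uses exact k-mer matches only (no hdist), ignores reverse complements, and the function names, reference names and sequences are all made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_reads(reads, references, k):
    """Split reads by whether they share an exact k-mer with any reference.

    references maps a (hypothetical) name to a sequence.
    Returns (matched, unmatched, stats), where stats counts the
    matched reads per reference name.
    """
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    matched, unmatched = [], []
    stats = {name: 0 for name in references}
    for read in reads:
        read_kmers = kmers(read, k)
        hits = [name for name, ks in ref_kmers.items() if read_kmers & ks]
        if hits:
            matched.append(read)      # plays the role of the "outm" stream
            for name in hits:
                stats[name] += 1      # plays the role of the "stats" report
        else:
            unmatched.append(read)    # plays the role of the "out" stream
    return matched, unmatched, stats


# Toy data, invented for this example (not real 16S sequences).
references = {
    "organism_a": "ACGTACGTTAGCCGATAGGCTA",
    "organism_b": "TTGACCAGTCAGGATCCATGAA",
}
reads = [
    "CGTTAGCCGATA",  # copied from organism_a
    "CAGTCAGGATCC",  # copied from organism_b
    "GGGGGGGGGGGG",  # shares no 8-mer with either reference
]

matched, unmatched, stats = split_reads(reads, references, k=8)
print("matched:  ", matched)
print("unmatched:", unmatched)
print("stats:    ", stats)
```

With several references, the per-reference counts give a rough idea of how many reads carry k-mers from each organism.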

fin swimmer

Thanks a lot. Let me check and let you know.

Hi Finswimmer,

I observed that increasing the k-mer value decreases the number of matched reads. What is the ideal k-mer size for 150 bp paired-end reads? And how does the interpretation of the matched reads change with the k-mer size?

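The k= value sets the length of the k-mer used to find the initial match, so if you set it too high, BBMap tools will find fewer initial matches, or none at all: a longer k-mer is more likely to span a sequencing error or a real difference from the reference. So no surprise there. Matches found with a larger k are more specific, while a smaller k is more sensitive but lets in more chance matches. Generally, setting k= to something between 20 and 30 is fine for most applications. Smaller values require more memory.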
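The effect of k on matching can be shown with a small Python sketch. This is only an illustration under simplifying assumptions (exact matching, a single made-up reference and read, no reverse complement), not the BBMap tools' actual algorithm. The read is copied from the reference with one substitution, so it can only match when k is no longer than its longest mismatch-free stretch.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_exact_kmer_match(read, reference, k):
    """True if the read shares at least one exact k-mer with the reference."""
    return bool(kmers(read, k) & kmers(reference, k))


# Toy sequences, invented for this example.
reference = "ACGTACGTTAGCCGATAGGCTA"

# 20-base read copied from reference[1:21], with one substitution (C -> T)
# at read position 10. The longest mismatch-free stretch is therefore the
# first 10 bases of the read.
read = "CGTACGTTAG" + "T" + "CGATAGGCT"

results = {k: has_exact_kmer_match(read, reference, k) for k in (8, 10, 11, 15)}
for k, hit in results.items():
    print(f"k={k:2d} match={hit}")
# k=8 and k=10 match; k=11 and k=15 do not, because every k-mer of that
# length in the read overlaps the substituted base.
```

The same logic scales up: with 150 bp reads and a few mismatches per read, more and more reads lose their last error-free k-mer as k grows, which is why the matched count drops.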

Thank you, this too worked for me.

9 weeks ago
genomax 68k
United States

If you truly feel that there are three organisms, you can use bbsplit.sh (from the BBMap suite) to bin your reads into the respective organism pools. This generally works well as long as the bacteria are distinct enough. You can decide what to do with reads that multi-map (i.e. map to more than one of the reference genomes), e.g. keep them in all bins, toss them, etc.

Follow the answer linked here, and ask if you have any questions: A: Tool to separate human and mouse rna seq reads

Since you have bacterial data, you could turn off the search for long indels by setting maxindel=0.

Yes, thank you. I could separate the reads mapping to each reference genome, but I still have one doubt. When I use Bowtie or Bowtie2, the paired-end reads from my data do not map to the reference genome, even though the 16S sequence of the reference genome is present in my data. Why could that happen?
