Testing Illumina reads for contamination
4.9 years ago
mdgn ▴ 10

Hello everyone,

I am trying to figure out whether my Illumina reads are contaminated, because their per-sequence GC content is bimodal. After quality filtering I still have enough reads of decent length and good quality. The only problem is the bimodal GC content.

[Image: per-sequence GC content plot]

Is it reasonable to suspect contamination? I would like to plot the GC content of every contig in my Megahit assembly. I am trying to use a Python dictionary to bin the contigs by GC content, but I don't really know how to do it. Can you help me out with this? Thanks, everyone.
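One way the dictionary-based binning described above could look is sketched below. It is a minimal illustration, not a tested pipeline: the function names (`read_fasta`, `gc_percent`, `bin_contigs_by_gc`) and the 5% bin width are assumptions made for this example, and the FASTA parser is deliberately simple.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq):
    """GC percentage over unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def bin_contigs_by_gc(records, bin_width=5):
    """Map each bin's lower edge (% GC) to the names of the contigs in that bin.

    Assumes bin_width divides 100 evenly; a contig at exactly 100% GC
    is placed in the last bin.
    """
    bins = defaultdict(list)
    for name, seq in records:
        lower = int(gc_percent(seq) // bin_width) * bin_width
        bins[min(lower, 100 - bin_width)].append(name)
    return dict(bins)


# Tiny in-memory example; real input would be the lines of the assembly FASTA.
example = [">contig_1", "GGCCGGCC", ">contig_2", "ATATATGC"]
for lower, names in sorted(bin_contigs_by_gc(read_fasta(example)).items()):
    print(f"{lower}-{lower + 5}% GC: {len(names)} contig(s)")
```

For a real assembly, pass an open file handle to `read_fasta` instead of the example list, e.g. `with open("final.contigs.fa") as handle:` (the file name here is an assumption; use the path to your own contigs). The contig counts per bin can then be drawn as a bar chart; two separated peaks among the contigs would be consistent with two distinct sources of DNA, although organelle genomes in plant samples can also have a GC content different from the nuclear genome.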

sequencing genome • 1.9k views

There was an image embedding option, so I used it. Is it better now?


You have to enter the link into the field that pops up when pressing the image button. I made the changes now.


Yeah, I did that initially. Thank you, though.


You should use the imgbb.com image hosting provider, as noted in the help post linked above. The one you are using does not seem to work with Biostars.


I have made the necessary changes for now. mertdogan, the DDoS protection feature on the imggmi website seems to interfere with image rendering without a timeout.

4.9 years ago
seta ★ 1.9k

You can map the reads to a reference genome, keep the unmapped reads, and BLAST them against the nr database to identify any contamination. In my case, a plant de novo transcriptome assembly, I built the assembly, BLASTed it against the nr database, and found which contaminants were present. I then removed the corresponding reads with bbmap and re-analyzed the clean reads.
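The "keep the unmapped reads" step can be sketched as follows, assuming the mapper wrote a plain-text SAM file. This is only an illustration of the idea: the function name `unmapped_reads_to_fasta` is made up for this example, and in practice a dedicated tool such as samtools is the usual choice for this kind of filtering. The SAM format marks an unmapped read by setting bit 0x4 in the FLAG field.

```python
def unmapped_reads_to_fasta(sam_lines):
    """Yield FASTA records for SAM alignment lines whose unmapped flag (0x4) is set."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if not flag & 0x4 or seq == "*":
            continue
        # Mates share a read name, so add /1 or /2 to keep FASTA IDs unique.
        suffix = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
        yield f">{name}{suffix}\n{seq}\n"
```

Writing the yielded records to a file gives a FASTA that can be used as the BLAST query. For a large read set it is common to BLAST only a random subsample, since searching every read against nr is slow.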

4.9 years ago
SaltedPork ▴ 170

Can't help with Python bins. We have viral reads and filter out any human contamination. This is done by mapping the reads to an indexed reference that contains both the human reference genome and a viral database. Any read that maps to human gets discarded. Any read that maps to viral gets kept.
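The keep/discard rule above can be sketched on a plain-text SAM file produced by mapping against the combined reference. This is an illustration under assumptions, not the exact pipeline used: the function name `split_reads_by_reference` and the `is_human` callback are hypothetical, and how human contigs are recognized (for example by a `chr` name prefix) depends on how the combined reference was built.

```python
def split_reads_by_reference(sam_lines, is_human):
    """Return (kept, discarded) sets of read names from SAM alignment lines.

    is_human is a caller-supplied function that takes a reference
    sequence name and returns True if it belongs to the human genome.
    """
    human_hits, other_hits = set(), set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag, ref = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or ref == "*":
            continue  # unmapped: neither human nor viral
        (human_hits if is_human(ref) else other_hits).add(name)
    # A read name with any human alignment is discarded, even if it also hits a virus.
    return other_hits - human_hits, human_hits
```

A call might look like `kept, discarded = split_reads_by_reference(handle, lambda ref: ref.startswith("chr"))`. Because both mates of a pair share a read name, this sketch discards the whole pair when either mate maps to human, which is a conservative design choice.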


I understand. I am working with plants, and I don't know whether I can follow a similar approach with the databases available.


Are these DNAseq or RNAseq reads?


I have WGS data, so these are DNA-seq reads.
