How to remove reads from a fastq file that match a set of reads in my fasta file?
1
18 months ago
MAPK ♦ 1.4k
United States

I have a fastq file that seems to be contaminated by sequences that came from my reagents during library preparation. I know which reads came from the reagents and I have them in fasta format. Can I use that fasta file to remove every matching read from my fastq file? How can I work this out?

1
- Assess and QC the fastq
- Convert the fastq to fasta
- BLAST against the reagent fasta
- Parse the BLAST results and the fasta (from the fastq), removing the reads that hit the reagents
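
As a rough illustration of the last step, here is a minimal Python sketch that drops reads whose IDs appear as queries in BLAST tabular output. It assumes BLAST was run with `-outfmt 6` (query ID in the first column) and that the fastq uses plain 4-line records; the function names are made up for this example and are not part of any existing tool.

```python
from typing import Iterable, Iterator, Set, Tuple

FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, plus, quality) from 4-line fastq records."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not block:
            continue  # skip blank lines between records
        block.append(line)
        if len(block) == 4:
            yield (block[0], block[1], block[2], block[3])
            block = []


def blast_hit_ids(lines: Iterable[str]) -> Set[str]:
    """Collect query IDs (column 1) from BLAST tabular output (assumed -outfmt 6)."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        ids.add(line.split("\t")[0])
    return ids


def remove_hits(fastq_lines: Iterable[str], hit_ids: Set[str]) -> Iterator[FastqRecord]:
    """Yield only the fastq records whose read ID had no BLAST hit."""
    for record in read_fastq(fastq_lines):
        # The read ID is the header without '@', up to the first whitespace.
        read_id = record[0][1:].split()[0]
        if read_id not in hit_ids:
            yield record
```

In practice you would pass open file handles instead of lists, and you may want to filter the BLAST hits by identity or alignment length before treating a read as contamination.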

0

I think you forgot to include a link to the program that does this.

0

The OP asked "How can I work this out?". The above comment outlines a pipeline for completing the OP's task, so it is not one single program.

1. FastQC and a quality trimmer
2. A converter program from FASTQ to FASTA (several exist, e.g. FASTX-Toolkit)
3. BLAST
4. Several posts on Biostars are available for reference in parsing out sequences from FASTA files based on BLAST results
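
For step 2, if you would rather see what the conversion does than call a converter, this is a minimal Python sketch. It assumes plain 4-line FASTQ records (no wrapped sequences), and `fastq_to_fasta` is just a name chosen for this example.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert 4-line FASTQ records into FASTA lines (header, then sequence)."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not block:
            continue  # skip blank lines between records
        block.append(line)
        if len(block) == 4:
            header, sequence = block[0], block[1]
            yield ">" + header[1:]  # swap the leading '@' for '>'
            yield sequence          # the '+' and quality lines are dropped
            block = []
```

Writing the result out is then a matter of looping over the generator and printing each line to a file.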

I was unaware, until your answer @genomax, that the BBMap suite had this option.

0

The way you wrote that made it seem like you had copied/pasted it from the GitHub description of a package :-)

BTW: @Brian includes a sequencing_artifacts.fa.gz file (in resources directory) that I assume includes contaminants (which may be seen at other places but I assume are seen at JGI).

Various things that BBMap suite can do are here, if you have not seen this post before.

0

Trying to avoid black box/turn key solutions, so one can learn in the process.

0

Hi, did you find a solution for this? If my contaminant reads and my true reads are both in fastq format, how do I remove the contaminant reads?

0

I gave you two additional answers in the other thread where you posted this: C: Subtracting one FASTAq file Reads from other FASTAq reads

4
4 weeks ago
genomax 68k
United States

Use bbduk.sh from BBMap. Provide the contaminants as a multi-fasta file with the ref= option.
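
A minimal sketch of how such a call could be assembled from Python, assuming bbduk.sh is on your PATH. The file names are placeholders and k=31 is only an illustrative k-mer length; `bbduk_command` is a helper invented for this example. Check the bbduk.sh help text for the options that suit your data.

```python
import shlex
from typing import List


def bbduk_command(reads: str, contaminants: str, clean_out: str,
                  matched_out: str, k: int = 31) -> List[str]:
    """Build a bbduk.sh argument list for k-mer based contaminant removal."""
    return [
        "bbduk.sh",
        f"in={reads}",          # input fastq
        f"ref={contaminants}",  # multi-fasta of reagent/contaminant sequences
        f"out={clean_out}",     # reads that do NOT match the reference
        f"outm={matched_out}",  # reads that DO match (kept for inspection)
        f"k={k}",               # k-mer length used for matching
    ]


# Placeholder file names, chosen only for illustration.
cmd = bbduk_command("reads.fastq", "reagents.fasta",
                    "clean.fastq", "contaminant_reads.fastq")
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. Keeping the matched reads in a separate file makes it easy to check how much was removed.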

0
18 months ago
Belgium

My NanoLyse script is written for that, using the minimap2 aligner under the hood. It's mainly intended for long reads (Oxford Nanopore/PacBio).