Biostar Beta. Not for public use.
Question: search multiple sequences in multiple fastqs
0
Entering edit mode

Hi all!

I have a script which goes through a bunch of fastq files and searches for specific sequences found in a text file called test_text.txt, then outputs a fastq file with only those reads. The reads in my fastq files are all between 15-30bp long, however when I look at the output fastq files, I am getting reads of 52 bp long.

Does anyone know why this would happen? code is below:

for fastq in `ls my_files`;do
    for i in `cat test_text.txt`; do zgrep -B 1 -A 2 $i $fastq >>filtered_fastq/${fastq}.filtered_reads.fastq; done;
done

Thanks for any input.

ADD COMMENTlink 11 months ago max_19 • 100
Entering edit mode
0

What have you done so far to convince yourself/check the reads are all 15-30bp?

I can't see an obvious mistake at first glance, so are you sure there are no 52 bp reads in the input file?

ADD REPLYlink 11 months ago
Joe
12k
Entering edit mode
0

I used cutadapt to set min/max length for all sequences (cutadapt -a $adapter3 -f fastq -m 15 -M 30). also when I do fastqc on the file, my sequence length in the basic stats section shows '15-30'

ADD REPLYlink 11 months ago
max_19
• 100
Entering edit mode
0

Make sure; do awk '{print length}' <filename> | sort -n | tail . The last number should be the longest sequence length in your file. Remove the headers before if you think they're longer than 30bp (i.e. grep -v ">" <filename> | awk '{print length}' | sort -n | tail

ADD REPLYlink 11 months ago
manuel.belmadani
• 830
Entering edit mode
0

Thank you! Tried this and got 54. This is odd! does this mean something is wrong with my cutadapt step? and why does FASTQC report 15-30bp for sequence length

ADD REPLYlink 11 months ago
max_19
• 100
Entering edit mode
0

cutadapt should get rid of reads > 30 in length in that case. Are you sure that you have the trimmed file in my_files, and not the original file you had before running cutadapt?

ADD REPLYlink 11 months ago
manuel.belmadani
• 830
Entering edit mode
0

By the way, the -f switch in grep allows you to provide a file of patterns to search; e.g. grep -f <file_with_patterns> <file_to_search_in>, so possibly your could get rid of your inner for loop.

ADD REPLYlink 11 months ago
manuel.belmadani
• 830
Entering edit mode
0

Good to know, thank you!

ADD REPLYlink 11 months ago
max_19
• 100

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0