Removal of a specific read from a FASTQ file
6.0 years ago

Hello, can someone kindly tell me how to remove a specific read from a paired-end FASTQ file using awk or any other command?

next-gen • 5.2k views
6.0 years ago

Some example data would help. If you know the read by its ID, try the following (seqkit is available here, and you can write the output to a FASTQ file):

$ seqkit grep -r -p <read_id> -v input.fastq

example:

$ seqkit grep -v -rp 'K00193:38:H3MYFBBXX:4:2119:24527:21657/1' hcc1395_normal_rep1_r1.fastq.gz

Thank you so much for your kind support. I'm a newbie to NGS and Linux, and I'd like to know whether it is possible to remove specific reads using standard Linux commands only, rather than a toolkit. Thanks!


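Assuming that your FASTQ file is gzipped:

$ zcat input.fastq.gz | sed -n '1~4p' | grep -v -F "<readname to be excluded>" | while read -r line; do zgrep -F -x -A 3 "$line" input.fastq.gz; done
  1. zcat and sed print every fourth line starting from line 1, i.e. all the headers. (Simply grepping for "@" is not enough: "@" is also a valid quality character, so quality lines can match too.)
  2. grep -v -F keeps only the headers that do not match the read name to be excluded.
  3. In principle zgrep could take all the headers at once, but it seems to have some limitations here, hence a while loop: for each header, zgrep -F -x -A 3 prints that exact header line plus the three lines after it (sequence, "+" line, quality). If you do not like loops, you can use GNU parallel (available in most distros). Either way, the whole file is decompressed once per kept read, so this is slow for large files.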

Please redirect the output to a file of your choice (for example, add > output.fastq at the end of the command).

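$ zcat input.fastq.gz | sed -n '1~4p' | grep -v -F "<readname to be excluded>" | parallel -k zgrep -F -x -A 3 {} input.fastq.gz

(-k keeps the output in input order, which matters if R1 and R2 have to stay in sync.)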
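The loop and parallel versions above decompress the file once per kept read. Since the question mentions awk, here is a single-pass alternative as a minimal sketch. It assumes standard four-line FASTQ records and takes the read ID to be the first whitespace-delimited field of the header without the leading @. The function name remove_read is hypothetical.

```shell
#!/usr/bin/env bash
# remove_read FASTQ READ_ID
# Hypothetical helper: prints every record of FASTQ (plain or gzipped)
# except the one whose ID equals READ_ID. Assumes 4-line records.
remove_read() {
  local fq=$1 drop_id=$2
  # gzip -dcf decompresses gzipped input and passes plain text through unchanged
  gzip -dcf -- "$fq" | awk -v id="$drop_id" '
    NR % 4 == 1 {
      # ID = first whitespace-delimited header field minus the leading @
      keep = (substr($1, 2) != id)
    }
    # Print header, sequence, "+" and quality lines of kept records only
    keep
  '
}
```

Usage: remove_read input.fastq.gz 'K00193:38:H3MYFBBXX:4:2119:24527:21657/1' > filtered.fastq. For paired-end data, run it on R1 and R2 with the /1 and /2 IDs respectively.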

If your FASTQ file is uncompressed, try this:

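$ sed -n '1~4{/<read name>/!p;}' test.fastq | grep -F -x --no-group-separator -A 3 -f - test.fastq

Here sed prints the header lines (every fourth line from line 1) except the one to be excluded, and grep prints each of those exact header lines plus the three lines that follow. --no-group-separator stops GNU grep from inserting "--" lines between non-adjacent matches, which would break the FASTQ format.

(Note: if the read ID contains the mate suffix (/1 or /2), escape the slash, because / is the sed delimiter.) For example, if the read ID is K00193:38:H3MYFBBXX:4:2119:24527:21657/1, the sed command would be:

$ sed -n '1~4{/K00193:38:H3MYFBBXX:4:2119:24527:21657\/1/!p;}' test.fastq | grep -F -x --no-group-separator -A 3 -f - test.fastq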
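For paired-end data, whichever command you use has to be run on both R1 and R2; removing a read from only one file leaves the mates out of sync. Below is a sketch that removes a list of reads from both files, assuming the mates share an ID apart from a trailing /1 or /2 and that records are four lines long. filter_ids, remove_pairs and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# filter_ids ID_LIST FASTQ
# Hypothetical helper: prints FASTQ (plain or gzipped) without the reads
# listed in ID_LIST (one ID per line). A trailing /1 or /2 is ignored on
# both sides, so one list works for R1 and R2.
filter_ids() {
  gzip -dcf -- "$2" | awk -v idfile="$1" '
    function base(s) { sub(/\/[12]$/, "", s); return s }
    BEGIN {
      # Load the IDs to drop before reading any FASTQ line
      while ((getline line < idfile) > 0)
        if (split(line, f, " ") > 0) drop[base(f[1])] = 1
      close(idfile)
    }
    NR % 4 == 1 { keep = !(base(substr($1, 2)) in drop) }
    keep
  '
}

# remove_pairs ID_LIST R1_IN R2_IN R1_OUT R2_OUT (outputs are uncompressed)
remove_pairs() {
  filter_ids "$1" "$2" > "$4"
  filter_ids "$1" "$3" > "$5"
}
```

Usage: remove_pairs ids.txt sample_R1.fastq.gz sample_R2.fastq.gz filtered_R1.fastq filtered_R2.fastq. Reading the ID list in a BEGIN block (rather than the common NR == FNR idiom) means an empty list simply keeps every read.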

Thank you so much for the kind help. I will try it and report back on the output. Many thanks again!


What if you have a specific sequence rather than a read ID? (For example, an overrepresented repeat sequence that you want to remove from your sample.)
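One way to do this with awk, as a sketch: drop every record whose sequence line contains the given sequence. This assumes four-line records, matches the exact sequence case-insensitively, and does not search the reverse complement; drop_by_seq is a hypothetical name. For paired-end data, the mate of each dropped read would also have to be removed to keep R1 and R2 in sync.

```shell
#!/usr/bin/env bash
# drop_by_seq SEQUENCE FASTQ
# Hypothetical helper: prints FASTQ (plain or gzipped) without the records
# whose sequence line contains SEQUENCE (exact, case-insensitive substring).
# Assumes 4-line records; the reverse complement is NOT searched.
drop_by_seq() {
  gzip -dcf -- "$2" | awk -v seq="$1" '
    BEGIN { seq = toupper(seq) }
    NR % 4 == 1 { hdr = $0; next }
    NR % 4 == 2 { s = $0; next }
    NR % 4 == 3 { plus = $0; next }
    # Quality line completes the record: print it only if seq is absent
    index(toupper(s), seq) == 0 { print hdr; print s; print plus; print }
  '
}
```

Usage: drop_by_seq <sequence> input.fastq.gz > filtered.fastq. index() does a plain substring search, so no characters in the sequence need escaping.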
