Dear All,
I have written script using python and sed script that will be usefull for removing specific reads from a fastq file based on a reads IDs, This script will works fine and also fast. If there is any problems found in this script or there is any more reliable solution available please share it.
reads_ids.txt
: text file with reads IDs look like this
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
@SRR001667.1 071112_SLXA-EAS1_s_7:5:1:818:346 length=72
@SRR001667.2 071112_SLXA-EAS1_s_7:5:1:802:339 length=72
remove_reads.py
: Python script used to remove reads from a fastq file using reads IDs
import sys
import os
try:
ffastq = sys.argv[1]
fastq = open(ffastq)
except:
print "Usage: python remove_reads.py <reads.fastq> <reads_ids.txt>"
exit()
try:
fids = sys.argv[2]
ids = open(fids)
except:
print "Usage: python remove_reads.py <reads.fastq> <reads_ids.txt>"
exit()
out_file = ffastq + "_filtered.fastq"
new_fid = []
for fid in ids:
fid = fid.rstrip()
new_fid.append('/' + fid + '/,+3d')
cmd = "sed -e '" + str(";".join(new_fid)) + "' " + ffastq + " > " + out_file
os.system(cmd)
fastq.close()
ids.close()
Hi Rajesh, I would recommend to try to replace the following by a python native function.