Remove Reads from fastq file based on read IDs
2
2
Entering edit mode
8.0 years ago

Dear All,

I have written script using python and sed script that will be usefull for removing specific reads from a fastq file based on a reads IDs, This script will works fine and also fast. If there is any problems found in this script or there is any more reliable solution available please share it.

reads_ids.txt: text file with reads IDs look like this

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
@SRR001667.1 071112_SLXA-EAS1_s_7:5:1:818:346 length=72
@SRR001667.2 071112_SLXA-EAS1_s_7:5:1:802:339 length=72

remove_reads.py: Python script used to remove reads from a fastq file using reads IDs

import sys
import os

try:
    ffastq = sys.argv[1]
    fastq  = open(ffastq)
except:
    print "Usage: python remove_reads.py <reads.fastq> <reads_ids.txt>"
    exit()


try:
    fids = sys.argv[2]
    ids  = open(fids)
except:
    print "Usage: python remove_reads.py <reads.fastq> <reads_ids.txt>"
    exit()

out_file = ffastq + "_filtered.fastq"

new_fid = []
for fid in ids:
    fid = fid.rstrip()
    new_fid.append('/' + fid + '/,+3d')

cmd = "sed -e '" + str(";".join(new_fid)) + "' " + ffastq + " > " + out_file

os.system(cmd)

fastq.close()
ids.close()
fastq • 3.6k views
ADD COMMENT
1
Entering edit mode

Hi Rajesh, I would recommend to try to replace the following by a python native function.

cmd = "sed -e '" + str(";".join(new_fid)) + "' " + ffastq + " > " + out_file
ADD REPLY
2
Entering edit mode
8.0 years ago
gunzip -c fastq.gz | paste - - - - | grep -v -f reads_ids.txt | tr "\t" "\n" | gzip --best > out.fastq.gz
ADD COMMENT
0
Entering edit mode

opsss, changed last gunzip to gzip....

ADD REPLY
0
Entering edit mode
8.0 years ago

There is a really nice tool named "fasts-grep" in the same lines as Pierre's solution.

http://homes.cs.washington.edu/~dcjones/fastq-tools/fastq-grep.html

Download link: http://homes.cs.washington.edu/~dcjones/fastq-tools/

ADD COMMENT

Login before adding your answer.

Traffic: 1873 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6