How To Extract Set Of Reads From Fastq (Or Eventually Fasta And Qual) Based On List Of Ids?
2
8
Entering edit mode
11.9 years ago

I have a list of ids and two files (male and female). I want to find sequences and quality scores for all ids in my list.

I have:

list of ids

2 fastq files (eventually 2 fasta files and 2 qual files)

and I want to get:

extracted_sequences_based_on_ids.fastq (eventually extracted_sequences_based_on_ids.fasta and extracted_sequences_based_on_ids.qual) (can be two files that I merge afterwards)

Yes, I can write the script that will be for each id looking up the sequence and quality score and writing the result to a new file, but I would not like to reinvent the wheel in case this is something easy to perform with already existing tools:) (and might be handy for other biostars users as well)

Thanks a lot.

extraction read 454 fastq fasta • 37k views
ADD COMMENT
14
Entering edit mode
11.9 years ago

I think Heng Li's seqtk will do what you need: https://github.com/lh3/seqtk

Extract sequences with names in file name.lst, one sequence name per line:

seqtk subseq in.fq name.lst > out.fq
ADD COMMENT
1
Entering edit mode

Hi, I try to extract sequences from a fastq file using the "subseq" in seqtk. But the extract file contains only the 1st sequence but no others. I am wondering whether my name.lst file does not fit with what seqtk needs. I have names of each sequence without other symbols each line in the name.lst. But the fastq file starts each sequence name with a @. Should I add @ in front of each sequence name? Or what other problem it can be?

Any suggestion is welcome. Thanks,

Chih-Ming

ADD REPLY
0
Entering edit mode

Thank you very much! It worked.

ADD REPLY
4
Entering edit mode
11.9 years ago

This can be done with Biopieces and is in fact covered in the Howto.

ADD COMMENT
0
Entering edit mode

Thanks a lot, I didnt know Biopieces (looks very interesting though), but I unfortunately wasnt able to install it because of some problem with Perl modul Inline.

ADD REPLY
0
Entering edit mode

Try the Biopieces Google Group for help.

ADD REPLY

Login before adding your answer.

Traffic: 2941 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6