How to identify every read in a fastq file?
7.6 years ago
gerberd1990 ▴ 30

Hi, everybody is looking for their target reads in a FASTQ file, and I am just sitting here unable to find a good program to identify the remaining (junk) reads. I am working on ancient DNA (currently horse) Illumina reads, and I want to identify the exact organisms (possibly pathogens, human or other contamination, etc.) that the remaining reads come from, besides the horse sequences (only approx. 20-30% of the data is actually horse DNA). So, can anyone recommend a good program for this task? Thanks in advance :)

genome next-gen blast • 2.8k views

Wow, thanks to everybody, I see a lot of valuable information here :)

7.6 years ago
Medhat 9.7k

First, align your reads against the reference genomes of the suspected contaminants using FastQ Screen or DeconSeq, then follow the post below to remove the contaminating reads:

http://seqanswers.com/forums/showpost.php?p=109308&postcount=6
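If it helps, here is a minimal Python sketch of the removal step. It is not the method from the linked post; it only assumes you already have a list of read IDs that hit a contaminant reference (however you obtained them) and want a FASTQ without those reads. The function names `parse_fastq`, `read_id` and `drop_reads` are made up for illustration, and the parser assumes plain 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Set, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, quality


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of lines (e.g. an open file)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        sep = next(it, None)
        qual = next(it, None)
        if qual is None or not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        yield header, seq.rstrip("\n"), sep.rstrip("\n"), qual.rstrip("\n")


def read_id(header: str) -> str:
    """Return the read ID: the first whitespace-delimited token after '@'."""
    return header[1:].split()[0]


def drop_reads(records: Iterable[FastqRecord],
               unwanted_ids: Set[str]) -> Iterator[FastqRecord]:
    """Yield only the records whose read ID is not in unwanted_ids."""
    for record in records:
        if read_id(record[0]) not in unwanted_ids:
            yield record


# Made-up example data: three reads, one of which is flagged as a contaminant.
example_fastq = [
    "@read1 some description\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "GGCC\n", "+\n", "IIII\n",
    "@read3\n", "TTAA\n", "+\n", "IIII\n",
]
contaminant_ids = {"read2"}

kept = list(drop_reads(parse_fastq(example_fastq), contaminant_ids))
for record in kept:
    print("\n".join(record))
```

For real data you would pass an open file handle instead of the list, and write the kept records to a new file. If your read IDs carry mate suffixes or extra tokens, the ID list has to use the same convention as `read_id`.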

7.6 years ago

This would seem like a good use case for something like Kraken: https://ccb.jhu.edu/software/kraken/

But your ability to assign every read in your experiment to an organism of origin will depend entirely on the completeness of your database.
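To get a feel for how much the database limits you, you can tally the per-read output. Below is a rough Python sketch, assuming the tab-separated per-read format that Kraken writes by default: first column `C` (classified) or `U` (unclassified), second column the read ID, third column the NCBI taxonomy ID. The function name `summarize_kraken` and the example lines are made up for illustration; the taxonomy IDs used are 9796 (Equus caballus) and 9606 (Homo sapiens).

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_kraken(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count classified/unclassified reads and reads per taxonomy ID.

    Expects tab-separated lines whose first three columns are:
    status ('C' or 'U'), read ID, taxonomy ID.
    """
    status_counts: Counter = Counter()
    taxid_counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-separated columns: {line!r}")
        status, _read, taxid = fields[0], fields[1], fields[2]
        status_counts[status] += 1
        if status == "C":
            taxid_counts[taxid] += 1
    return status_counts, taxid_counts


# Made-up example output: two horse reads, one human read, two unclassified.
example_output = [
    "C\tread1\t9796\t60\t9796:30\n",
    "C\tread2\t9796\t60\t9796:28 0:2\n",
    "C\tread3\t9606\t60\t9606:30\n",
    "U\tread4\t0\t60\t0:30\n",
    "U\tread5\t0\t60\t0:30\n",
]

status_counts, taxid_counts = summarize_kraken(example_output)
total = sum(status_counts.values())
unclassified_fraction = status_counts["U"] / total if total else 0.0

print(f"unclassified fraction: {unclassified_fraction:.2f}")
for taxid, count in taxid_counts.most_common():
    print(taxid, count)
```

A large unclassified fraction usually means the organisms are missing from the database (or, for ancient DNA, that the reads are too short or damaged to match), not that the reads come from nothing.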


I think it is specialized for the following:

discover the source of all reads, which originate from complex RNA molecules, recombinant antibodies and microbial communities.


Yes, it is not meant for this application but might give some ideas.

7.6 years ago
igor 13k

There are some suggestions in these previous threads:
