Question

QIIME filter_fasta.py not removing chimeric sequences

5.9 years ago

samanthabird • 0

Hey everyone,

I am working with 16S data and have identified my chimeric sequences using vsearch, which wrote a .txt file containing all suspect sequences. I am trying to remove these sequences from my original fasta file using the QIIME filter_fasta.py command.

This is what I have tried:

filter_fasta.py -f <filename>.fasta -o <newfilename>.fasta -s chimeraout.txt -n

But when I grep the original fasta file and the new fasta file using the command below, they have the same number of sequences. The chimeras are not being removed from the original file.

grep "^>" <filename>.fasta | wc -l

I have tried troubleshooting this in the following ways:

First, I noticed that my original fasta had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247 1:N:0:GCGTAGTA+CGTCTAAT

while my chimeric sequence txt file had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247

so I edited the original fasta file to remove the barcode portion of the header. No luck.

Then, I realized that my original fasta file had the sequence outputted to one line, while my .txt file outputted as separate lines as below:

M00307:50:000000000-BT3VT:1:1101:15779:1247
TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAAGGATGAAGGTTTTCGGATCGTAAACTTTT
GTCTTAGGGGACGAGGAAGGACGGTACCCTAGGAGGAAGCCACGGCTAATTACGTGCCAGCAGCCGCGGTAACACGTAAG
CCCCTAGCGTTGTTCGGAATTATTGGGCGTAAAGGGCATGTAGGCGGTCAGGCAAGTCTGGTGTGAAATCTCGTGGCTCA

so I removed the spacing and tried again, but still nothing was removed. If I remove the -n parameter from my command, the output file is empty, so I know that QIIME is reading the command properly; it just does not appear to recognize the chimeric sequences. Any suggestions on how I can fix this would be greatly appreciated!

qiime python 16S rRNA chimera • 2.0k views

ADD COMMENT • link updated 5.9 years ago by Tm ★ 1.1k • written 5.9 years ago by samanthabird • 0

Have you had a look at the qiime webpage for the filter_fasta.py command?

http://qiime.org/scripts/filter_fasta.html

It looks as if the file passed with the -s parameter should just have a list of the IDs of the sequences you want to remove, rather than the actual sequences. Try and see if this helps.
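If the vsearch output is a FASTA-style file, the ID list could be derived from its header lines. The following is only a minimal sketch, assuming each header starts with ">" and that the ID is the text up to the first whitespace; the function names and file paths are hypothetical and not part of QIIME or vsearch.

```python
def extract_ids(fasta_lines):
    """Return the sequence IDs found on FASTA header lines.

    Assumption: the ID is the header text after '>' up to the first
    whitespace, so a trailing '1:N:0:...' barcode field is dropped.
    """
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # ignore a bare '>' line with no ID
                ids.append(fields[0])
    return ids


def write_id_list(chimera_fasta_path, id_list_path):
    """Write one ID per line, without the '>' prefix.

    Both paths are hypothetical placeholders supplied by the caller.
    Returns the number of IDs written.
    """
    with open(chimera_fasta_path) as src:
        ids = extract_ids(src)
    with open(id_list_path, "w") as dst:
        dst.writelines(seq_id + "\n" for seq_id in ids)
    return len(ids)
```

The resulting one-ID-per-line file is the kind of input the -s parameter appears to expect, according to the page linked above.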

ADD REPLY • link 5.9 years ago by mastal511 ★ 2.1k

score 0 · Answer 1 · 2018-05-19

As mastal511 said, please make sure that the chimera .txt file you pass with -s contains only sequence IDs. Also, the IDs in that file should not start with the ">" symbol.
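One way to check whether the IDs in the list actually match the FASTA headers is a small stand-alone script. This is a sketch only, not QIIME's implementation: it assumes the ID is the header text up to the first whitespace, and the function names are made up for illustration.

```python
def header_id(line):
    """ID of a FASTA header line: text after '>' up to the first whitespace."""
    fields = line[1:].split()
    return fields[0] if fields else ""


def matched_ids(fasta_lines, ids_to_drop):
    """Return the listed IDs that actually occur among the FASTA headers."""
    present = {header_id(line) for line in fasta_lines if line.startswith(">")}
    return present & set(ids_to_drop)


def drop_listed(fasta_lines, ids_to_drop):
    """Yield the lines of every record whose ID is not in ids_to_drop.

    Works for both single-line and wrapped (multi-line) sequences, because
    the keep/drop decision is made once per header line.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            keep = header_id(line) not in ids_to_drop
        if keep:
            yield line
```

If matched_ids returns an empty set, none of the listed IDs occur in the FASTA file, which would explain why nothing is removed. A leftover ">" in the ID list is one way to end up in that situation.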

Otherwise, the command you are using is correct. However, looking at your fasta sequence IDs, I suspect you have not first run the "split_libraries_fastq.py" command, which converts fastq to fasta and rewrites the headers in the form "samplename_readnumber".

For instance, if the name of your sample is 'sampleA', then while converting the file from fastq to fasta, it will change the headers to:

sampleA_1
ATCCCCCC.....
sampleA_2
TCCCCAAAA....

I run the following three commands before running "filter_fasta.py", and everything works fine:

validate_mapping_file.py
split_libraries_fastq.py
identify_chimeric_seqs.py