QIIME filter_fasta.py not removing chimeric sequences
5.9 years ago

Hey everyone,

I am working with 16S data and have detected chimeric sequences using vsearch, which wrote all suspect sequences to a .txt file. I am trying to remove these sequences from my original fasta file using the QIIME filter_fasta.py command.

This is what I have tried: filter_fasta.py -f <filename>.fasta -o <newfilename>.fasta -s chimeraout.txt -n

But when I count the headers in the original fasta file and the new fasta file using the command below, both have the same number of sequences, so the chimeras are not being removed.

grep "^>" <filename>.fasta | wc -l

I have tried troubleshooting this in the following ways:

First, I noticed that my original fasta had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247 1:N:0:GCGTAGTA+CGTCTAAT

while my chimeric sequence txt file had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247

so I edited the original fasta file to remove the barcode portion of the header. No luck.

Then, I realized that my original fasta file had the sequence outputted to one line, while my .txt file outputted as separate lines as below:

M00307:50:000000000-BT3VT:1:1101:15779:1247
TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAAGGATGAAGGTTTTCGGATCGTAAACTTTT
GTCTTAGGGGACGAGGAAGGACGGTACCCTAGGAGGAAGCCACGGCTAATTACGTGCCAGCAGCCGCGGTAACACGTAAG
CCCCTAGCGTTGTTCGGAATTATTGGGCGTAAAGGGCATGTAGGCGGTCAGGCAAGTCTGGTGTGAAATCTCGTGGCTCA

so I removed the line breaks and tried again, but still nothing was removed. If I drop the -n parameter from my command, the output file is empty, so I know that QIIME is reading the command properly; it just does not appear to recognize the chimeric sequences. Any suggestions on how I can fix this would be greatly appreciated!!
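
The symptom described above (everything kept with -n, nothing kept without it) is what you would expect when none of the listed IDs match any header. Here is a minimal Python sketch of keep/negate filtering that reproduces it. This is not QIIME's implementation: `filter_fasta` is a hypothetical helper, the read names are made up, and comparing only the first whitespace-delimited token of each header is an assumption.

```python
def filter_fasta(fasta_lines, ids, negate=False):
    """Yield lines of records whose ID is in `ids` (or not in `ids` if negate=True).

    Assumption: the record ID is the first whitespace-delimited token
    after '>' in the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            # Keep when membership and the negate flag disagree.
            keep = (seq_id in ids) != negate
        if keep:
            yield line


# Made-up records for illustration only.
records = [">read1 1:N:0:X", "ACGT", ">read2 1:N:0:X", "GGCC"]

# IDs that still carry the '>' never match any header.
bad_ids = {">read1"}
print(list(filter_fasta(records, bad_ids)))               # keep mode
print(list(filter_fasta(records, bad_ids, negate=True)))  # negate mode

# Same ID without the '>' matches, so negate mode drops read1.
good_ids = {"read1"}
print(list(filter_fasta(records, good_ids, negate=True)))
```

With the unmatched IDs, keep mode yields nothing and negate mode yields every record, which matches the empty output file and the unchanged sequence count.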

qiime python 16S rRNA chimera • 2.0k views

Have you had a look at the qiime webpage for the filter_fasta.py command?

http://qiime.org/scripts/filter_fasta.html

It looks as if the file passed with the -s parameter should just have a list of the IDs of the sequences you want to remove, rather than the actual sequences. Try and see if this helps.
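
If the vsearch output is a FASTA file of chimeras, a plain ID list can be pulled out of it with a few lines of Python. This is only a sketch: `extract_ids` is a hypothetical helper, and it assumes the ID is the first whitespace-delimited token after the ">" on each header line.

```python
def extract_ids(fasta_lines):
    """Return sequence IDs from FASTA header lines, without '>' or descriptions."""
    ids = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no ID
                ids.append(fields[0])
    return ids


# In-memory example using the header from the question; a real run would
# iterate over an open file handle instead of this list.
chimera_fasta = [
    ">M00307:50:000000000-BT3VT:1:1101:15779:1247",
    "TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAAGGATGAAGGTTTTCGGATCGTAAACTTTT",
]
for seq_id in extract_ids(chimera_fasta):
    print(seq_id)
```

Writing the printed IDs to a text file, one per line, gives the kind of list the -s parameter is described as expecting.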

5.9 years ago
Tm ★ 1.1k

As masta|511 said, please make sure that the chimera .txt file you pass as input contains the sequence IDs. Also, the headers in the chimera .txt file should not start with the ">" symbol.

Otherwise, the command you are using is correct. Looking at your fasta sequence IDs, though, I suspect you have not first run the "split_libraries_fastq.py" command, which converts fastq into fasta and rewrites each header in the form "samplename_readnumber".

For instance, if the name of your sample is 'sampleA', then while converting the file from fastq to fasta, it will modify the headers to :

sampleA_1
ATCCCCCC.....
sampleA_2
TCCCCAAAA....

I run the following three commands before running "filter_fasta.py", and everything runs fine:

  1. validate_mapping_file.py
  2. split_libraries_fastq.py
  3. identify_chimeric_seqs.py

Thank you both for your answers! I wasn't actually running through the QIIME pipeline, I had used vsearch to remove chimeras and this was the way it had outputted the text file. However, removing the chevron worked perfectly - even with the sequences still in the file! I appreciate the help!
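
For anyone hitting the same problem, a quick way to check an ID file before filtering is to count how many of its lines actually match a FASTA header. The sketch below is hypothetical (`header_ids` and `count_matches` are illustrative names). It assumes that only the first whitespace-delimited token of each header is compared against the first token of each line of the ID file, which is my assumption about the matching and is not verified against the QIIME source. Under that assumption, leftover sequence lines in the ID file are harmless because they never equal a header ID, while a leading ">" prevents every match.

```python
def header_ids(fasta_lines):
    """Collect the first token after '>' from every FASTA header line."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    }


def count_matches(fasta_lines, id_lines):
    """Count ID-file entries (first token per non-empty line) found among the headers."""
    wanted = {line.split()[0] for line in id_lines if line.strip()}
    return len(header_ids(fasta_lines) & wanted)


# Header and ID taken from the question; the sequence is shortened for the example.
fasta = [
    ">M00307:50:000000000-BT3VT:1:1101:15779:1247 1:N:0:GCGTAGTA+CGTCTAAT",
    "TGGGGAATATTGCACAATGG",
]
ids_with_chevron = [">M00307:50:000000000-BT3VT:1:1101:15779:1247", "TGGGGAATATTGCACAATGG"]
ids_without_chevron = ["M00307:50:000000000-BT3VT:1:1101:15779:1247", "TGGGGAATATTGCACAATGG"]

print(count_matches(fasta, ids_with_chevron))     # no header matches an ID starting with '>'
print(count_matches(fasta, ids_without_chevron))  # the one real ID matches
```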
