Demultiplex paired-end reads with dual primers and barcodes
7.9 years ago
kristjan ▴ 170

We prepared an Illumina MiSeq library with two primer sets, one for bacteria and one for archaea, and used the same barcodes with both primer sets. Now I have the sequenced paired-end reads that I want to split into groups, but every program I have found (QIIME, cutadapt, etc.) uses only the barcodes to split the library. I cannot use them this way, because archaea and bacteria samples that share a barcode would end up in the same group. Instead, the barcodes should be used in combination with the primers, or the library should first be split into two groups by primer and then into samples by barcode. Any ideas how to do this?

demultiplex paired-end sequencing dual primers • 4.2k views

Let me first try to see if I understand:

  1. You used the same index sequences for both of your samples
  2. You want to use the flanking primers to distinguish your samples

Is this correct?

Hello Gabriel, I have the same problem you described. Do you have a workable approach for it?

For the situation kristjan described, where you need to rely on both the indices and the flanking primers, I am afraid I do not have a ready-made solution; you might have to write something that combines a few existing tools.

I have a similar situation, but I used different barcodes for bacteria and archaea. My problem is that split libraries gives very low output per sample. Any suggestions? Is there an easy way to modify the split libraries command to get more sequences per sample?

This has been posted as a new question: Combined bacteria and archaea amplicon libraries in one miseq run

Please continue the discussion there.

7.3 years ago

BBMap's Seal tool can split a file based on sequence, assuming the primers are part of the read sequence. For example, let's say bacteria have the primer ACGTYYNATG and archaea have TTTNNGCGCA, and the primers are somewhere in the first 20 bp of the reads:

1) Make a fasta file like this:

>bacteria
ACGTYYNATG
>archaea
TTTNNGCGCA

2) Run Seal:

seal.sh in=reads.fq pattern=%.fq ref=primers.fa k=10 mm=f restrictleft=20 copyundefined
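To make the idea behind this step concrete, here is a small pure-Python sketch (not BBMap code) that assigns each read to the primer occurring entirely within its first 20 bases, comparing IUPAC codes such as Y (C or T) and N (any base) against the read base by base. As I understand it, Seal handles degenerate bases differently, using copyundefined to expand each primer into all of its unambiguous copies; this sketch also simply returns the first primer that matches, which is a simplification. The names `IUPAC`, `PRIMERS`, `matches_at` and `classify` are hypothetical.

```python
# Standard IUPAC nucleotide codes and the concrete bases each one accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Example primers from the answer above (illustrative, not real 16S primers).
PRIMERS = {"bacteria": "ACGTYYNATG", "archaea": "TTTNNGCGCA"}


def matches_at(seq: str, primer: str, pos: int) -> bool:
    """True if the degenerate primer matches seq exactly, starting at pos."""
    window = seq[pos:pos + len(primer)]
    if len(window) < len(primer):
        return False
    # Each read base must be one of the bases allowed by the primer's code.
    return all(base in IUPAC[code] for base, code in zip(window, primer))


def classify(read: str, primers: dict, restrict_left: int = 20) -> str:
    """Return the first primer name found wholly inside the leftmost
    restrict_left bases of the read, or 'unmatched'."""
    head = read[:restrict_left].upper()
    for name, primer in primers.items():
        primer = primer.upper()
        for pos in range(len(head) - len(primer) + 1):
            if matches_at(head, primer, pos):
                return name
    return "unmatched"


if __name__ == "__main__":
    for read in ["GGACGTCTAATGCCCCCCCCCC", "TTTAAGCGCAAAAAAAAAAAAA"]:
        print(read, classify(read, PRIMERS))
```

A primer that starts inside the window but runs past base 20 is not counted here, which is my reading of what restrictleft=20 does.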

That will create bacteria.fq and archaea.fq. You can then repeat the process with the barcodes, if they are inline barcodes. For barcodes in the read header, use demuxbyname.sh instead. If you want to allow a mismatch, add the flag "hdist=1". k should be the length of your primers (if the primers differ in length, use the length of the shorter one).
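The second stage can be sketched in the same spirit. This is a hypothetical Python illustration, not how Seal or demuxbyname.sh work internally. It assumes the reads have already been split into bacteria and archaea groups (for instance, the contents of bacteria.fq and archaea.fq), that each read starts with a fixed-length inline barcode, and that the sample names and barcode sequences are made up. Allowing up to one mismatch plays roughly the role of hdist=1. Keying the bins on (primer group, sample) is what keeps bacteria and archaea reads that share a barcode apart.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, barcodes: dict, max_mismatches: int = 1):
    """Return the sample whose inline barcode (assumed to sit at the very
    start of the read) is closest within max_mismatches, or None if there is
    no match or the best match is tied."""
    scored = []
    for sample, barcode in barcodes.items():
        prefix = read[:len(barcode)].upper()
        if len(prefix) == len(barcode):
            dist = hamming(prefix, barcode.upper())
            if dist <= max_mismatches:
                scored.append((dist, sample))
    if not scored:
        return None
    scored.sort()
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # two barcodes equally close: ambiguous
    return scored[0][1]


def demultiplex(reads_by_group: dict, barcodes: dict, max_mismatches: int = 1) -> dict:
    """Bin reads by (primer group, sample) so that the same barcode used with
    both primer sets still yields separate bacteria and archaea bins."""
    bins = defaultdict(list)
    for group, reads in reads_by_group.items():
        for read in reads:
            sample = assign_barcode(read, barcodes, max_mismatches)
            bins[(group, sample or "unassigned")].append(read)
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"S1": "AAAA", "S2": "CCCC"}  # hypothetical samples
    reads = {"bacteria": ["AAAAGT", "CCCTGT"], "archaea": ["AAAAGG", "GGGGTT"]}
    for key, members in demultiplex(reads, barcodes).items():
        print(key, members)
```

For paired-end data, the mate of each read would also need to go into the same bin, which this sketch leaves out.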

Hello Brian, what does the "mm=f" option mean? Thanks, darina

"mm" means "maskmiddle". It essentially allows a single mismatch during kmer matching, as long as that mismatch is in the middle of the kmer, which increases sensitivity. If you need exact matches only, set it to false, as in this example.
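Here is a toy Python illustration of that idea (not BBTools' implementation). If the middle base of every kmer is replaced by a wildcard on both the reference side and the read side, a read kmer that differs from a reference kmer only at that position still counts as a hit, while a mismatch anywhere else does not. I would not vouch for which base BBTools masks when k is even, as with k=10 here; this sketch assumes index k // 2. The function names are hypothetical.

```python
def kmers(seq: str, k: int) -> list:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_middle(kmer: str) -> str:
    """Replace the base at index k // 2 with a wildcard (assumed position)."""
    mid = len(kmer) // 2
    return kmer[:mid] + "*" + kmer[mid + 1:]


def identity(kmer: str) -> str:
    """Leave the kmer unchanged (exact matching, like mm=f)."""
    return kmer


def count_hits(read: str, ref: str, k: int, maskmiddle: bool) -> int:
    """Count read kmers that also occur in ref, optionally ignoring the
    middle base of every kmer on both sides."""
    key = mask_middle if maskmiddle else identity
    index = {key(km) for km in kmers(ref.upper(), k)}
    return sum(key(km) in index for km in kmers(read.upper(), k))


if __name__ == "__main__":
    ref = "ACGTCTAATG"
    read = "GG" + "ACGTCGAATG" + "CC"  # one mismatch at the middle position
    print(count_hits(read, ref, 10, maskmiddle=True))
    print(count_hits(read, ref, 10, maskmiddle=False))
```

Masking makes matching more tolerant of a sequencing error at that position, but it also lets two primers that differ only at the middle base collide, which is one reason to turn it off when exact matches are required.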
