Question

Demultiplex paired-end reads with dual primers and barcodes

7.9 years ago

kristjan

We prepared an Illumina MiSeq library with two primer sets, one for bacteria and one for archaea, and used the same barcodes with both primer sets. Now I have the sequenced paired-end reads and want to split them into groups, but every program I have found uses only the barcodes to split the library (QIIME, cutadapt, etc.). I cannot use them this way, because archaea and bacteria samples would end up in the same group. Instead, the barcodes should be used in combination with the primers, or the library should first be split into two groups by primer and then into samples by barcode. Any ideas how to do this?

Written 7.9 years ago by kristjan; last updated 7.3 years ago by Brian Bushnell

Let me first try to see if I understand:

You used the same index sequences for both of your samples
You want to use the flanking primers to distinguish your samples

Is this correct?

Reply by Gabriel R., 7.9 years ago

Hello Gabriel, I have the same problem as you described. Do you have a feasible approach for it?

Reply by Obi, 7.8 years ago

For the situation kristjan described, where you need to rely on both the indices and the flanking primers: no, I'm afraid I do not have a ready-made solution. You might have to write something that combines a few existing tools.

Reply by Gabriel R., 7.8 years ago

I have a similar situation, but I used different barcodes for bacteria and archaea. My problem is that split libraries gives very low output per sample. Any suggestions? Is there an easy way to modify the split libraries command to improve the output per sample?

Reply by dpitta, 7.3 years ago

This has been posted as a new question: Combined bacteria and archaea amplicon libraries in one miseq run

Please continue the discussion there.

Reply by GenoMax, 7.3 years ago

Answer 1 · 2016-12-23

7.3 years ago

Brian Bushnell

BBMap's Seal tool can split a file based on sequence, assuming the primers are part of the read sequence. For example, let's say bacteria have a primer ACGTYYNATG and archaea have TTTNNGCGCA, and the primers are somewhere in the first 20bp of the reads:

1) Make a fasta file like this:

>bacteria
ACGTYYNATG
>archaea
TTTNNGCGCA

2) Run Seal:

seal.sh in=reads.fq pattern=%.fq ref=primers.fa k=10 mm=f restrictleft=20 copyundefined

That will create bacteria.fq and archaea.fq. You can then repeat the process with the barcodes, if they are inline barcodes. For barcodes in the read header, you can use demuxbyname.sh instead. If you want, you can allow a mismatch with the flag "hdist=1". k should be the length of your primer (if the primers have different lengths, use the length of the shorter one).
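To make the primer-then-barcode logic concrete, here is a minimal plain-Python sketch of the decision made for each read. It is not how Seal works internally (Seal uses kmer matching); it simply scans the first 20 bases for a degenerate primer, loosely mirroring restrictleft=20, then checks the bases before the primer for an inline barcode and combines both into one group name. The barcode sequences, the assumption that the barcode sits directly in front of the primer, and the names classify/find_primer are all hypothetical.

```python
# Sketch: assign a read to "<primer group>_<sample>" using BOTH the primer
# and an inline barcode. Barcodes and read layout are hypothetical.

# IUPAC degenerate base codes -> the concrete bases each one allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMERS = {"bacteria": "ACGTYYNATG", "archaea": "TTTNNGCGCA"}  # example primers from the answer
BARCODES = {"S1": "AACC", "S2": "GGTT"}  # hypothetical inline barcodes


def matches_at(read, pattern, pos):
    """True if the degenerate pattern matches the read exactly at pos."""
    if pos + len(pattern) > len(read):
        return False
    return all(read[pos + i] in IUPAC[p] for i, p in enumerate(pattern))


def find_primer(read, window=20):
    """Return (group, start) of the first primer lying fully in the first `window` bases."""
    for group, primer in PRIMERS.items():
        for pos in range(0, window - len(primer) + 1):
            if matches_at(read, primer, pos):
                return group, pos
    return None, None


def classify(read, window=20):
    """Combine primer group and inline barcode into a single output group name."""
    group, pos = find_primer(read, window)
    if group is None:
        return "unmatched"
    prefix = read[:pos]  # assumption: the barcode ends right where the primer starts
    for sample, barcode in BARCODES.items():
        if prefix.endswith(barcode):
            return f"{group}_{sample}"
    return f"{group}_unknown"


if __name__ == "__main__":
    print(classify("AACC" + "ACGTCTAATG" + "GGGGGG"))  # bacteria_S1
    print(classify("GGTT" + "TTTAGGCGCA" + "AAAA"))    # archaea_S2
```

Because the output name carries both the primer group and the sample, bacterial and archaeal reads that share a barcode never land in the same bin. For paired-end data you would make the decision on one mate and write both mates to the same group.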

Answer by Brian Bushnell, 7.3 years ago

Hello Brian, what does the "mm=f" option mean? Thanks, darina

Reply by darina.cejkova, 6.6 years ago

"mm" means "maskmiddle". It essentially allows a single mismatch when doing kmer-matching, as long as that mismatch is in the middle of the kmer. So it increases sensitivity. But if you need exact matches only, set it to false as in this example.
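The idea of masking the middle base can be illustrated with a simplified plain-Python comparison of two equal-length kmers. This is only a model of the behaviour described in the reply, not Seal's actual implementation. In particular, the choice of the masked position (index k // 2 here, which for even k is the base just right of centre) is an assumption, and the function name kmer_match is hypothetical.

```python
def kmer_match(query, ref, maskmiddle=True):
    """Compare two equal-length kmers base by base.

    With maskmiddle=True, the middle position is treated as a wildcard,
    so a single mismatch there is tolerated. The masked index (k // 2)
    is an assumption made for illustration.
    """
    if len(query) != len(ref):
        raise ValueError("kmers must have equal length")
    mid = len(ref) // 2  # assumed middle position
    for i, (q, r) in enumerate(zip(query, ref)):
        if maskmiddle and i == mid:
            continue  # ignore the middle base
        if q != r:
            return False
    return True


if __name__ == "__main__":
    ref = "ACGTAGCATG"                           # k = 10, middle index 5 is 'G'
    print(kmer_match("ACGTATCATG", ref))         # True: only the middle base differs
    print(kmer_match("ACGTATCATG", ref, False))  # False: like mm=f, exact match required
    print(kmer_match("TCGTAGCATG", ref))         # False: mismatch is not in the middle
```

This shows why mm=f is the stricter setting: with masking off, every base of the kmer has to match.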

Reply by Brian Bushnell, 6.6 years ago