Finding the barcodes (3' adapters) of multiplexed samples
4.5 years ago

Hello, I'm fairly new to bioinformatics and have a question about identifying barcodes. I have a file containing the reads from a multiplexed sequencing run, and I need to demultiplex it without knowing which barcodes were used.

How can I identify which barcodes (3' adapters) were used, so that I can then count how many reads came from each sample? What algorithms could I use for this?
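One possible starting point, assuming every read carries the same 3' adapter but its exact sequence is not at hand, is k-mer counting. A constant adapter yields k-mers that appear in far more reads than anything from the variable inserts, so the most frequent k-mer is very likely (part of) the adapter. This is a minimal sketch. `top_kmers`, the choice of `k`, and the toy reads and adapter are hypothetical and were made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_kmers(reads: Iterable[str], k: int = 12, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n k-mers that occur in the most reads.

    Each k-mer is counted at most once per read, so a single repetitive
    insert cannot inflate its own count.
    """
    counts: Counter = Counter()
    for read in reads:
        # A set comprehension de-duplicates k-mers within this read.
        counts.update({read[i:i + k] for i in range(len(read) - k + 1)})
    return counts.most_common(n)


if __name__ == "__main__":
    # Toy data (made up): variable insert + 4-nt barcode + shared adapter.
    adapter = "TGGAATTCTCGG"
    reads = [
        "ACGTTAGC" + "AAGT" + adapter,
        "GGCATTACGA" + "CCTA" + adapter,
        "TTAGGC" + "AAGT" + adapter,
    ]
    for kmer, count in top_kmers(reads, k=12, n=3):
        print(kmer, count)
```

On real data the adapter is often truncated at the end of the read, so a smaller `k` and looking at the top few hits is usually more robust. Once the adapter is known, the bases immediately 5' of it are the barcode candidates.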

barcode demultiplex

What kind of sequencing is this: Illumina or PacBio? With standard Illumina multiplexing, the index (barcode) sequences are sequenced in separate index reads and are not part of the main reads. You may find deML useful if you have custom indexes.

Thank you for your response. This is a simplified setup: the file contains only the multiplexed reads, and each read looks roughly like this:

{sequence} {barcode} {3' adapter}

where the 3' adapter is the same for every read in the file and only the barcode differs. My task is to find all the barcodes that appear in the reads.
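Given that layout, a simple approach is to locate the shared adapter in each read, take the fixed number of bases just before it as the barcode, and tally them. This is a minimal sketch assuming the adapter sequence and barcode length are known and the adapter matches exactly. `ADAPTER`, `BARCODE_LEN`, `extract_barcode` and `count_barcodes` are hypothetical placeholders, not values from the original data.

```python
from collections import Counter
from typing import Iterable, Optional

# Placeholder values (hypothetical): replace with the real adapter and barcode length.
ADAPTER = "TGGAATTCTCGG"
BARCODE_LEN = 4


def extract_barcode(read: str, adapter: str = ADAPTER,
                    barcode_len: int = BARCODE_LEN) -> Optional[str]:
    """Return the barcode_len bases immediately 5' of the adapter, or None."""
    # rfind: take the rightmost exact hit, since the adapter sits at the 3' end.
    pos = read.rfind(adapter)
    if pos < barcode_len:  # -1 (no adapter) or too few bases before it
        return None
    return read[pos - barcode_len:pos]


def count_barcodes(reads: Iterable[str]) -> Counter:
    """Count reads per barcode; reads without a usable adapter count under None."""
    return Counter(extract_barcode(read) for read in reads)


if __name__ == "__main__":
    reads = [
        "ACGTTAGC" + "AAGT" + ADAPTER,
        "GGCATTACGA" + "CCTA" + ADAPTER,
        "TTAGGC" + "AAGT" + ADAPTER,
        "ACGTACGTACGT",  # no adapter
    ]
    for barcode, n in count_barcodes(reads).most_common():
        print(barcode, n)
```

Exact matching is fragile: a single sequencing error in the adapter drops the read into the `None` bin, so on real data you would likely want a mismatch-tolerant search. The real barcodes should stand out as the few most abundant keys, with a long tail of low-count error variants.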

4.5 years ago

So you aren't using typical Illumina indices?

If your barcode really is embedded at the beginning of the read, you can use umi_tools to extract it. The people generating the FASTQs for you could also have extracted it as part of bcl2fastq.
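umi_tools handles this for you. To illustrate the underlying idea in plain Python (this is not how umi_tools is implemented), here is a minimal sketch assuming uncompressed FASTQ with exactly four lines per record and a barcode occupying the first `barcode_len` bases. `read_fastq`, `move_barcode_to_header` and the `_` separator in the read name are hypothetical choices.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def move_barcode_to_header(record: Record, barcode_len: int) -> Record:
    """Cut the first barcode_len bases (and their qualities) off the read
    and append the barcode to the read name, separated by '_'."""
    header, seq, qual = record
    name, _, comment = header.partition(" ")
    barcode = seq[:barcode_len]
    new_header = f"{name}_{barcode}" + (f" {comment}" if comment else "")
    return new_header, seq[barcode_len:], qual[barcode_len:]


if __name__ == "__main__":
    fastq = io.StringIO("@read1 extra\nAAGTACGTTAGC\n+\nIIIIIIIIIIII\n")
    for header, seq, qual in (move_barcode_to_header(r, 4) for r in read_fastq(fastq)):
        print(f"{header}\n{seq}\n+\n{qual}")
```

Keeping the barcode in the read name rather than discarding it means downstream tools can still group reads by sample after alignment.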

Yes, exactly. I also want to solve this problem from scratch in Python, which is why I originally asked about algorithms I could use.
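For a from-scratch Python version, two ideas usually matter beyond exact matching: tolerating a few mismatches when searching for the adapter, and merging barcodes that differ from a much more abundant barcode by a sequencing error. The sketch below uses a simple Hamming-distance search (substitutions only, no indels) and a greedy "most abundant first" clustering. It assumes a known adapter and fixed barcode length; `find_adapter`, `cluster_barcodes`, `count_samples` and the thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(read: str, adapter: str, max_mismatches: int = 1) -> int:
    """Rightmost start of adapter in read with at most max_mismatches
    substitutions (no indels); -1 if not found."""
    n = len(adapter)
    for start in range(len(read) - n, -1, -1):
        if hamming(read[start:start + n], adapter) <= max_mismatches:
            return start
    return -1


def cluster_barcodes(counts: Counter, max_dist: int = 1) -> Dict[str, str]:
    """Greedy clustering: visit barcodes from most to least abundant; each
    joins the first existing parent within max_dist, else becomes a parent.
    Returns a mapping barcode -> parent barcode."""
    parents: List[str] = []
    assignment: Dict[str, str] = {}
    for bc, _ in counts.most_common():
        parent = next((p for p in parents
                       if len(p) == len(bc) and hamming(p, bc) <= max_dist), None)
        if parent is None:
            parents.append(bc)
            parent = bc
        assignment[bc] = parent
    return assignment


def count_samples(reads: Iterable[str], adapter: str, barcode_len: int,
                  max_mismatches: int = 1, max_dist: int = 1) -> Counter:
    """Extract barcodes with a tolerant adapter search, then merge error variants."""
    raw: Counter = Counter()
    for read in reads:
        pos = find_adapter(read, adapter, max_mismatches)
        if pos >= barcode_len:
            raw[read[pos - barcode_len:pos]] += 1
    assignment = cluster_barcodes(raw, max_dist)
    merged: Counter = Counter()
    for bc, n in raw.items():
        merged[assignment[bc]] += n
    return merged


if __name__ == "__main__":
    adapter = "TGGAATTCTCGG"  # placeholder adapter
    reads = [
        "ACGTTAGC" + "AAGT" + adapter,
        "TTAGGC" + "AAGT" + "TGGAATTCTCGA",  # one mismatch in the adapter
        "CATCAT" + "AAGA" + adapter,         # barcode with one error
        "GATTACA" + "CCTA" + adapter,
    ]
    print(count_samples(reads, adapter, barcode_len=4))
```

The greedy merge is only safe if the real barcodes were designed to differ from each other by more than `max_dist` (commonly a minimum Hamming distance of 3); otherwise two genuine samples could be collapsed into one.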
