Question

Dual demultiplexing of Illumina sequences

6.4 years ago

U ▴ 70

Is it possible to add additional sample IDs to the reads in a fastq file whilst demultiplexing?

I have pooled sequences from 95 wells per plate, and the primer sequence determines the well ID. Currently, after demultiplexing, I run a script that takes the input fastq file, reads the first 30 base pairs whilst looking for "NN" and then for "Y", and converts the primer sequences to degenerate bases to identify the right primer set. This primer set then assigns the well ID. However, to make the workflow simpler, I would like this to happen right at the demultiplexing stage. Any insight would be most helpful.
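A minimal Python sketch of the degenerate-primer lookup described above, assuming the primers use standard IUPAC codes (N, Y, and so on) and that the whole primer must sit within the first 30 bp of the read. `primer_to_regex` and `find_primer` are hypothetical helper names, not part of any existing tool:

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a degenerate primer such as NNTC...AY... into one regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def find_primer(seq, pattern, window=30):
    """Return the match if the whole primer lies in the first `window` bases, else None."""
    # pos/endpos restrict the search without slicing, so match offsets refer to `seq`.
    return pattern.search(seq, 0, window)
```

Expanding each code into a character class lets a single regular expression cover every variant of a degenerate primer instead of enumerating the concrete sequences. Note that an `N` called in the read itself will not match `[ACGT]`.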

For example, this read from the fastq file

@M04012:86:000000000-BCB57:1:1101:17394:1866
CACGGTTGACTCAGCCCTTGACCAGGCACCTCGAATTCCACAGGGC

should be converted to

>C04 12:86:000000000-BCB57:1:1101:17394:1866
CACGGTTGACTCAGCCCTTGACCAGGCACCTCGAATTCCACAGGGC

Here, C04 is my well ID. I have a primer-set sequence file with the columns Name, Type, Chain, Index and Sequence. The entry for C04, for example, is:

Col_VK_C04,Col,VK,C04,NNTCTGTCATGAYATTGTG,,,,,
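A sketch for loading a primer-set file in that layout, assuming plain comma-separated rows (Name, Type, Chain, Index, Sequence, then optional empty columns) and an optional header row. It keys on the primer Name rather than the well Index, on the assumption that one well may have several chain-specific primers (e.g. VK and VH); `load_primer_set` is a hypothetical name:

```python
import csv


def load_primer_set(handle):
    """Map primer Name -> (well index, primer sequence).

    Expects rows of Name,Type,Chain,Index,Sequence; extra trailing columns
    (the ",,,,," in the example row) are ignored.
    """
    primers = {}
    for row in csv.reader(handle):
        if len(row) < 5 or not row[4].strip():
            continue  # skip blank or incomplete rows
        if row[0].strip().lower() == "name":
            continue  # skip an optional header row
        name, _type, _chain, index, sequence = (field.strip() for field in row[:5])
        primers[name] = (index, sequence.upper())
    return primers
```

Usage would be something like `with open("primers.csv", newline="") as handle: primers = load_primer_set(handle)`, where the file name is only a placeholder.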

Tags: Illumina, Ig, NGS, demultiplexing

Written 6.4 years ago by U ▴ 70 • updated 6.4 years ago by h.mon 35k

I think you should QC your fastq reads and then merge the read pairs. Convert the merged file to fasta, then look for the left primers based on well position, after which you can trim the primers and translate to your V-region sequences.
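The "find the left primer, then trim" step could look roughly like this for one merged sequence. This is a sketch under assumptions: each primer has already been expanded into a regex string (IUPAC codes replaced by character classes, e.g. `Y` by `[CT]`), the primer must lie within the first 30 bases, and the first matching primer wins, so overlapping primer sets would need extra handling. QC, merging and translation are not shown; `assign_and_trim` is a hypothetical name:

```python
import re


def assign_and_trim(seq, primer_regexes, window=30):
    """Return (well_id, trimmed_seq) for the first primer found near the 5' end.

    `primer_regexes` is a list of (well_id, regex_string) pairs. The whole primer
    must lie within the first `window` bases; (None, seq) means no primer matched.
    """
    for well_id, regex in primer_regexes:
        # re caches compiled patterns, so compiling inside the loop stays cheap.
        match = re.compile(regex).search(seq, 0, window)
        if match:
            return well_id, seq[match.end():]  # drop everything up to the primer end
    return None, seq
```

Reads that come back as `None` are worth counting separately, since a high unassigned rate usually points at primer or window problems rather than biology.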

6.4 years ago by st.ph.n ★ 2.7k

I edited your post because it seemed to me that you want to convert from fastq (@M04012:86:000000000-BCB57:1:1101:17394:1866) to fasta (>C04 12:86:000000000-BCB57:1:1101:17394:1866). Is that right?
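Under that reading (fastq in, fasta out, well ID at the start of the header), the conversion could be sketched as follows. It assumes standard four-line fastq records and, unlike the example, keeps the full original read ID after the well ID instead of overwriting its first characters; `read_fastq` and `to_tagged_fasta` are hypothetical names:

```python
def read_fastq(lines):
    """Yield (read_id, sequence) pairs from 4-line fastq records, dropping qualities."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line: fasta has no qualities, so it is discarded
        yield header.strip()[1:], seq  # [1:] removes the leading '@'


def to_tagged_fasta(read_id, seq, well_id):
    """Format one fasta record whose header starts with the well ID."""
    return f">{well_id} {read_id}\n{seq}\n"
```

Keeping the instrument read ID intact makes it possible to trace a fasta record back to the raw fastq later.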

6.4 years ago by h.mon 35k

I have a feeling the OP wants to add the sample name to the fastq header (the original post is worded badly, so it is hard to be sure). It sounds like something needed for a QIIME-like pipeline.
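If the goal is instead to stay in fastq and just carry the well ID in the header, one option is to rewrite only the header line of each record. This is a sketch: the ` well=C04` tag format is an arbitrary assumption (check what your downstream pipeline actually expects), `assign_well` stands for whatever primer-matching function you use, and `tag_fastq` is a hypothetical name:

```python
def tag_fastq(lines, assign_well):
    """Yield fastq lines with a well ID appended to each header.

    `assign_well` maps a sequence to a well ID, or None if no primer matched.
    Sequence, '+' and quality lines pass through unchanged, so the output stays fastq.
    """
    stream = (line.rstrip("\n") for line in lines if line.strip())
    # zip() over the same iterator four times groups the lines into records.
    for header, seq, plus, qual in zip(stream, stream, stream, stream):
        well = assign_well(seq) or "unassigned"
        yield f"{header} well={well}"
        yield seq
        yield plus
        yield qual
```

Because the tag goes after a space, it lands in the comment part of the header, which many tools keep but do not interpret; a pipeline that parses sample names from the read ID itself would need the tag placed differently.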

6.4 years ago by GenoMax 141k

The well location can be added either way. However, I think it would be easier to demultiplex after merging the read pairs, but I don't know the OP's downstream process or goal. Judging by the information above, this sounds similar to something I've done in the past, where I wrote a Python demultiplexing script to assign sequences to wells.
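For illustration only (this is not the script mentioned above), the final split into one file per well might look like this, assuming each read has already been assigned a well ID by some primer-matching step. It collects reads in memory, which is fine for a MiSeq run but would need streaming to open file handles for much larger inputs; `write_per_well_fasta` is a hypothetical name:

```python
import os
from collections import defaultdict


def write_per_well_fasta(assigned, out_dir):
    """Write one fasta file per well from (well_id, read_id, seq) tuples.

    Reads whose well_id is None go to 'unassigned.fasta'.
    Returns a {well: read count} summary.
    """
    os.makedirs(out_dir, exist_ok=True)
    by_well = defaultdict(list)
    for well_id, read_id, seq in assigned:
        by_well[well_id or "unassigned"].append((read_id, seq))
    for well, records in by_well.items():
        path = os.path.join(out_dir, f"{well}.fasta")
        with open(path, "w") as handle:
            for read_id, seq in records:
                handle.write(f">{well} {read_id}\n{seq}\n")
    return {well: len(records) for well, records in by_well.items()}
```

The returned counts give a quick per-well sanity check, e.g. spotting empty or suspiciously small wells on a plate.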

6.4 years ago by st.ph.n ★ 2.7k