Question

How to split a FASTQ containing shared forward barcodes but different reverse ones?

5.1 years ago

oscar.nvergara • 0

Hi, I need your help because I'm completely lost with this. I received paired-end sequencing data containing many samples in a single pair of forward and reverse FASTQ files, as shown below.

Librerires_S4_L001_R1_001.fastq 
Librerires_S4_L001_R2_001.fastq

I was expecting to find software that could extract the samples into per-sample files like the ones shown below, not just out of personal preference but because this layout is commonly expected by other tools (QIIME is one example).

##  [1] "PN1R1_L001_R1_001.fastq"   "PN1R1_L001_R2_001.fastq"  
##  [3] "PN3R2_L001_R1_001.fastq"   "PN3R2_L001_R2_001.fastq"

My problem started when I realized that some of those samples share the forward barcode and differ only in the reverse one, which I have never seen before. I assume it is a feature of modern high-capacity sequencing platforms and that, with the proper software, the reads can easily be split and assigned to the proper per-sample FASTQ files.

Sample         Spacer Forward    Spacer Reverse
1   PN1R1             A                B
2   PN3R1             A                C
3   PN1R2             B                C
4   PN3R2             B                D

As you can see, forward A is used by two samples, but those samples do not have the same reverse barcode. As an example of the constructs, the table below shows the barcode (the same barcode sequence can appear on the forward or the reverse side), a short index that differs between constructs, and the forward and reverse primers.

  SAMPLE              BARCODE        INDEX   SPECIFIC PRIMER
  For_A  FORWARD    CCTAAACTACGG            CCTACGGGNGGCWGCAG
  For_B  FORWARD    TGCAGATCCAAC      T     CCTACGGGNGGCWGCAG
  Rev_B  REVERSE    TGCAGATCCAAC      A     GACTACHVGGGTATCTAATCC 
  Rev_C  REVERSE    CCATCACATAGG      TC    GACTACHVGGGTATCTAATCC 
  Rev_D  REVERSE    GTGGTATGGGAG      CTA   GACTACHVGGGTATCTAATCC
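The assignment rule implied by the two tables can be sketched in a few lines of Python. This is only an illustration, not a replacement for a real demultiplexer: it assumes that R1 starts exactly with the forward barcode and R2 starts exactly with the reverse barcode, with no mismatches and no reads in the flipped orientation. The function names `match_barcode` and `assign_sample` are made up for this example.

```python
# Barcode sequences copied from the construct table above.
FORWARD = {"A": "CCTAAACTACGG", "B": "TGCAGATCCAAC"}
REVERSE = {"B": "TGCAGATCCAAC", "C": "CCATCACATAGG", "D": "GTGGTATGGGAG"}

# (forward barcode, reverse barcode) -> sample, copied from the sample table.
SAMPLES = {
    ("A", "B"): "PN1R1",
    ("A", "C"): "PN3R1",
    ("B", "C"): "PN1R2",
    ("B", "D"): "PN3R2",
}


def match_barcode(seq, barcodes):
    """Return the name of the barcode that seq starts with, or None."""
    for name, barcode in barcodes.items():
        if seq.startswith(barcode):
            return name
    return None


def assign_sample(r1_seq, r2_seq):
    """Assign a read pair to a sample using BOTH barcodes (exact match only)."""
    pair = (match_barcode(r1_seq, FORWARD), match_barcode(r2_seq, REVERSE))
    return SAMPLES.get(pair)  # None if the combination is unknown


if __name__ == "__main__":
    # Toy read pair: forward A on R1, reverse C (plus its index "TC") on R2.
    r1 = "CCTAAACTACGG" + "CCTACGGGAGGCAGCAG"
    r2 = "CCATCACATAGG" + "TC" + "GACTACAGGGGTATCTAATCC"
    print(assign_sample(r1, r2))
```

The key point is that the lookup key is the pair of barcodes, so a forward barcode shared by two samples is not ambiguous as long as the reverse barcodes differ.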

What I need is software that can pick out the samples according to their respective barcodes, even when a barcode is shared on one side and only the other side separates the samples. I've tried some tools (QIIME 1, FASTX, mothur), but nothing worked as expected. I also wanted to check QIIME 2 and SeekDeep, but at this point I don't want to spend time testing each tool without a real idea of what it can do.

Does anybody know this kind of post-processing and can suggest a program that does this job? I would be very grateful for any hint.

Sorry for the long post, but I wanted to give as much detail as I could. Thanks for your time.

sequencing next-gen software error • 2.2k views

Demultiplexing is easiest to do at the point of making the fastqs, so why wasn't it done then?

ADD REPLY • link 5.1 years ago by swbarnes2 14k

Sadly, I don't know. I was given this job because I am used to analyzing data, but data that was already demultiplexed, or at least without mixed barcodes. As you say, the sequencing center simply didn't do it, and I'm trying to deal with that. I asked a friend of mine who also works a lot with Illumina paired-end sequencing, and he asked the same thing.

ADD REPLY • link 5.1 years ago by oscar.nvergara • 0

I'm posting this as a comment because I am not 100% sure, but maybe you can use this: https://cutadapt.readthedocs.io/en/stable/guide.html#demultiplexing

Then you could write some kind of loop in which each pass demultiplexes based only on the barcodes of one sample. After that, you demultiplex again on the reads that could not be assigned in that pass (the files written with --untrimmed-paired-output), and continue until all samples are done.
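A rough sketch of that loop, written as a Python script that only builds and prints the cutadapt commands rather than running them. It assumes the barcodes sit at the 5' end of each read (hence the anchored `^` adapters), that cutadapt's default pair filtering sends a pair to the untrimmed files when either read lacks its barcode, and that the `restN_*.fastq` intermediate names (invented here) are acceptable. Note that cutadapt also removes the matched barcode from the reads. Please check the options against the cutadapt guide linked above before running anything.

```python
import shlex

# (sample, forward barcode, reverse barcode), taken from the question's tables.
SAMPLES = [
    ("PN1R1", "CCTAAACTACGG", "TGCAGATCCAAC"),
    ("PN3R1", "CCTAAACTACGG", "CCATCACATAGG"),
    ("PN1R2", "TGCAGATCCAAC", "CCATCACATAGG"),
    ("PN3R2", "TGCAGATCCAAC", "GTGGTATGGGAG"),
]


def build_commands(r1, r2, samples):
    """Build one cutadapt call per sample; each pass reads the previous leftovers."""
    commands = []
    for i, (name, fwd, rev) in enumerate(samples, start=1):
        rest1 = f"rest{i}_R1.fastq"  # hypothetical names for unassigned reads
        rest2 = f"rest{i}_R2.fastq"
        commands.append([
            "cutadapt",
            "-g", f"^{fwd}",   # anchored 5' barcode expected on R1
            "-G", f"^{rev}",   # anchored 5' barcode expected on R2
            "-o", f"{name}_L001_R1_001.fastq",
            "-p", f"{name}_L001_R2_001.fastq",
            "--untrimmed-output", rest1,
            "--untrimmed-paired-output", rest2,
            r1, r2,
        ])
        r1, r2 = rest1, rest2  # the next pass works on what was left over
    return commands


if __name__ == "__main__":
    cmds = build_commands(
        "Librerires_S4_L001_R1_001.fastq",
        "Librerires_S4_L001_R2_001.fastq",
        SAMPLES,
    )
    for cmd in cmds:
        print(shlex.join(cmd))
```

Whatever remains in the last pair of `rest` files are read pairs that matched none of the four barcode combinations.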

ADD REPLY • link 5.1 years ago by gb ★ 2.2k