Question

How to split BAM files by samples from PacBio

5.4 years ago

misterie ▴ 110

Hi,

I have sequences from 6 bacterial isolates (E. coli and Salmonella). The 6 samples were sequenced on PacBio across 4 movies, so I have 3 bax files per movie (12 in total).

I converted the bax files to BAM using bax2bam and then used lima for demultiplexing (with a set of about 330 barcodes in FASTA format). The output of lima was 4 BAM files (one per movie).

The question is: how can I filter/split those files by sample? I would like to get 6 files, one per sample.

Thank you in advance.

pacbio bam bax • 3.6k views

updated 5.4 years ago by jharting • 0 • written 5.4 years ago by misterie ▴ 110

Demultiplexing is usually splitting by sample. Were the right barcodes used?

5.4 years ago by WouterDeCoster 47k

Yes. My barcode FASTA contains about 300 barcodes...

5.4 years ago by misterie ▴ 110

What is the exact lima command line you used to split by barcode? Did you use the --split-bam parameter?

5.4 years ago by gconcepcion ▴ 410

Yes, I tried that option and lima created about 900 BAM files (with barcode prefixes), and I do not know how to identify my samples among so many BAM files.

5.4 years ago by misterie ▴ 110

score 0 · Answer 1 · 2018-11-15

5.4 years ago

jharting • 0

Try using the --split-bam-named option to label the outputs with the headers from the barcode set. You can also use the option --peek-guess to filter out undesirable barcode pairs, which should reduce the number of output files. Look in the file "lima.guess" for information on which barcode pairs were inferred from your inputs.

5.4 years ago by jharting • 0
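
As a rough sketch of the advice above, the following Python snippet assembles a lima call with the two options named in the answer. The file names (`movie1.subreads.bam`, `barcodes.fasta`, `movie1.demux.bam`) and the helper `build_lima_command` are hypothetical, and the flag spellings are simply the ones quoted in this thread, so please check `lima --help` for your installed version before running anything, since option names may differ between releases.

```python
import shlex


def build_lima_command(subreads_bam, barcodes_fasta, output_bam):
    """Assemble (but do not run) a lima call: input BAM, barcode FASTA, output BAM.

    The two flags are the ones mentioned in the answer; verify them against
    `lima --help` for your version (assumption: they are unchanged).
    """
    return [
        "lima",
        subreads_bam,
        barcodes_fasta,
        output_bam,
        "--split-bam-named",  # name output files after the barcode FASTA headers
        "--peek-guess",       # infer which barcode pairs are really present
    ]


# Hypothetical file names, used only for illustration.
cmd = build_lima_command("movie1.subreads.bam", "barcodes.fasta", "movie1.demux.bam")
print(shlex.join(cmd))
```

The command is only printed here; to execute it you could pass `cmd` to `subprocess.run(cmd, check=True)` once the paths and flags have been checked.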

OK, thank you, now I understand. I used those parameters for lima, but there is something I do not understand. I used 3 barcodes for 1 run (3 samples were sequenced on that run), but lima produced 6 BAM files, one for every combination of my barcodes (025 forward with 025 forward; 025 forward with 0032 forward; 0032 forward with 0032 forward, etc.). Could you tell me why I have 6 files instead of 3? How can I identify my samples? Thank you in advance.

5.4 years ago by misterie ▴ 110

You should also be using the option --same, assuming your barcodes are the same on both ends of the insert (which is the only possibility when adding barcodes to sheared libraries). This option filters out any read that has a different barcode on each end of the insert. There are various reasons why you might see asymmetric/different barcodes on inserts even in a library prepped with the same barcode on both ends (read error, small levels of contamination), but the counts of the asymmetric/different reads should be much lower than the symmetric/expected read counts.

5.4 years ago by jharting • 0
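
To illustrate why 3 barcodes can yield 6 files, and how the symmetric ones might be matched to samples, here is a minimal Python sketch. It assumes the split output files are named `<prefix>.<barcode1>--<barcode2>.bam`; that naming pattern, the barcode names `bcA`/`bcB`/`bcC`, the `demux` prefix, and the sample names are all placeholders for illustration, not values from this thread.

```python
from itertools import combinations_with_replacement


def barcode_pairs(barcodes):
    """All unordered barcode pairs, including each barcode paired with itself."""
    return list(combinations_with_replacement(barcodes, 2))


def symmetric_barcode(bam_name):
    """Return the barcode if the name looks like '<prefix>.<bc>--<bc>.bam', else None.

    Assumption: split output files follow this naming pattern.
    """
    if not bam_name.endswith(".bam"):
        return None
    stem = bam_name[: -len(".bam")]
    left, sep, right = stem.rpartition("--")
    if not sep or not right:
        return None
    # Symmetric when the part before '--' ends with the same barcode name.
    return right if left.endswith("." + right) else None


def assign_samples(bam_names, barcode_to_sample):
    """Map sample name -> BAM file, keeping only symmetric barcode pairs."""
    assigned = {}
    for name in bam_names:
        barcode = symmetric_barcode(name)
        if barcode in barcode_to_sample:
            assigned[barcode_to_sample[barcode]] = name
    return assigned


# Hypothetical barcodes and samples.
pairs = barcode_pairs(["bcA", "bcB", "bcC"])  # 3 symmetric + 3 asymmetric pairs
files = [f"demux.{a}--{b}.bam" for a, b in pairs]
samples = assign_samples(files, {"bcA": "sample1", "bcB": "sample2", "bcC": "sample3"})
for sample, bam in sorted(samples.items()):
    print(sample, bam)
```

With 3 barcodes there are 6 unordered pairs, of which only the 3 same-barcode pairs correspond to real samples in a symmetric design; the other 3 files are the low-count asymmetric reads described above.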