Converting BAM to Fastq with HTSlib htscmd
2
0
Entering edit mode
9.6 years ago
trakhtenberg ▴ 160

I found 2 posts recommending to use HTSlib htscmd for converting BAM to Fastq, but both output only interleaved file paired reads. I used this tool to deinterleave.

After deinterleaving the paired.fq from this command: htscmd bamshuf -Ou input.bam tmp-prefix | htscmd bam2fq -s se.fq.gz - | gzip > pe.fq.gz (Source) I got only 41k pairs.

After deinterleaving output of this command htscmd bamshuf -uOn 128 aln_reads.bam tmp | htscmd bam2fq -a - | gzip > interleaved_reads.fq.gz (Source), I got 232.7k reads for one end and 214.4k reads for the other end.

Is there a parameter in HTSlib htscmd which could instruct to output dedeinterleaved reads? Or is there a more fitting tool for deinterleaving?

My Bam file is from Tophat, and I would like to re-analyze these reads after filtering again with Tophat. Is it important to integrate them back with paired reads for re-analysis? It appears from here that singletons are not passed on to Cufflinks by Tophat for FPKM, but since they mapped, I would think that the Tophat/Cufflinks pipeline would make use of them? Are singletons tend to be splice-junction reads?

Fastq Bam2Fastq Tophat BAM HTSlib • 4.9k views
ADD COMMENT
0
Entering edit mode
9.6 years ago
aniketd86 ▴ 150

Q] Or is there a more fitting tool for deinterleaving?

Apart from HTSlib, you could use Picard's SamToFastq command to create 2 paired end fastq files from your bam file.

For example:

java  -jar <path_to>/SamToFastq.jar \
    INPUT=<Input_file.bam> \
    FASTQ=<output_pe1.fastq> \
    SECOND_END_FASTQ=<output_pe2.fastq>  \
    UNPAIRED_FASTQ=<output_up.fastq> \
    VALIDATION_STRINGENCY=SILENT \
ADD COMMENT
0
Entering edit mode

I encountered issue with this approach too (even using VALIDATION_STRINGENCY=SILENT), see here Picard error Illegal Mate State in converting BAM to Fastq Let me know if you have a solution for this. thanks

ADD REPLY
0
Entering edit mode

Hi, @aniketd86. I tried your code, the warning of "Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Found 52323 unpaired mates" did disappear, but I think it is probably due to the parameter "VALIDATION_STRINGENCY=SILENT", and the unpaired reads were still discarded because the output_up.fastq was empty, and the reads in output_pe1.fastq and output_pe2.fastq were same as my previous command (without the last two lines in yours).

ADD REPLY
0
Entering edit mode
9.5 years ago
piet ★ 1.8k

You may use bedtools for extracting paired reads from a BAM file. Bedtools is able to write out pairs into separate FASTQ files. Before extracting the reads, the BAM file must be sorted by read-name (samtools option -n):

samtools sort -m 6000000000 -n myfile.bam myfile_resorted
bedtools bamtofastq -i myfile_resorted.bam -fq reads_1.fastq -fq2 reads_2.fastq

See also:

ADD COMMENT
0
Entering edit mode

samtools sort -m 6000000000 -n myfile.bam myfile_resorted

What is the meaning of -m 6000000000.

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2701 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6