Biostar Beta. Not for public use.
Convert BAM files to paired end FASTQ
15 months ago

I have a set of BAM files that were produced by aligning paired-end FASTQ files, and I need to convert them back to paired-end FASTQ files.

I am using "samtools fastq" for this purpose (after sorting bam files by name):

samtools fastq -1 output.pe_1.fastq -2 output.pe_2.fastq -s singleton.fastq input.bam

The problem is, I noticed that my fastq files have 2 different naming conventions:

  1. Case1:

read name in pe1: @UNC15-SN850_90:5:1101:1195:2138/1

read name in pe2: @UNC15-SN850_90:5:1101:1195:2138/2

  2. Case2:

read name in pe1: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 1:N:0:TTAGGC

read name in pe2: @UNC14-SN744:189:D09V4ACXX:1:1101:1202:1856 2:N:0:TTAGGC


For case 2, I get my paired-end FASTQ files correctly. For case 1, however, all of the reads end up in singleton.fastq, and samtools writes two empty paired-end FASTQ files.

Is there a way I can smoothly run both cases correctly using "samtools fastq" or any other tools available?


Is this TCGA data?

You may be able to use reformat.sh from the BBMap suite if all you need is to convert the /1 style read IDs into ones that contain a colon. The relevant options from its help text are:

spaceslash=t            Put a space before the slash in addslash mode.
addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.
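
If the reads have already landed in singleton.fastq with their /1 and /2 suffixes intact, one possible workaround is to split that file by suffix. The following is only a rough sketch: split_by_suffix is a hypothetical helper name (not part of samtools or BBMap), and it assumes strict four-line FASTQ records whose names end in /1 or /2.

```shell
# Sketch: split a FASTQ whose read names end in /1 or /2 into two mate files.
# split_by_suffix is a hypothetical helper, not a samtools or BBMap command.
# Assumes four-line FASTQ records (no wrapped sequence or quality lines).
split_by_suffix() {
    # $1 = input FASTQ, $2 = output for /1 reads, $3 = output for /2 reads
    : > "$2"    # start with empty output files
    : > "$3"
    awk -v out1="$2" -v out2="$3" '
        NR % 4 == 1 {
            # Header line: choose the destination from the name suffix.
            name = $1
            if (name ~ /\/1$/)      dest = out1
            else if (name ~ /\/2$/) dest = out2
            else                    dest = ""   # no suffix: skip this record
        }
        dest != "" { print >> dest }
    ' "$1"
}

# Example call (file names taken from the question):
#   split_by_suffix singleton.fastq output.pe_1.fastq output.pe_2.fastq
```

This does not check that every /1 read has a matching /2 read, so the two outputs are only in sync if the input is name-sorted and contains no orphan reads.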

Hello,

Could you please post complete example lines from the BAM files in which both reads of a pair are visible?

Thanks!

fin swimmer

11 months ago
JC 7.9k
Mexico

The problem could be that samtools expects the second header format (the one that includes the index sequence). You could try bedtools or Picard for the conversion instead.
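
Another thing that might be worth checking is whether the read names inside the BAM still carry the /1 and /2 suffixes, since mates would then have different names. This is an assumption, not something confirmed in the thread. If it holds, the suffix could be stripped from the SAM stream before conversion. A minimal sketch, where strip_mate_suffix is a hypothetical helper name:

```shell
# Sketch: remove a trailing /1 or /2 from the read name (column 1) of SAM records.
# strip_mate_suffix is a hypothetical helper, not a samtools command.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ { print; next }                  # SAM header lines pass through unchanged
         { sub(/\/[12]$/, "", $1); print }     # alignment lines: trim the suffix
    '
}

# Possible use with the command from the question (input still name-sorted):
#   samtools view -h input.bam | strip_mate_suffix \
#     | samtools fastq -1 output.pe_1.fastq -2 output.pe_2.fastq -s singleton.fastq -
```

This only helps if the READ1 and READ2 flags are set correctly in the BAM. If they are not, the reads would likely still not be assigned to the -1 and -2 files.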

