Problems while reheadering BAM with Samtools 1.3.1
1
2
Entering edit mode
6.3 years ago
ognjen011 ▴ 250

We want to remove the alternative contigs from both reference fasta and aligned BAM. The procedure we chose for BAM is to use Samtools View to keep only alignments chr1-22 and X,Y; we then: * extracted the header with samtools view -H, * deleted all the unused contigs * renamed the .txt file to .sam * used reheader BAM according to instructions.

The contents of the sam used for reheader is given below:

@HD VN:1.5  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr15    LN:101991189
@SQ SN:chr16    LN:90338345
@SQ SN:chr17    LN:83257441
@SQ SN:chr18    LN:80373285
@SQ SN:chr19    LN:58617616
@SQ SN:chr20    LN:64444167
@SQ SN:chr21    LN:46709983
@SQ SN:chr22    LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@RG ID:1    PL:Illumina HiSeq   SM:Plasma
@PG ID:bwa  PN:bwa  CL:/opt/bwa-0.7.13/bwa mem -M -R @RG\tID:1\tPL:Illumina HiSeq\tSM:Plasma -t 8 Homo_sapiens_assembly38.fasta /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/ERR855951_SarcomacfDNA_1.fastq.gz /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/ERR855951_SarcomacfDNA_2.fastq.gz VN:0.7.13-r1126
@PG ID:SAMBLASTER   CL:samblaster -i /dev/stdin -o /dev/stdout -M   VN:0.1.22
@PG ID:GATK ApplyBQSR   VN:4.beta.2 CL:ApplyBQSR  --output ERR855951_SarcomacfDNA.recalibrated.bam --bqsr_recal_file /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/workspace/5ef73ca8-e545-4e5c-9b91-beb69b529d41/ctdna-align_WES___BWA___GATK_4_0__PLASMA_GATK_BaseRecalibrator/ERR855951_SarcomacfDNA.recal_data.grp --input /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/workspace/5ef73ca8-e545-4e5c-9b91-beb69b529d41/ctdna-align_WES___BWA___GATK_4_0__PLASMA_BWA_MEM_Bundle_0_7_13/ERR855951_SarcomacfDNA.bam --createOutputBamIndex true  --preserve_qscores_less_than 6 --useOriginalQualities false --quantize_quals 0 --round_down_quantized false --emit_original_quals false --globalQScorePrior -1.0 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false PN:GATK ApplyBQSR

After this any tool using the BAM fails, for example Samtools reports a truncated file. The only way I can access it if I used Samtools in the header-only mode, where again I get the same header. What could be the problem here?

Header BAM Samtools • 5.3k views
ADD COMMENT
0
Entering edit mode

Thanks for the answer. Is there a use-case where samtools reheader works correctly?

ADD REPLY
0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment should have gone under @Devon's answer.

ADD REPLY
0
Entering edit mode

Sure, changing things like "chr1" to "1", or removing chromosomes from the end (or adding chromosomes there). I consider samtools reheader to be a very advanced function that largely requires the user be familiar with how BAM files are internally structured.

ADD REPLY
0
Entering edit mode

I am asking because we have tried removing chromosomes from the end, as far as I can tell. Your advice works, but I am trying to understand what went wrong.

ADD REPLY
2
Entering edit mode
6.3 years ago

Don't use samtools reheader to remove chromosomes/contigs that aren't at the very end of the header, doing so will just bork the file. What you want is instead something like:

samtools view -H foo.bam > header
# edit header...
samtools view foo.bam chr1 chr2 chr3 ... | samtools view -bo reheadered.bam -t header -

This will be slower than reheader, but actually work.

ADD COMMENT
0
Entering edit mode

Note that in samtools v1.1 (and probably higher) the file given to the -t option (header) must have a specific format (see the man page), so just using the output of samtools view -H won't work.

ADD REPLY

Login before adding your answer.

Traffic: 2447 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6