"Invalid BAM file header: missing sequence name in file" when trying to visualise .bam file in IGV
2
0
Entering edit mode
5.8 years ago
francois ▴ 80

I need help with visualising a .bam file.

So, what I did up to now was:

I had one .fastq of 3.9 MB file containing many reads generated by Metrichor (I am sequencing using the Nanopore MinION). I aligned the reads to a reference .fasta sequence using the package ngmlr. I got a .sam file of 4.3 MB. The command was:

ngmlr -t 4 -r ref.fasta -q reads.fastq -o test.sam -x ont

I converted this .sam file to a .bam file using samtools. I got a .bam file of 2 MB. The command I used was:

samtools view -bT ref.fasta test.sam > test.bam

I sorted the .bam file with samtools. The command was:

samtools sort test.bam > test_sorted.bam

I indexed the .bam file using samtools. I got a .bai file of the same name of 96 bytes. The command was:

 samtools index test_sorted.bam

Now I tried opening the .bam file in IGV_2.4.10. But I received the following error message:

Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file

What did I do wrong?

alignment bam igv • 5.1k views
ADD COMMENT
0
Entering edit mode

Ah that would have been great! But same problem... Any other ideas?

The first few lines of my .sam file (just before the conversion to .bam step) looks like that:

@HD VN:1.0 SO:unsorted

@SQ SN: LN:877

@PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont

01c372ae-0d41-4ba4-bd0c-c56498801f53 4 * 0 0 * TGGGACGCTCGTTCAGTTA

Does that look OK to you?

ADD REPLY
0
Entering edit mode

Ok so I tried adding something in the @SQ SN: header of the .sam file, as it seems like this is the "sequence name". But it does not look like the conversion to .bam worked fine. I have this message:

[W::sam_parse1] urecognized reference name; treated as unmapped

Exactly the same number of times as my number of aligned reads, which does not sound very good.

If I continue anyway with sorting and indexing, the error message from IGV is:

does not contain any sequence names which match the current genome.

File:      PRNP, # that's the name I added

Genome: chr1, chr2, chr3, chr4, ...

I also tried adding chr20 in the .sam file header, exact same.

So I guess the problem does come from there, but I am not sure how I should name the sequence... Also, to be clear, my reference sequence in .fasta I am aligning to is a very short sequence (around 1kb), not a full genome or anything. Is that OK to do?

Thanks

ADD REPLY
1
Entering edit mode

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

You should avoid manual tampering with bams.

ADD REPLY
1
Entering edit mode

I also tried adding chr20 in the .sam file header, exact same.

This won't work, because each mapped read will still have a wrong chromosome name. But i wonder how chromosome names don't match, as apparently you used the same reference at all steps. Did you load the same reference genome on IGV before opening the bam file?

Also, what are the outputs of:

grep ">" reference.fasta | head

And

samtools view test.sam | head
ADD REPLY
0
Entering edit mode

I created this reference myself, so it is whatever I called it:

grep ">" ref.fasta | head

106_46_amplicons_ref

After that, the .fasta reference line is just a one line sequence of ~ 1k characters in lower case.

And

samtools view align.sam | head

37d3e25f-875a-4cac-967a-b26d51342613 4 * 0 0 * * 0 0 CACGAAAAAAAA

And many many lines of alignments. It is not giving the header for some reason, is that what you expected? Just opening align.sam in a text editor, I can see the actual header, which looks like:

@HD VN:1.0 SO:unsorted @SQ SN: LN:877 @PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont

(Notice that there is nothing after SN:).

Just so I summarise the sequence of commands I am doing right now:

  • ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont
  • samtools view align.sam -o align.bam
  • samtools sort align.bam > sort.bam
  • samtools index sort.bam
  • Changing the reference genome in IGV with Genomes > Load genome from file. And I select ref.fasta.
  • Drag and drop sort.bam in IGV.
  • Error message: Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file

Does the name of the reference need to match something in the .sam file by any chance?

ADD REPLY
2
Entering edit mode
5.8 years ago

You can simplify your commands substantially and avoid intermediate files:

zcat reads.fastq.gz | ngmlr --presets ont -t 8 -r genome.fa | samtools sort  -o alignment.bam -

Which should work okay in IGV.

ADD COMMENT
0
Entering edit mode

True. I don't really know how that would help solve my problem though, it is the same as running the commands sequentially.

ADD REPLY
0
Entering edit mode

It's not the same as your commands, since you apparently messed with the header. Anyway, I use that command often and it works fine for me, also in igv.

ADD REPLY
1
Entering edit mode
5.8 years ago
h.mon 35k

I believe you need samtools view -bhT ref.fasta test.sam > test.bam, otherwise the sam header is lost.

-h include header in SAM output

ADD COMMENT
1
Entering edit mode

You only need:

samtools view test.sam -o test.bam

(assuming a reasonably recent version of samtools)

ADD REPLY
0
Entering edit mode

No, -b implies -h, so this cannot be the issue.

ADD REPLY

Login before adding your answer.

Traffic: 1990 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6