Biostar Beta. Not for public use.
samtools sorting and indexing
12 months ago
ggman • 80
United States

Hi friends,

I am attempting to sort the BAM files that I generated from my bowtie SAM files. Judging by the error I receive after creating my BAM file, I am not indexing them correctly:

random alignment retrieval only works for indexed BAM or CRAM files.

I understand I am supposed to index the files before sorting them.

    #creating the appropriate files
    samtools view -Sb sample.sam.pair > sample.pair
    samtools view -bt ~/bigdata/refgenome/genome.fa.fai - - | samtools sort sample.pair -o sample.pair.bam

    samtools view -Sb sample.sam.single > sample.single
    samtools view -bt ~/bigdata/refgenome/genome.fa.fai - - | samtools sort sample.single -o sample.single.bam

    #merge
    samtools merge sample.all.bam sample.pair.bam sample.single.bam -@ 2
    rm sample.pair sample.single

    #index the final bam
    samtools index sample.all.bam

Any help would be appreciated.

15 months ago
John 12k
Germany

I think you're over-thinking things :)

You can only index BAM files on position, and only when the data is sorted by position to begin with (don't ask...). So to sort by position, just do:

    samtools sort my.sam > my_sorted.bam

Then index with:

    samtools index my_sorted.bam

It's as easy as that. If you want to merge the output files from bowtie, do that as the very first step, because I don't think samtools performs any optimisations for merging sorted BAMs/SAMs. I'd also recommend against bowtie2 in favour of STAR or BWA-MEM, but that's just a personal preference at the end of the day.
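
For reference, the samtools merge documentation describes its inputs as sorted alignment files, so one alternative for the two-file case in the question is to sort each file, merge, then index. A minimal sketch, assuming samtools 1.3 or later (where sort takes -o) and SAM files that still carry their @SQ header lines; the function name is hypothetical and the file-name pattern is borrowed from the question:

```shell
# Sketch only: sort each bowtie output, merge the sorted BAMs, index the result.
# Assumes samtools >= 1.3 on PATH. merge_pair_and_single is a made-up name.
merge_pair_and_single() {
    prefix="$1"   # e.g. "sample" for sample.sam.pair / sample.sam.single

    # Coordinate-sort each SAM file straight to BAM (input format is auto-detected).
    samtools sort -o "$prefix.pair.bam" "$prefix.sam.pair" || return 1
    samtools sort -o "$prefix.single.bam" "$prefix.sam.single" || return 1

    # Merge the two sorted BAMs using 2 extra threads; output file comes first.
    samtools merge -@ 2 "$prefix.all.bam" "$prefix.pair.bam" "$prefix.single.bam" || return 1

    # Index the merged, still position-sorted BAM.
    samtools index "$prefix.all.bam"
}
```

Calling merge_pair_and_single sample would then leave sample.all.bam and its index in the working directory. No .fai file is involved at any step.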

With the latest samtools that command should be samtools sort -o sorted.bam initial.bam.
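
Putting the sort and index steps together with that newer syntax might look roughly like this. It is a sketch assuming samtools 1.3 or later; the function name and file names are placeholders, not anything from the thread:

```shell
# Sketch only: coordinate-sort an alignment file, then index it.
# Assumes samtools >= 1.3 on PATH. sort_and_index is a made-up name.
sort_and_index() {
    in_file="$1"   # SAM or BAM input, e.g. initial.bam
    out_bam="$2"   # sorted BAM output, e.g. sorted.bam

    # Sort by position (the default order); -o names the output file.
    samtools sort -o "$out_bam" "$in_file" || return 1

    # Build the BAM index; by default it is written next to the BAM as "$out_bam.bai".
    samtools index "$out_bam"
}
```

For example, sort_and_index initial.bam sorted.bam. The || return 1 stops before indexing if the sort fails, since indexing an unsorted or partial file is the situation the original question ran into.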

Oh they changed the syntax to be explicit!? Finally :D

Would this take into account my .fai file?

You are still over-thinking. The fasta and bam indexes are two separate and independent things; you don't need one to have the other.

Indexing allows for efficient data access and retrieval. The fasta index (.fai) is used to access and retrieve subsets of the fasta sequence, and the bam index (.bai) to access and retrieve subsets of the bam file.
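
To make the distinction concrete, here is a small sketch of each index being built and used on its own. The function name, file names, and region are placeholders; the only assumptions are that samtools is on PATH and that the BAM is already sorted by position:

```shell
# Sketch only: the FASTA index and the BAM index are built and used independently.
# fetch_region is a made-up name; arguments are hypothetical examples.
fetch_region() {
    ref_fa="$1"       # reference FASTA, e.g. genome.fa
    sorted_bam="$2"   # position-sorted BAM, e.g. sorted.bam
    region="$3"       # e.g. chr1:1000-2000

    # FASTA side: build genome.fa.fai, then pull out the reference bases for the region.
    samtools faidx "$ref_fa" || return 1
    samtools faidx "$ref_fa" "$region" || return 1

    # BAM side: build sorted.bam.bai, then pull out the alignments overlapping the region.
    samtools index "$sorted_bam" || return 1
    samtools view "$sorted_bam" "$region"
}
```

The last command is the kind of random retrieval the error message in the question refers to: asking samtools view for a region only works once the BAM has its own index, regardless of whether a .fai exists.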

Oh my goodness... Thank you both for explaining this to me. I really appreciate it! I only keep bringing up my .fai file because my PI left me some code to base mine on, and it uses the .fai file, but I couldn't understand what it was there for. Thank you.

You're very welcome - if you run into any more complications please don't hesitate to open another question :)
