Extracting tlen From BAM File Using htsjdk
1
1
Entering edit mode
7.5 years ago

Hello Biostars Community,

I am a newbie to the bioinformatics world and need some help with a project I am working on!

I need to use htsjdk to extract every tlen value from a BAM file and have a few questions.

  • How do a read the BAM file in the first place? I have found SamReader that might do the job like so : SamReader samReader= SamReaderFactory.makeDefault().open(new File("mybam.bam"));
  • How do I find the position of individual chromosomes? I want to multithread the application, reading each chromosome in parallel.
  • Once I get the position to start reading from, how do I read each line, then the tlen field?

Sorry for all of the questions, I have been googling a lot and haven't found answers!

Any help is appreciated, thanks in advance!

htsjdk bam sam samtools tlen • 3.2k views
ADD COMMENT
2
Entering edit mode
7.5 years ago

This may get you in the right direction:

// Open bam file
SamReader samReader= SamReaderFactory.
        makeDefault().
        validationStringency(ValidationStringency.SILENT).
        open(new File("aln.bam"));

// Start iterating from start to end of chr7. 
SAMRecordIterator iter = samReader.query("chr7", 0, 0, false);

while(iter.hasNext()){
    // Iterate thorough each record and extract fragment size
    SAMRecord rec= iter.next();
    int tlen= rec.getInferredInsertSize();
    System.out.println(tlen);
}

Get list of chromosomes/contig names from SamReader object:

List<SAMSequenceRecord> refSeqs= samReader.getFileHeader().getSequenceDictionary().getSequences();
for(SAMSequenceRecord ref : refSeqs){
    System.out.println(ref.getSequenceName());
}
ADD COMMENT
0
Entering edit mode

Ah thank you so much! getInferredInsertSize is what I was looking for, much appreciated!

Might you know how to start iterating from the start of a particular chromosome?

ADD REPLY
1
Entering edit mode

Hi- See edit. Have a look at the htsjdk docs for the meaning of the various method. Also make sure ValidationStringency.SILENT is ok for you.

ADD REPLY
0
Entering edit mode

Much appreciated. One more question, is there a function that returns the list of chromosomes/contigs so I can dynamically build that list?

ADD REPLY
1
Entering edit mode

Hi again - See edit, I haven't tested it but it should be about right.

ADD REPLY
0
Entering edit mode

The contig names do indeed work thank you. However, I am running into an issue with reading from a specific one using the query method.

I get this error: htsjdk.samtools.SAMException: No index is available for this BAM file. How do I provide the index file?

ADD REPLY
0
Entering edit mode

You can make the index with e.g. command line samtools samtools index aln.bam, the bam file needs to be sorted by position in order to be indexed (samtools sort aln.bam > sorted.bam). You can also sort and index within Java/htsdjk (I would have to look this up though.)

ADD REPLY
0
Entering edit mode

l'm using SamReader query method, but it doesn't work. Nothing print. l was wondering if you have this problem. My code:

SamReader reader=SamReaderFactory.makeDefault().validationStringency(ValidationStringency.SILENT).open(new File(in_bam_file));
        SAMRecordIterator recordQuery=reader.query("chr3",179198825, 179199177,false);
        while(recordQuery.hasNext()){
            SAMRecord res=recordQuery.next();
            System.out.println("query size is:\t"+res.getInferredInsertSize());
        }
ADD REPLY
0
Entering edit mode

hi, l use SamReader query method, it doesn't work, l don't know why. My htsjdk version is 2.8.1.

SamReader reader=SamReaderFactory.makeDefault().validationStringency(ValidationStringency.SILENT).open(new File(in_bam_file));
        SAMRecordIterator recordQuery=reader.query("chr3",179198825, 179199177,false);
        while(recordQuery.hasNext()){
            SAMRecord res=recordQuery.next();
            System.out.println("query size is:\t"+res.getInferredInsertSize());
        }
ADD REPLY
0
Entering edit mode

Are you sure the interval chr3:179198825-179199177 contains some reads? Make sure also that "chr3" is correct, maybe your chromosome is named "3". What is the output of samtools view in.bam chr3:179198825-179199177?

ADD REPLY
0
Entering edit mode

l am pretty sure, 'chr3' is right and the interval is also right. 'samtools view in.bam chr3:179198825-179199177' could print the results.

ADD REPLY

Login before adding your answer.

Traffic: 2723 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6