SeqAn: reading a compressed stream
6.4 years ago
pmarijon ▴ 140

Hi,

I want to read a sequence file (FASTA, FASTQ, BAM, etc.), so I read the SeqAn tutorial. To generate a progress bar I also need to know my current position in the file, so I open the file with std::ifstream and hand that stream to SeqAn. That part is not a problem; I wrote this test code:

#include <iostream>
#include <fstream>

#include <seqan/seq_io.h>


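int main (int argc, char ** argv) {
    // A file name is required as the first argument.
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <sequence file>" << std::endl;
        return 1;
    }

    std::streampos begin, end;
    std::ifstream myfile (argv[1], std::ios::in | std::ios::binary);

    begin = myfile.tellg();

    // Let SeqAn parse records from the already opened stream.
    seqan::SeqFileIn seq_file(myfile);
    seqan::CharString id;
    seqan::Dna5String seq;
    seqan::CharString qual;

    while (!seqan::atEnd(seq_file))
    {
        seqan::readRecord(id, seq, qual, seq_file);
        // Report the position of the underlying stream after each record.
        std::cout << "pos: " << myfile.tellg() << " id " << id << std::endl;
    }

    end = myfile.tellg();

    myfile.close();

    std::cout << "begin: " << begin << " end: " << end << std::endl;
    std::cout << "size is: " << (end - begin) << " bytes.\n" << std::endl;
    return 0;
}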

But when I run this code on a compressed FASTQ file, SeqAn throws an exception: terminate called after throwing an instance of 'seqan::ParseError'

My questions:

  • Is using std::ifstream the only way to get the current position in the file?
  • How can I tell SeqAn that this stream is a compressed stream?
  • Can I generate an uncompressed stream from my compressed stream (with SeqAn or zlib)?

Thanks.

seqan • 1.9k views

Why would you want to know the position of a FASTQ record in a compressed file? Unless you're using BGZF, there is no way to 'fseek' in a gzip file...


I want to generate a progress bar (the post needed an edit to say so). For a compressed file, we can get a good approximation from the size of the compressed file and the current position in the compressed file.
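The approximation described here can be sketched as follows. This is only an illustration: `fileSizeBytes` and `progressPercent` are hypothetical helper names, not part of SeqAn, and the sketch assumes the number of compressed bytes consumed so far is already known (for example from `tellg()` on the underlying stream).

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: size of the file on disk in bytes,
// or -1 if the file cannot be opened.
long long fileSizeBytes(const std::string& path) {
    // std::ios::ate positions the stream at the end right after opening.
    std::ifstream f(path, std::ios::in | std::ios::binary | std::ios::ate);
    if (!f) return -1;
    return static_cast<long long>(f.tellg());
}

// Hypothetical helper: share of the compressed file consumed so far,
// as a percentage clamped to the range [0, 100].
double progressPercent(long long consumed, long long total) {
    // A negative position (e.g. tellg() on a failed stream) counts as 0 %.
    if (total <= 0 || consumed <= 0) return 0.0;
    if (consumed >= total) return 100.0;
    return 100.0 * static_cast<double>(consumed) / static_cast<double>(total);
}
```

The result is only an estimate of parsing progress: the compression ratio is not constant along the file, and a parser that reads ahead in large blocks will report a position beyond the record it is currently handling.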


Then I would create a custom std::streambuf to count the number of bytes, e.g.: https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/


I use a std::ifstream to get the current position in the file during SeqAn parsing; that part is easy. But when I try my code on a compressed file, SeqAn's parsing fails. So either SeqAn didn't detect that my stream contains compressed data, or SeqAn can't work on a compressed stream and this isn't documented.


"So either SeqAn didn't detect that my stream contains compressed data, or SeqAn can't work on a compressed stream and this isn't documented."

Usually it is the other way round: things don't work on compressed data unless that is documented.


It is documented:

These classes provide an API for accessing sequence files in different file formats, either compressed or uncompressed.

Source : https://seqan.readthedocs.io/en/master/Tutorial/InputOutput/SequenceIO.html


Well, there is compressed .bam and compressed .gz.
