Scripting method to parse fastq diploid consensus sequence bins based on quality
8.0 years ago
memory_donk ▴ 360

Hi Biostars,

I need to write a script that accepts scaffolds from a diploid consensus sequence in FASTQ format, such as the one generated by the command in [1], breaks each sequence into non-overlapping 100 bp bins, and outputs true or false for each bin depending on whether its quality is above some threshold.

Where I'm stumbling is finding an object-oriented module that neatly packages access to FASTQ sequences together with their quality scores. In BioPerl I'd have no trouble breaking a FASTQ sequence into bins and doing something with them, but there doesn't seem to be a method for accessing a region of a FASTQ entry to get its quality.

I'm mostly comfortable with BioPerl and could probably figure out Biopython if needed. Does anyone know of a module that does something like this?

[1] samtools mpileup -C50 -uf ref.fa aln.bam | bcftools view -c - | vcfutils.pl vcf2fq -d 10 -D 100 | gzip > diploid.fq.gz
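
To make the goal concrete, here is a rough sketch of the binning in plain Python, with no BioPerl or Biopython dependency. It rests on several assumptions that are not settled in the question: qualities are Phred+33 encoded, a bin "passes" when its mean quality is at or above the threshold, the last partial bin of a scaffold is kept, and sequence and quality lines may be wrapped over several lines. The names `read_fastq` and `bin_flags` and the default threshold of 20 are made up for illustration.

```python
def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a FASTQ text handle.

    Sequence and quality may be wrapped over several lines, so the quality
    block is read by length rather than by line count.
    """
    line = handle.readline()
    while line:
        if not line.startswith("@"):
            line = handle.readline()  # skip blank or stray lines between records
            continue
        name = line[1:].strip()
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        seq = "".join(seq_parts)
        qual = ""
        while len(qual) < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError(f"truncated quality block in record {name!r}")
            qual += line.strip()
        if len(qual) != len(seq):
            raise ValueError(f"sequence/quality length mismatch in record {name!r}")
        yield name, seq, qual
        line = handle.readline()


def bin_flags(qual, bin_size=100, min_mean_q=20, offset=33):
    """Return (start, end, passed) for each non-overlapping bin.

    Assumptions: Phred+`offset` encoding, and "passed" means the bin's mean
    quality is >= min_mean_q. The final bin may be shorter than bin_size.
    Coordinates are 0-based, end-exclusive.
    """
    flags = []
    for start in range(0, len(qual), bin_size):
        scores = [ord(c) - offset for c in qual[start:start + bin_size]]
        mean_q = sum(scores) / len(scores)
        flags.append((start, start + len(scores), mean_q >= min_mean_q))
    return flags


# Possible usage on the gzipped output of the command in [1]:
#
#   import gzip
#   with gzip.open("diploid.fq.gz", "rt") as fh:
#       for name, seq, qual in read_fastq(fh):
#           for start, end, passed in bin_flags(qual):
#               print(name, start, end, passed, sep="\t")
```

Swapping the mean for a minimum (every base in the bin must clear the threshold) only changes the one line that computes `mean_q`. The same slicing idea should also work with Biopython, where a `SeqRecord` read as FASTQ carries per-base scores in `letter_annotations["phred_quality"]`.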

fastq parsing bioperl biopython • 1.4k views