Stampy and malformed BAM files?
7.0 years ago
joreamayarom ▴ 140

I'm mapping reads in a BAM file to a genome using STAMPY. Apparently, some of my reads are severely malformed, and STAMPY is issuing the following complaint:

stampy: Mapping failed on input line 1742232 of file /path/to/reads/my_file.R2.fastq.gz: CCCFFFFFHHHGHHJJJJJGHIJJIIJGJI@HIIJJJAGHJJIHHHFFFDDBDDDDDDDCDDDDDDB?<@DBDDA>C:ACBAC?CCD>BDDCCHIGDDHEH:EE7@DED<C;;AD
stampy: Error: (FastQReader:) Sequence and quality lines have different lengths (98 and 115: AGGCAAACGAGCGTTCGGGTCACCTGATGGTGATCACCGCCGCTTACGACCCCGTGCAGCACCAGAGGAGCTACAGGTGTGTTGCCGGCCTTTGAGGT and CCCFFFFFHHHGHHJJJJJGHIJJIIJGJI@HIIJJJAGHJJIHHHFFFDDBDDDDDDDCDDDDDDB?<@DBDDA>C:ACBAC?CCD>BDDCCHIGDDHEH:EE7@DED<C;;AD)
stampy:  Traceback:
File "/Net/fs1/home/gerton/Progs/Mapper/stampy/Stampy/reader.py", line 273, in generator

I have tracked down some of the offending lines, and they look like this:

@ILLUMINA:276:C0D97ACXX:5:1101:2429:90560 2:N:0:ACAGTG
AGGCAAACGAGCGTTCGGGTCACCTGATGGTGATCACCGCCGCTTACGACCCCGTGCAGCACCAGAGGAGCTACAGGTGTGTTGCCGGCCTTTGAGGT
+
CCCFFFFFHHHGHHJJJJJGHIJJIIJGJI@HIIJJJAGHJJIHHHFFFDDBDDDDDDDCDDDDDDB?<@DBDDA>C:ACBAC?CCD>BDDCCHIGDDHEH:EE7@DED<C;;AD

The error message makes sense now: the quality line is longer than the sequence line, and the trailing HEH:EE7@DED<C;;AD is left hanging past the end of the sequence. My question is: what could have generated this error? Could my files have been corrupted while they were being downloaded? Should I simply write a script that clips this extra sequence?
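
Before clipping anything, it may help to know how many records are affected. Below is a minimal sketch of such a check. It assumes a standard four-line FASTQ layout (no wrapped sequence or quality lines), and the function name `find_length_mismatches` is hypothetical, not part of STAMPY or any library. It only reports records whose sequence and quality lengths differ; it does not try to repair them.

```python
import gzip
from itertools import islice


def find_length_mismatches(path):
    """Yield (record_number, read_name, seq_len, qual_len) for every FASTQ
    record whose sequence and quality lines have different lengths.

    Assumes four lines per record; a truncated final record is reported
    with seq_len and qual_len set to None.
    """
    # Open gzip-compressed files transparently, based on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        record_number = 0
        while True:
            record = list(islice(handle, 4))  # read the next four lines
            if not record:
                break  # end of file
            record_number += 1
            if len(record) < 4:
                # The file ends in the middle of a record.
                yield (record_number, record[0].rstrip("\r\n"), None, None)
                break
            name, seq, _plus, qual = (line.rstrip("\r\n") for line in record)
            if len(seq) != len(qual):
                yield (record_number, name, len(seq), len(qual))
```

Calling `list(find_length_mismatches("/path/to/reads/my_file.R2.fastq.gz"))` would list the affected reads. If only a handful of records show up, that might point to a localized problem; if the file also ends in a truncated record, an incomplete download becomes more plausible. Comparing checksums against the original source, if available, would be a more direct test of that.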

stampy mapping bam • 1.3k views