What Do The Period (.) Symbols Mean In The Sequence Record Of A FASTQ File?
11.2 years ago
Wayne ★ 1.0k

Hello all, I have a funny issue with bam files where the 4th character in a lot of my reads shows up as a ".". I've never seen this before, but it's wreaking havoc with my scripts. Does anyone know what causes this or what it means? An example is below.

example:

@DHT4KXP1:3:1101:2235:2028#0/1
GAA.TACTGCCAAGTCATCCGTGTCATTGCCCACACCCAGATGCGCCTGCTTCCTCTGCGCCAGAAGAAGGCCCACCTGATGGAGATCCAGGTGAACGGAG
+DHT4KXP1:3:1101:2235:2028#0/1
_a_BS\ccgggegihhgfiiighfghiihhhhhiiiihiihifhiiiiihihhihihhh[dgeeeebddd_aacc_acccccbcccccccccbbccccacc
bam mapping sequencing • 4.4k views

You might want to get a geiger counter ;). Just to be sure, it is exclusive to the fourth position of the read? What is the provenance of the data?


Some versions of the SOLiD sequencer used to put dots into the colorspace sequence whenever the quality was too low and it was unable to call a color. This used to break all kinds of tools.


I was thinking this as well, except that the quality at that position works out to Q = 33, assuming Sanger scaling. Odd.
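To make the arithmetic in this comment concrete, here is a minimal sketch that decodes the quality character at the dot's position ('B', the 4th character of the quality line in the example) under two common offsets. The helper name `phred` is hypothetical, and the choice of offsets 33 (Sanger) and 64 (older Illumina) is an assumption about which encodings are in play.

```python
def phred(char, offset):
    """Convert one FASTQ quality character to a numeric score for a given ASCII offset."""
    return ord(char) - offset

qual_char = "B"  # quality character under the '.' in the example read

print(phred(qual_char, 33))  # Sanger / Phred+33 interpretation
print(phred(qual_char, 64))  # older Illumina / Phred+64 interpretation
```

Under the +33 offset the score looks respectable, which is why the comment calls it odd. Under the +64 offset the same character maps to a score of 2, which fits an uncalled base much better.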


It is not just SOLiD data, I used to see this in Illumina qseq files a few years ago (when read lengths were at 75-76 bp). This was very frustrating because most tools would just die assuming it was improperly formatted data, especially with these dots at the beginning of the sequence. My assumption was that it was just bases that could not be called so I trimmed them.
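The trimming described here could look something like the sketch below. It is not the commenter's actual script: the function name `trim_leading_uncalled` is hypothetical, and it assumes uncalled bases are written as '.' or 'N' and that the sequence and quality strings have equal length.

```python
def trim_leading_uncalled(seq, qual, uncalled=".N"):
    """Drop uncalled bases from the start of a read, along with their quality characters."""
    # Number of leading characters that are in the 'uncalled' set.
    n = len(seq) - len(seq.lstrip(uncalled))
    return seq[n:], qual[n:]

seq, qual = trim_leading_uncalled("..ACGT", "BBhhhh")
print(seq, qual)
```

Note that this only handles dots at the very start of a read. A dot in the middle, as in the question's example, is left untouched and would need to be replaced or masked instead.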

11.2 years ago

The dot, together with B as the quality score, indicates that it's an unknown base. (Your qualities range from 'B' to 'h', which is the older encoding scheme, where B is the worst quality.) Use sed to change all the '.' to 'N'.
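As an alternative to sed, the same replacement can be sketched in Python. The version below assumes a plain, unwrapped FASTQ with exactly four lines per record, and the function name `dots_to_n` is hypothetical. It rewrites only the sequence line of each record, because a blanket substitution would also alter any '.' in read names, and in Phred+33 files '.' is a legal quality character.

```python
import io

def dots_to_n(fastq_in, fastq_out):
    """Copy a 4-line-per-record FASTQ stream, replacing '.' with 'N' in sequence lines only."""
    for i, line in enumerate(fastq_in):
        if i % 4 == 1:  # the second line of each record is the sequence
            line = line.replace(".", "N")
        fastq_out.write(line)

# Small in-memory demonstration using a shortened version of the example read.
record = (
    "@read1\n"
    "GAA.TACTG\n"
    "+read1\n"
    "_a_BS\\ccg\n"
)
out = io.StringIO()
dots_to_n(io.StringIO(record), out)
print(out.getvalue(), end="")
```

For real files, the two `io.StringIO` objects would be replaced by file handles opened for reading and writing.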

11.2 years ago
JC 13k

My guess (as Istvan suggested before) is a failed base call where the pipeline wrote a '.' instead of an N, as in the SOLiD pipeline. Also, a quality of B is the lowest value in the Illumina 1.5+ calling pipeline. Do you know the technology/source?
