Help with Illumina Headers
6.5 years ago
landrjos ▴ 20

Please give me a lecture on Illumina headers.

I have been downloading NGS ChIP-Seq data to run comparative studies against our own ChIP-Seq data. I have noticed that the headers in the data I download from NCBI are very different from ours. For example, here is the header for a run from the Sequence Read Archive (SRA), SRR1747943. Notice that the header has the SRR number in it.

@SRR1747943.1.1 1 length=36
CTATTAAGTGACCTGAGTGGCAGGAAGAAGTAGCGC
+SRR1747943.1.1 1 length=36
HHHHHHHHHHHHHHHHHHHHGEGG############

Here is an example of the header from one of our runs. It has the standard header information, including the sequencer identifier and so on.

@HWI-ST425:160:D1JFWACXX:3:1101:1247:1946 1:N:0:GCCAAN
NAAACTCCTTCATGAAGCTGATACAAGATGTCATGAATTGTNTTGCATCTGNNNATCTTCTGAGNNNNNNNNNNNAAAAGCATCACATTNNNNNNNNCCTT
+
.#4=DDFFFHHHHHJJJIIJJJJJJJJJIIIHHIJJJIIJJJ#1?FGHIHJI###00-BFGIGGG###########,,5;A>CD;@CDDC############

My question is: was the header changed by the SRA when the data was deposited, or was it changed by the person who deposited the sequence? Is there some way I can find the original header for the SRR1747943 run?
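For reference, the colon-separated fields in the second header above can be split out with a few lines of Python. This is only a minimal sketch: it assumes the header follows the Casava 1.8+ layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index), and the names `IlluminaHeader` and `parse_illumina_header` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IlluminaHeader:
    """Fields of a Casava 1.8+ style header (assumed layout, see lead-in)."""
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    is_filtered: bool      # "Y" means the read failed the quality filter
    control_bits: int
    index_sequence: str


def parse_illumina_header(line: str) -> IlluminaHeader:
    """Split one '@...' FASTQ header line into its named fields."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # The read name and the comment part are separated by a single space.
    name, _, comment = line[1:].strip().partition(" ")
    name_fields = name.split(":")
    comment_fields = comment.split(":")
    if len(name_fields) != 7 or len(comment_fields) != 4:
        raise ValueError("not a Casava 1.8+ style header: " + line)
    instrument, run, flowcell, lane, tile, x, y = name_fields
    read, filtered, control, index_sequence = comment_fields
    return IlluminaHeader(
        instrument, int(run), flowcell, int(lane), int(tile),
        int(x), int(y), int(read), filtered == "Y", int(control),
        index_sequence,
    )


if __name__ == "__main__":
    print(parse_illumina_header(
        "@HWI-ST425:160:D1JFWACXX:3:1101:1247:1946 1:N:0:GCCAAN"
    ))
```

The SRA-style header (`@SRR1747943.1.1 1 length=36`) does not match this layout, so the sketch raises a `ValueError` for it rather than guessing at the fields.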

6.5 years ago
Ram 43k

Not here to give you a "lecture", but it looks like the repository re-headers submissions, anonymizing everything in the process.

6.5 years ago
Charles Plessy ★ 2.9k

See the option -F | --origfmt of fastq-dump to get the original sequence names.
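A rough sketch of how that option might be used from a script, assuming `fastq-dump` from the SRA Toolkit is installed and on the PATH. The helper names `build_fastq_dump_command` and `dump_with_original_names` are hypothetical, and whether the original Illumina names come back depends on whether they were kept when the run was submitted.

```python
import shutil
import subprocess
from typing import List


def build_fastq_dump_command(accession: str, outdir: str = ".") -> List[str]:
    """Build a fastq-dump call that asks for the original sequence names."""
    # --origfmt is the long form of -F; --outdir sets where the FASTQ is written.
    return ["fastq-dump", "--origfmt", "--outdir", outdir, accession]


def dump_with_original_names(accession: str, outdir: str = ".") -> None:
    """Run fastq-dump, failing early if the tool is not installed."""
    if shutil.which("fastq-dump") is None:
        raise RuntimeError("fastq-dump not found; is the SRA Toolkit installed?")
    subprocess.run(build_fastq_dump_command(accession, outdir), check=True)


if __name__ == "__main__":
    # Prints the command only; call dump_with_original_names() to run it.
    print(" ".join(build_fastq_dump_command("SRR1747943")))
```

Comparing the first header of the resulting FASTQ file with the default download should show whether the submitter's read names were retained.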
