fastq to fastq.gz conversion for using HISAT2
6.6 years ago

Hello all!

I'm trying to use HISAT2 to process some FASTQ data, but the protocol requires the input to be in fastq.gz format, and all of my files are in .fastqc format.

How do I go about converting these from .fastqc to fastq.gz?

Any thoughts appreciated - thank you!


Please post the first 4-10 lines of your ".fastqc" file. I assume it's really fastq.
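You can also check this yourself from the command line. A FASTQ record is four lines: a header starting with `@`, the sequence, a separator line starting with `+`, and the quality string. The sketch below prints the first two records and checks those two markers on the first record only, so it is a quick sanity check, not a full validator. The file name `check_demo.fastqc` is hypothetical, and the `printf` line only writes a demo record so the sketch runs on its own; drop it and point `f` at your real file.

```shell
#!/bin/sh
# Hypothetical file name; point this at your own .fastqc file instead.
f=check_demo.fastqc

# Demo only: write one FASTQ-style record so the sketch is self-contained.
printf '@read1 length=4\nACGT\n+read1 length=4\nIIII\n' > "$f"

# Print the first 8 lines, i.e. the first two records if the file is FASTQ.
head -n 8 "$f"

# A FASTQ record has a header starting with '@' on line 1
# and a separator starting with '+' on line 3.
line1=$(sed -n '1p' "$f")
line3=$(sed -n '3p' "$f")
case "$line1" in
  @*) case "$line3" in
        +*) verdict="looks like FASTQ" ;;
        *)  verdict="line 3 does not start with +" ;;
      esac ;;
  *)  verdict="line 1 does not start with @" ;;
esac
echo "$f: $verdict"
```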


Oh, yes, I think you're right. Here are the first lines:

@SRR3532922.1 HWI-D00269:94:C5MB2ANXX:6:1101:1165:1963 length=125
CAGAACCACTTGGGTTGTTATGGAATGAAAAGTTTGTGCAACTTTCTGAAGCAGTGGAAATCTGCAACATGCAATGTAGGCCAAAAAGCGCTCAGGACTGGGTAGCTTTTCTGAATGTGAGATC
+SRR3532922.1 HWI-D00269:94:C5MB2ANXX:6:1101:1165:1963 length=125
<<ABGG0F=?1@1C///EGGFEEGCGEGGGC>FFG>F11FEGG@GFGFGGFGGG=1:FGGGGGGFGFGGGGGGGGGGGGGGEGE>EGG>GG<FEB<CFGEDF>8F<90E8C@DCFGDGD@FGGG
@SRR3532922.2 HWI-D00269:94:C5MB2ANXX:6:1101:1309:1970 length=125
TTTCCAGCACCCCCACAGTGTCTCAAACTGCCTCTAACTCCAGTTGCCAGGAATCAGAGGCCCTCCTCTATCTCCTAGAGCACGTGGTACTTACTGTGGTGCACGGATGTAGCCACATGCAAGA
+SRR3532922.2 HWI-D00269:94:C5MB2ANXX:6:1101:1309:1970 length=125
:<A<01?1?10/=//<<FB11>1:<11C1:1E1B:11::F>1>111:>FG0=111=:E1:/0=B0=1=111:1>@1111=E:F/0:/00=000?=<0000<BFG0/<F.0=00<0<00<<00;D
@SRR3532922.3 HWI-D00269:94:C5MB2ANXX:6:1101:1428:1977 length=125
TTTTTGGTGTCCCGGGCAATGGATAATACTGAGCCTCTTTGGCAAAAAGAGGGCTTTCAGCAAGGCTAGGATCTCGCTTTTGTTCTTGATTTCCTTGCCTTCTGAGGTCAGCAACCCCCGTCT
+SRR3532922.3 HWI-D00269:94:C5MB2ANXX:6:1101:1428:1977 length=125
<3<ACC/CCGGGGGGFDE=EFG1EFGGGGEGGDGGEGGGGEC1@DF@G<FGGGGGGGGFGGGGGGGGEGG1CCFGGGGGGGGGGGGGGGCFF>GGGCDF>G0FF>0:FC;DGGDGGGGGBGBD
@SRR3532922.4 HWI-D00269:94:C5MB2ANXX:6:1101:1661:1987 length=125
CTTTCTACTAAGTATAGAGAATACTACTAGGTTAGATTAAAACTCTTCACTATGGCAAAAGAAAACAACATTAAAACCACCAAAACGAACT

Just provide the file as it is; you don't need to do anything to it.


I don't think HISAT2 requires files to be compressed; it can handle regular FASTQ files just fine.


For your information, a .gz file is compressed with gzip: the data is re-encoded in a compact binary form, so it takes up less space on your hard drive. That is definitely beneficial for big datasets such as sequencing data. Most tools don't require compressed input, but compression can save you some money on storage. See the answer below for how to compress a file using the gzip program.
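To see both points in practice, the sketch below builds a small, repetitive FASTQ file (the name `demo.fastq` is hypothetical), compresses it, compares the sizes in bytes, and confirms that decompressing gives back exactly the original, since gzip is lossless. Real sequencing data is less repetitive than this toy file, so the size reduction you get will be smaller.

```shell
#!/bin/sh
# Build a small, repetitive FASTQ file (hypothetical name demo.fastq).
: > demo.fastq
i=1
while [ "$i" -le 1000 ]; do
  printf '@read%d\nACGTACGTACGTACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' "$i" >> demo.fastq
  i=$((i + 1))
done

# Compress to a separate file; -c writes to stdout and keeps the original.
gzip -c demo.fastq > demo.fastq.gz

# Compare sizes in bytes (tr strips the padding some wc versions add).
orig_size=$(wc -c < demo.fastq | tr -d ' ')
gz_size=$(wc -c < demo.fastq.gz | tr -d ' ')
echo "uncompressed: $orig_size bytes, gzip: $gz_size bytes"

# Decompression is lossless: the round trip reproduces the file exactly.
gzip -dc demo.fastq.gz | cmp -s - demo.fastq && echo "round trip identical"
```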

6.6 years ago
# -c keeps input.fastqc and writes the compressed copy to input.fastq.gz
gzip -c input.fastqc > input.fastq.gz
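Note that running `gzip` directly on a file named `x.fastqc` replaces it with `x.fastqc.gz`, not `x.fastq.gz`. If you have many files, a loop like the sketch below writes a `.fastq.gz` copy of each `.fastqc` file and then checks each archive with `gzip -t`. The sample file names are hypothetical and the two `printf` lines only create demo input; remove them when running on your own data. Delete the originals yourself only after the checks pass.

```shell
#!/bin/sh
# Demo input: two tiny FASTQ files saved with a .fastqc extension
# (hypothetical names sample1.fastqc and sample2.fastqc).
printf '@r1\nACGT\n+\nIIII\n' > sample1.fastqc
printf '@r2\nTTGA\n+\nIIII\n' > sample2.fastqc

for f in *.fastqc; do
  [ -e "$f" ] || continue            # no match: the glob stays literal, skip it
  out="${f%.fastqc}.fastq.gz"        # sample1.fastqc -> sample1.fastq.gz
  gzip -c "$f" > "$out"              # compress, keeping the original file
  gzip -t "$out" && echo "ok: $out"  # test the archive's integrity
done
```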
