Cutadapt: error: Line 1 in FASTQ file is expected to start with...?
5.2 years ago
cs1878 • 0

Hello all,

Firstly, thank you for taking the time to read my question.

TL;DR: cutadapt error: expects @ in first line, sees different string in file.

I recently received my Illumina MiSeq data and I am trying to begin my analysis. It is ITS sequencing data, and I am trying to follow the DADA2 ITS-specific pipeline (https://benjjneb.github.io/dada2/ITS_workflow.html).

My data was provided as demultiplexed fastq.gz files. I have unzipped them using WinZip and, separately, with the gunzip command from the R.utils package in R (in case this is relevant later). I tried both in case my issue was a legacy of an unzipping error.

I have some reverse-complement primers in my data that I want to remove with cutadapt, as per the tutorial linked above. I installed cutadapt and it runs successfully from RStudio on Windows using system2. I have cutadapt 1.8 installed.

However, when I run cutadapt I get the following error message:

" cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0b' "

The head of my fastq files looks like this:

@M04428:324:000000000-C2DCL:1:2107:9655:1814 1:N:0:TAAGGCGA+CTCTCTAT

I have found some similar posts on this topic, including: Error Cutadapt: FASTQ file is expected to start with '@'. The solution there was to reconvert the .sra files to fastq, but I am only provided with the .gz files.

My understanding is that this might be a file-format issue, with Python expecting Unix formatting but reading some DOS formatting at the start of the file. But I know very little about this, and it is just what I have picked up while googling.

Feeling that this may have been a Windows issue, I tried running cutadapt on Ubuntu 16.04 on the same files using the command:

cutadapt -g GCATCGATGAAGAACGCAGC -a GCTGCGTTCTTCATCGATGC -G TCCTCCGCTTATTGATATGC -A GCATATCAATAAGCGGAGGA -n 2 -o cut_1_1.fastq -p cut1_2.fastq 1_1.fastq 1_2.fastq

However, I receive the exact same error message.

I am at a loss as to what to try next. Some of my files have very few reverse-complement primers in them (<0.001% of seqs), but in some individual files the proportion is much higher. I am interested in the rarer sequences in my files, so I do not want to just exclude the information. Further, I will likely receive more files like this in the future, so solving this issue is of great interest to me.

Any information or help you can provide would be beyond helpful.

Best, Chris

5.2 years ago
ATpoint 82k

What is the output of head 1_1.fastq? There is also no need to decompress fastq files. Standard downstream tools can read gzipped files. I would run cutadapt again with the original compressed files and see if that works. Do yourself a favor and avoid Windows at all costs when dealing with sequencing data. Most downstream tools are Unix-based. Using Linux tools via RStudio in a Windows environment (if I understood you correctly) adds unnecessary trouble for simple standard tasks. Either get a Linux partition or run via a virtual environment. In any case, better to avoid R and use the shell. If you are unfamiliar with it, I highly recommend investing time to learn Unix scripting and terminal usage, as you'll need this pretty much every day.
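
For context, the bytes quoted in the error message, \x1f\x8b, are the gzip magic number (the following \x08 marks deflate compression), which suggests the file named .fastq was still gzip-compressed. A minimal Python sketch of that check follows; the function name is_gzipped and the demo file names are made up for illustration and are not part of cutadapt.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


# Demo on throwaway files (hypothetical names, not the poster's data).
record = "@read1\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.fastq")
    disguised = os.path.join(tmp, "disguised.fastq")  # gzip data, misleading name
    with open(plain, "w") as out:
        out.write(record)
    with gzip.open(disguised, "wt") as out:
        out.write(record)
    print("plain.fastq gzipped?", is_gzipped(plain))
    print("disguised.fastq gzipped?", is_gzipped(disguised))
```

If the check returns True for a file named .fastq, renaming it to .fastq.gz (or passing the original .gz file) is likely simpler than decompressing it again.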


ATpoint, many thanks for the reply. You are correct that I am running cutadapt via RStudio in a Windows environment. I do have a Linux partition, which I have been working on getting used to. I switched back to that, reran the pipeline on the .gz files, and cutadapt is now running successfully. I was worried that gzipped files would cause issues downstream; it appears the complete opposite was true.
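
A rerun on the compressed inputs could look roughly like the Python sketch below, which only assembles the command from the question and runs it if cutadapt and the files are actually present. The helper name build_cutadapt_cmd and the input names 1_1.fastq.gz and 1_2.fastq.gz are assumptions; the primers, flags, and output names are copied from the command in the question.

```python
import os
import shutil
import subprocess


def build_cutadapt_cmd(fwd_in, rev_in, fwd_out, rev_out):
    """Assemble the paired-end cutadapt call from the question as an argv list."""
    return [
        "cutadapt",
        "-g", "GCATCGATGAAGAACGCAGC",  # 5' adapter searched in read 1
        "-a", "GCTGCGTTCTTCATCGATGC",  # 3' adapter searched in read 1
        "-G", "TCCTCCGCTTATTGATATGC",  # 5' adapter searched in read 2
        "-A", "GCATATCAATAAGCGGAGGA",  # 3' adapter searched in read 2
        "-n", "2",                     # remove up to two adapters per read
        "-o", fwd_out,
        "-p", rev_out,
        fwd_in,
        rev_in,
    ]


# Hypothetical input names: the original compressed files, passed as-is.
cmd = build_cutadapt_cmd("1_1.fastq.gz", "1_2.fastq.gz",
                         "cut_1_1.fastq", "cut1_2.fastq")
print(" ".join(cmd))

# Run only when cutadapt is on PATH and both inputs exist.
if shutil.which("cutadapt") and all(os.path.exists(p) for p in cmd[-2:]):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems, and a stray space inside a primer sequence would show up as an extra list element.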

FWIW, the output of head 1_1.fastq was a mess of unintelligible symbols, that I would not be able to post without a screenshot.

Thanks for your help and have a great weekend.
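
For anyone landing here later: the unintelligible symbols from head are what raw gzip data looks like when printed as text. A small Python sketch for previewing the first FASTQ record without decompressing to disk is below; the helper names open_fastq and first_record and the demo record are made up for illustration.

```python
import gzip
import os
import tempfile
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, decompressing on the fly if it is gzipped."""
    with open(path, "rb") as probe:
        gzipped = probe.read(2) == b"\x1f\x8b"  # gzip magic number
    return gzip.open(path, "rt") if gzipped else open(path, "rt")


def first_record(path):
    """Return the four lines of the first FASTQ record, without newlines."""
    with open_fastq(path) as handle:
        return [line.rstrip("\n") for line in islice(handle, 4)]


# Demo with a made-up record written to a temporary gzipped file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo, "wt") as out:
        out.write("@read1 demo\nACGTACGT\n+\nIIIIIIII\n")
    for line in first_record(demo):
        print(line)
```

A readable first line starting with '@' means the file content is fine and only the compression layer was in the way.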
