Hello,
I am trying to quantify single-end RNA-seq reads with RSEM against my locally assembled transcriptome. I began by preparing the reference transcriptome as follows:
rsem-prepare-reference oases_fpkm_1.fa transcriptome_index --bowtie2
Next, I moved on to quantifying my reads. I created a CSV file listing the upstream (R1) read files, with the following content:
ECC1-DC1_S1_merged_R1.fastq,ECC4-DC6_S2_merged_R1.fastq,ECC4-DR2_S3_merged_R1.fastq,ECC5-DC6_S4_merged_R1.fastq,ECC5-DR2-2_S5_merged_R1.fastq,ECR1-DC1_S6_merged_R1.fastq,ECR1-DR1_S7_merged_R1.fastq,ECR2-DC1_S8_merged_R1.fastq,ECR2-DR1-3_S9_merged_R1.fastq,ECR5-DC2_S10_merged_R1.fastq,ECR5-DR2-3_S11_merged_R1.fastq,ECC1-DR1_S12_merged_R1.fastq
I then ran the quantification command:
rsem-calculate-expression upstream_read_file transcriptome_index gorpol_quant_rsem -p 4 --bowtie2
However, I encountered the following error:
Error: reads file does not appear to be a FASTQ file
The FASTQ format looks correct to me. Here is a sample from one of the files:
head ECC1-DC1_S1_merged_R1.fastq
@NB551202:176:HWFLCBGXK:1:11101:11390:1041 1:N:0:CGCAACTA
GTCAANGATGAAAAAAATATTATCANCAAGGCAATGNCCTNNANNNCNTNNTNNCNNGNNNNCTNCNTNNNGAANC
+
AAAAA#AEEEEEEEEEEEEEEEEEE#<EEEEEEEEE#EAE##E###A#E##E##E##A####EE#A#E###AA/#A
@NB551202:176:HWFLCBGXK:1:11101:1891:1042 1:N:0:CGCAACTA
CGCATNTATTAGCTCTAGAATTACTACGGTTATCCANGTANNAANNGAGNNCNNTNNANNNNACNANANNNGATNT
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE#AEE##EE##AEE##E##E##E####EE#E#E###EEE#E
@NB551202:176:HWFLCBGXK:1:11101:24292:1043 1:N:0:CGCAACTA
GCAAANCCTAGTTTTAAACGCTTTCTTTCGTTTCTTCCTCNNTTNNGGCNNCNNCNNCNNNNCANCNTNNNTGCNA
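To narrow down which file the error is actually complaining about, a quick sanity check can be run on whatever path is handed to rsem-calculate-expression. This is only a minimal sketch: `looks_like_fastq` is a hypothetical helper (not part of RSEM or Bowtie 2), it only inspects the first record of an uncompressed file, and it assumes the file name `upstream_read_file` from the command above.

```shell
# Hypothetical helper: succeeds if the first record of an uncompressed file
# has the basic FASTQ shape (line 1 starts with '@', line 3 starts with '+').
looks_like_fastq() {
    lf_first=$(sed -n '1p' "$1" | cut -c1)
    lf_third=$(sed -n '3p' "$1" | cut -c1)
    [ "$lf_first" = "@" ] && [ "$lf_third" = "+" ]
}

# Check the file that was passed as the first argument in the command above.
# The guard keeps the snippet harmless if the file is not in this directory.
if [ -f upstream_read_file ]; then
    if looks_like_fastq upstream_read_file; then
        echo "upstream_read_file: first record looks like FASTQ"
    else
        echo "upstream_read_file: first record does NOT look like FASTQ"
    fi
fi
```

If the check fails for `upstream_read_file` but passes for `ECC1-DC1_S1_merged_R1.fastq`, that would suggest the error refers to the list file itself rather than to the reads.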
Note: My files were originally in .fastq.gz format, and I get the same error both before and after decompressing them. I have searched extensively but have not yet found a solution to this issue.
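One variant that may be worth trying: the RSEM usage text describes the first positional argument as a comma-separated list of read files, so the list itself can be passed on the command line instead of the name of a file that contains it. The sketch below assumes that `upstream_read_file` holds the single comma-separated line shown above; `build_read_list` is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: print the contents of the list file as one
# whitespace-free string, so it can be passed as a single argument.
build_read_list() {
    tr -d ' \t\r\n' < "$1"
}

# Only run when both the list file and RSEM are available.
if [ -f upstream_read_file ] && command -v rsem-calculate-expression >/dev/null 2>&1; then
    reads=$(build_read_list upstream_read_file)
    # Same options as the original command, with the list passed directly.
    rsem-calculate-expression "$reads" transcriptome_index gorpol_quant_rsem -p 4 --bowtie2
fi
```

As far as I understand, RSEM pools all files in such a list into one sample, so per-sample estimates would probably require one run per FASTQ file with its own output name.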