How to select two reads to be processed through STAR?
9.1 years ago

As a sample command, I tried the following:

STAR \
  --genomeDir /home/simfish/Desktop/RNAseq/ \
  --readFilesIn SRR925687_1.fastq SRR925688_1.fastq \
  --outFilterMultimapNmax 1 \
  --outSAMstrandField intronMotif \
  --sjdbGTFfile /home/simfish/Desktop/RNAseq/Homo_sapiens.GRCh38.76.gtf \
  --outFileNamePrefix /home/simfish/Desktop/ \
  --runThreadN 32

==

But this resulted in a fatal error described below:

EXITING because of FATAL ERROR: Read1 and Read2 are not consistent, reached the end of the one before the other one
SOLUTION: Check you your input files: they may be corrupted

Mar 12 19:45:21 ...... FATAL ERROR, exiting

==

Is there a way to check the reads for consistency before running STAR on them? I'm new to STAR and I'm working through the tutorial at http://www.genefriends.org/RNAseqForDummies/, and I'm stuck on the last step (running STAR). I'd like to select a pair of read files that will produce meaningful, interpretable results.

SRR925687_1.fastq and SRR925688_1.fastq are not the two mates of a paired-end run, so you should align them separately. Because the file names are separated by a space, the aligner assumes they belong to the same pair, and it complains because the two files contain different numbers of reads.
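
As a rough pre-flight check for the mismatch described above, the read counts of two FASTQ files can be compared before running the aligner. This is only a minimal sketch: it assumes uncompressed FASTQ with exactly four lines per record, and the function names `count_reads` and `same_read_count` are made up for illustration. Equal counts are necessary for a valid pair but do not prove that the files actually belong together.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record spans exactly 4 lines.
count_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Succeed only if both files contain the same number of reads.
same_read_count() {
  [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example usage with the file names from the question:
# same_read_count SRR925687_1.fastq SRR925688_1.fastq || echo "read counts differ"
```

If the counts differ, the files cannot be mates of the same paired-end run and should not be passed to `--readFilesIn` separated by a space.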

9.1 years ago
Varun Gupta ★ 1.3k

The two files you are using come from different samples; they are not the paired-end files of a single sample.

To align them together, separate the file names with a comma:

STAR \
  --genomeDir /home/simfish/Desktop/RNAseq/ \
  --readFilesIn SRR925687_1.fastq,SRR925688_1.fastq \
  --outFilterMultimapNmax 1 \
  --outSAMstrandField intronMotif \
  --sjdbGTFfile /home/simfish/Desktop/RNAseq/Homo_sapiens.GRCh38.76.gtf \
  --outFileNamePrefix /home/simfish/Desktop/ \
  --runThreadN 32

or separately

STAR \
  --genomeDir /home/simfish/Desktop/RNAseq/ \
  --readFilesIn SRR925687_1.fastq \
  --outFilterMultimapNmax 1 \
  --outSAMstrandField intronMotif \
  --sjdbGTFfile /home/simfish/Desktop/RNAseq/Homo_sapiens.GRCh38.76.gtf \
  --outFileNamePrefix /home/simfish/Desktop/SRR925687_1 \
  --runThreadN 32

Let me know if this gives an error.
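
For more than one sample, the separate-alignment command can be generated in a loop instead of being typed out per file. The sketch below only prints the commands so they can be reviewed first. The helper name `star_command` and the trailing underscore added to the output prefix are illustrative choices, not part of the original answer; all paths and options are copied from the commands above.

```shell
# Print the STAR command for one single-end FASTQ file.
# Paths and options are copied from the commands in this thread.
star_command() {
  fastq=$1
  # Derive a per-sample output prefix from the file name (hypothetical naming).
  sample=$(basename "$fastq" .fastq)
  printf 'STAR --genomeDir %s --readFilesIn %s --outFilterMultimapNmax 1 --outSAMstrandField intronMotif --sjdbGTFfile %s --outFileNamePrefix %s --runThreadN 32\n' \
    /home/simfish/Desktop/RNAseq/ \
    "$fastq" \
    /home/simfish/Desktop/RNAseq/Homo_sapiens.GRCh38.76.gtf \
    "/home/simfish/Desktop/${sample}_"
}

# Dry run: print one command per sample for review.
for fastq in SRR925687_1.fastq SRR925688_1.fastq; do
  star_command "$fastq"
done
```

Once the printed commands look right, they can be run by piping the loop's output to `sh`. This assumes the file names contain no spaces.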

Okay, thanks so much for the answer! Just wondering: what's the difference between aligning them separately (as in your second command) and aligning both of them at the same time (as in your first command)?

When the files are from different samples, like SRR925687_1.fastq and SRR925688_1.fastq, you should align them separately. You can also align them together, as my first command does, but then you have to separate the file names with a comma instead of a space. A space is only used between the two mate files of a paired-end sample. Aligning them together produces just one alignment file. Since these two files represent different samples, aligning them separately is preferred; otherwise, align them together as in the first command and separate the alignments afterwards using awk or a script. Hope that helps.
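
Separating a merged alignment afterwards with awk could look roughly like the sketch below. It relies on an assumption that is not stated in this thread: that each read name (the first SAM column) starts with its run accession followed by a dot, for example `SRR925687.1`. If the read names do not carry the accession, this approach will not work. The function name `split_sam_by_sample` is made up for illustration. STAR also has an `--outSAMattrRGline` option for tagging alignments with read groups, which may be a more robust basis for splitting; check the manual for its exact syntax.

```shell
# Split a merged SAM file into one SAM file per sample.
# Assumption: read names begin with "<accession>.", e.g. "SRR925687.1".
# Usage: split_sam_by_sample merged.sam SRR925687 SRR925688
split_sam_by_sample() {
  sam=$1
  shift
  for sample in "$@"; do
    awk -v s="$sample" '
      /^@/ { print; next }             # copy header lines into every output
      index($1, s ".") == 1 { print }  # keep reads whose name starts with "<sample>."
    ' "$sam" > "${sample}.sam"
  done
}
```

Each output file is written to the current directory and named after the sample, such as `SRR925687.sam`.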
