Ram · 2.9 years ago
Hi,
I'm running STAR on a couple of FASTQ files (~115M reads). Here is my exact command:
STAR --genomeDir /path/to/GRCh38.p12 \
--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--alignSJDBoverhangMin 10 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.04 \
--alignIntronMin 20 \
--alignIntronMax 100000 \
--alignMatesGapMax 100000 \
--genomeLoad NoSharedMemory \
--outSAMmapqUnique 60 \
--outSAMmultNmax 1 \
--outSAMstrandField intronMotif \
--outSAMtype BAM SortedByCoordinate \
--outReadsUnmapped None \
--outFileNamePrefix ${sample_name}. \
--outSAMattrRGline ID:GRPundef \
--chimSegmentMin 12 \
--chimJunctionOverhangMin 12 \
--chimSegmentReadGapMax 3 \
--chimMultimapNmax 10 \
--chimMultimapScoreRange 10 \
--chimNonchimScoreDropMin 10 \
--chimOutJunctionFormat 1 \
--quantMode GeneCounts \
--twopassMode Basic \
--peOverlapNbasesMin 12 \
--peOverlapMMp 0.1 \
--outWigType wiggle \
--outWigStrand Stranded \
--outWigNorm RPM \
--limitBAMsortRAM 160000000000 \
--outSAMunmapped Within \
--readFilesCommand zcat \
--readFilesIn ${fq1} ${fq2} \
--runThreadN 12
I'm giving this 128 GB of RAM on my cluster, and larger files have run successfully. This one, however, always fails at the "started mapping" stage with a std::bad_alloc error. I don't understand how 128 GB of RAM could be insufficient. Even without the 12-thread parameter, I run into the same error.
Here is the STDOUT:
Jul 13 11:31:28 ..... started STAR run
Jul 13 11:31:28 ..... loading genome
Jul 13 11:33:30 ..... started 1st pass mapping
Jul 13 12:10:31 ..... finished 1st pass mapping
Jul 13 12:10:33 ..... inserting junctions into the genome indices
Jul 13 12:12:50 ..... started mapping
and the STDERR:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
How can I fix this and get STAR to run?
EDIT: I tried running the command on just the first 1000 lines of the FASTQ files and it ran to completion. So it's a matter of resources - the command itself is correct.
If you have to, you can split your FASTQs and align them in smaller chunks. You can merge the BAMs afterwards (or, if all you want is counts, you might not need to).
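As a rough sketch of the splitting step (filenames are hypothetical, and the demo builds a tiny synthetic FASTQ instead of using real data): a FASTQ record is 4 lines, so the chunk size must be a multiple of 4, and R1 and R2 must be split with the same chunk size so mate pairs stay in sync across chunks.

```shell
# Build a tiny synthetic FASTQ (12 reads) to demonstrate; in practice
# you would `zcat` your real gzipped FASTQs into split instead.
for i in $(seq 1 12); do
  printf "@read%d\nACGT\n+\nIIII\n" "$i"
done > sample_R1.fastq

# Split into chunks of 4 reads (16 lines) each; numeric suffixes
# (-d) give chunk_R1.00, chunk_R1.01, ...
split -l 16 -d sample_R1.fastq chunk_R1.

ls chunk_R1.* | wc -l   # 3 chunks
wc -l < chunk_R1.00     # 16 lines = 4 reads
```

After aligning each chunk pair, the coordinate-sorted BAMs can be combined with `samtools merge merged.bam chunk*.Aligned.sortedByCoord.out.bam` followed by `samtools index merged.bam`.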
I wonder why that would solve it, given that larger read sets were mapped fine. Another STAR run (also with --twopassMode Basic) went fine, as did an RSEM run that internally used STAR, both with the same input FASTQs. There has to be some sort of odd hardware-related problem, IMO. If I ran 33% subsets of the files, how would I merge the wiggle/SJ/junction files generated from the smaller runs? Is that as straightforward as merging BAMs or BEDs?
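For SJ.out.tab specifically, plain concatenation would leave duplicate junction rows. One sketch, using toy data and assuming the standard 9-column SJ.out.tab layout (chrom, intron start, intron end, strand, motif, annotated, unique reads, multi-mapping reads, max overhang), is to sum the read counts per junction and keep the largest overhang:

```shell
# Toy SJ.out.tab fragments from two hypothetical partial runs:
printf "chr1\t100\t200\t1\t1\t1\t30\t5\t38\n" > part1.SJ.out.tab
printf "chr1\t100\t200\t1\t1\t1\t12\t2\t45\n" > part2.SJ.out.tab

# Key on the first six columns (the junction itself); sum the
# unique (col 7) and multi-mapping (col 8) read counts, and keep
# the largest observed overhang (col 9).
awk 'BEGIN{FS=OFS="\t"}
     {k=$1 FS $2 FS $3 FS $4 FS $5 FS $6
      u[k]+=$7; m[k]+=$8; if($9>o[k]) o[k]=$9}
     END{for(k in u) print k, u[k], m[k], o[k]}' \
    part1.SJ.out.tab part2.SJ.out.tab |
  sort -k1,1 -k2,2n > merged.SJ.out.tab

cat merged.SJ.out.tab   # chr1  100  200  1  1  1  42  7  45
```

The wiggle tracks are a different story: with --outWigNorm RPM each chunk is normalized to its own read depth, so the per-chunk signals can't simply be added together.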
As a follow-up, swbarnes2 - the Chimeric.out.junction, *.wig, and SJ.out.tab files look like BED files. However, how would I combine the information from the ReadsPerGene.out.tab files? Do I just sum counts across the partial files?
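Summing per gene should work for the count columns. A minimal sketch with toy data, assuming the usual four-column layout (gene ID, unstranded, forward-strand, and reverse-strand counts; the N_unmapped/N_multimapping/N_noFeature/N_ambiguous summary lines at the top of real files can be summed the same way):

```shell
# Two toy partial ReadsPerGene.out.tab files (hypothetical genes):
printf "GENE1\t10\t6\t4\nGENE2\t5\t5\t0\n" > part1.ReadsPerGene.out.tab
printf "GENE1\t7\t3\t4\nGENE2\t2\t1\t1\n" > part2.ReadsPerGene.out.tab

# Sum each count column per gene, preserving first-seen gene order:
awk 'BEGIN{FS=OFS="\t"}
     {u[$1]+=$2; f[$1]+=$3; r[$1]+=$4; if(!seen[$1]++) order[++n]=$1}
     END{for(i=1;i<=n;i++){g=order[i]; print g, u[g], f[g], r[g]}}' \
    part1.ReadsPerGene.out.tab part2.ReadsPerGene.out.tab \
    > merged.ReadsPerGene.out.tab

cat merged.ReadsPerGene.out.tab
# GENE1  17  9  8
# GENE2   7  6  1
```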