Ram · 2.9 years ago
Hi,
I'm running STAR on a couple of FASTQ files (~115M reads). Here is my exact command:
STAR --genomeDir /path/to/GRCh38.p12 \
--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--alignSJDBoverhangMin 10 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.04 \
--alignIntronMin 20 \
--alignIntronMax 100000 \
--alignMatesGapMax 100000 \
--genomeLoad NoSharedMemory \
--outSAMmapqUnique 60 \
--outSAMmultNmax 1 \
--outSAMstrandField intronMotif \
--outSAMtype BAM SortedByCoordinate \
--outReadsUnmapped None \
--outFileNamePrefix ${sample_name}. \
--outSAMattrRGline ID:GRPundef \
--chimSegmentMin 12 \
--chimJunctionOverhangMin 12 \
--chimSegmentReadGapMax 3 \
--chimMultimapNmax 10 \
--chimMultimapScoreRange 10 \
--chimNonchimScoreDropMin 10 \
--chimOutJunctionFormat 1 \
--quantMode GeneCounts \
--twopassMode Basic \
--peOverlapNbasesMin 12 \
--peOverlapMMp 0.1 \
--outWigType wiggle \
--outWigStrand Stranded \
--outWigNorm RPM \
--limitBAMsortRAM 160000000000 \
--outSAMunmapped Within \
--readFilesCommand zcat \
--readFilesIn ${fq1} ${fq2} \
--runThreadN 12
I'm giving this 128 GB of RAM on my cluster, and larger files have run successfully. This one, however, always fails at the "started mapping" stage with a std::bad_alloc error. I don't understand how 128 GB of RAM could be insufficient. Even without the 12-thread parameter, I run into the same error.
Here is the STDOUT:
Jul 13 11:31:28 ..... started STAR run
Jul 13 11:31:28 ..... loading genome
Jul 13 11:33:30 ..... started 1st pass mapping
Jul 13 12:10:31 ..... finished 1st pass mapping
Jul 13 12:10:33 ..... inserting junctions into the genome indices
Jul 13 12:12:50 ..... started mapping
and the STDERR:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
How can I fix this and get STAR to run?
EDIT: I tried running the command on just the first 1000 lines of the FASTQ files and it ran to completion. So it's a matter of resources - the command itself is correct.
If you have to, you can split your FASTQs and align them in smaller chunks. You can merge the BAMs afterwards (or, if all you want is counts, you might not need to).
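As a rough sketch of the splitting step (filenames are hypothetical, and the demo builds a tiny synthetic FASTQ instead of using real data): a FASTQ record is 4 lines, so the chunk size must be a multiple of 4, and R1 and R2 must be split with the same chunk size so mate pairs stay in sync across chunks.

```shell
# Build a tiny synthetic FASTQ (12 reads) to demonstrate; in practice
# you would `zcat` your real gzipped FASTQs into split instead.
for i in $(seq 1 12); do
  printf "@read%d\nACGT\n+\nIIII\n" "$i"
done > sample_R1.fastq

# Split into chunks of 4 reads (16 lines) each; numeric suffixes
# (-d) give chunk_R1.00, chunk_R1.01, ...
split -l 16 -d sample_R1.fastq chunk_R1.

ls chunk_R1.* | wc -l   # 3 chunks
wc -l < chunk_R1.00     # 16 lines = 4 reads
```

After aligning each chunk pair, the coordinate-sorted BAMs can be combined with `samtools merge merged.bam chunk*.Aligned.sortedByCoord.out.bam` followed by `samtools index merged.bam`.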
I wonder why that would solve it, given that larger read sets were mapped fine. Another STAR run (also with --twopassMode Basic) went fine, as did an RSEM run that internally used STAR, both with the same input FASTQs. There has to be some sort of odd hardware-related problem, IMO. If I ran 33% subsets of the files, how would I merge the wiggle/SJ/junction files generated from the smaller runs? Is that as straightforward as merging BAMs or BEDs?
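For SJ.out.tab specifically, plain concatenation would leave duplicate junction rows. One sketch, using toy data and assuming the standard 9-column SJ.out.tab layout (chrom, intron start, intron end, strand, motif, annotated, unique reads, multi-mapping reads, max overhang), is to sum the read counts per junction and keep the largest overhang:

```shell
# Toy SJ.out.tab fragments from two hypothetical partial runs:
printf "chr1\t100\t200\t1\t1\t1\t30\t5\t38\n" > part1.SJ.out.tab
printf "chr1\t100\t200\t1\t1\t1\t12\t2\t45\n" > part2.SJ.out.tab

# Key on the first six columns (the junction itself); sum the
# unique (col 7) and multi-mapping (col 8) read counts, and keep
# the largest observed overhang (col 9).
awk 'BEGIN{FS=OFS="\t"}
     {k=$1 FS $2 FS $3 FS $4 FS $5 FS $6
      u[k]+=$7; m[k]+=$8; if($9>o[k]) o[k]=$9}
     END{for(k in u) print k, u[k], m[k], o[k]}' \
    part1.SJ.out.tab part2.SJ.out.tab |
  sort -k1,1 -k2,2n > merged.SJ.out.tab

cat merged.SJ.out.tab   # chr1  100  200  1  1  1  42  7  45
```

The wiggle tracks are a different story: with --outWigNorm RPM each chunk is normalized to its own read depth, so the per-chunk signals can't simply be added together.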
As a follow-up, swbarnes2 - the Chimeric.out.junction, *.wig, and SJ.out.tab files look like BED files. However, how would I combine the information from the ReadsPerGene.out.tab files? Do I just sum counts across the partial files?
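Summing per gene should work for the count columns. A minimal sketch with toy data, assuming the usual four-column layout (gene ID, unstranded, forward-strand, and reverse-strand counts; the N_unmapped/N_multimapping/N_noFeature/N_ambiguous summary lines at the top of real files can be summed the same way):

```shell
# Two toy partial ReadsPerGene.out.tab files (hypothetical genes):
printf "GENE1\t10\t6\t4\nGENE2\t5\t5\t0\n" > part1.ReadsPerGene.out.tab
printf "GENE1\t7\t3\t4\nGENE2\t2\t1\t1\n" > part2.ReadsPerGene.out.tab

# Sum each count column per gene, preserving first-seen gene order:
awk 'BEGIN{FS=OFS="\t"}
     {u[$1]+=$2; f[$1]+=$3; r[$1]+=$4; if(!seen[$1]++) order[++n]=$1}
     END{for(i=1;i<=n;i++){g=order[i]; print g, u[g], f[g], r[g]}}' \
    part1.ReadsPerGene.out.tab part2.ReadsPerGene.out.tab \
    > merged.ReadsPerGene.out.tab

cat merged.ReadsPerGene.out.tab
# GENE1  17  9  8
# GENE2   7  6  1
```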