Need help using Shrimp2 on paired end color-space SOLiD data.
10.0 years ago
Jordan ★ 1.3k

Hi,

I have paired-end SOLiD reads (75 bp and 35 bp) in .csfasta and .QV.qual format. I would like to align them with Shrimp2, but so far I have had trouble getting it to work.

I used the following command:

gmapper -1 Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta -2 Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta $SCRATCH/human_hg19.fa -N 32 -p opp-in  > Sample.sam 2> Logs/Sample.log

This is my log file and the error is shown at the bottom. I'm not sure what that means.

- Processing genome file [/Refs/human_hg19.fa]
- Processing contig chr1
- Processing contig chr2
- Processing contig chr3
- Processing contig chr4
- Processing contig chr5
- Processing contig chr6
- Processing contig chr7
- Processing contig chr8
- Processing contig chr9
- Processing contig chr10
- Processing contig chr11
- Processing contig chr12
- Processing contig chr13
- Processing contig chr14
- Processing contig chr15
- Processing contig chr16
- Processing contig chr17
- Processing contig chr18
- Processing contig chr19
- Processing contig chr20
- Processing contig chr21
- Processing contig chr22
- Processing contig chrX
- Processing contig chrY
- Processing contig chrM

Loaded Genome
note: detected fastq format in input file [Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta]
- Processing read files [Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta , Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta]

note: quality value format not set explicitly; using PHRED+64
done r/hr r/core-hr
error: realloc failed: Success

Here is what my csfasta files look like:

@1_2_53_F3
T02.2031.2212.12.3.12.2.03.1030.3.10.3.1313.2323.3211.3102.1001..321..023..1
@1_2_193_F3
T12.0303.2132.00.3.10.2.21.1330.0.30.2.2220.0020.1002.0000..332..302..012..3
@1_2_264_F3
T31.1220.2112.30.0.20.1.12.3032.3.01.2.1132.1310.2100.1211.3302..310..202..1
@1_2_468_F3
T31.3221.1202.02.1.31.3.02.2000.0.20.0.2020.0022.2223.2222.2222..203..220..3

And this is what my qual file looks like:

>1_2_53_F3
23 31 -1 30 27 27 30 -1 31 26 27 26 -1 26 14 -1 23 -1 21 29 -1 17 -1 14 17 -1 17 17 14 14 -1 17 -1 14 14 -1 23 -1 29 21 17 14 -1 31 26 14 12 -1 14 14 23 14 -1 14 21 14 17 -1 21 14 17 17 -1 -1 14 14 26 -1 -1 14 14 29 -1 -1 14 
>1_2_193_F3
31 17 -1 14 23 30 31 -1 31 23 31 31 -1 14 31 -1 31 -1 14 14 -1 14 -1 29 17 -1 31 14 23 17 -1 31 -1 17 14 -1 27 -1 13 21 14 17 -1 17 24 12 30 -1 21 31 23 21 -1 23 14 31 31 -1 -1 21 23 17 -1 -1 14 31 17 -1 -1 9 9 17 -1 -1 14 
>1_2_264_F3
31 31 -1 31 31 31 31 -1 31 27 31 31 -1 31 31 -1 31 -1 31 26 -1 21 -1 30 27 -1 31 26 31 31 -1 31 -1 31 30 -1 31 -1 21 31 31 28 -1 31 31 31 23 -1 26 17 23 31 -1 17 20 30 27 -1 26 28 31 30 -1 -1 21 21 13 -1 -1 13 27 31 -1 -1 26 
>1_2_468_F3
31 31 -1 31 31 31 31 -1 31 31 21 31 -1 14 28 -1 28 -1 29 30 -1 27 -1 21 31 -1 13 31 31 25 -1 12 -1 23 30 -1 28 -1 26 32 12 21 -1 28 18 30 12 -1 28 31 27 15 -1 15 31 28 14 -1 31 26 26 28 -1 -1 23 23 14 -1 -1 23 12 21 -1 -1 31

Does anyone know what the error is? I have never used Shrimp2 before, so I am struggling a bit.

Thanks for the help.

mapping shrimp2 solid paired-end • 3.4k views

Is it possible that the tool ran out of memory to allocate? If I recall correctly, it was quite the memory hog.


I'm not so sure. Each csfasta file is only about 5GB, and I gave the job about 256GB of RAM and 32 cores. Do you think it might need more RAM than that?

10.0 years ago
Jordan ★ 1.3k

OK, I think I figured out the issue. Shrimp2 has separate aligners for letter space and color space.

For color space, which is the data I have, I should use gmapper-cs.

So my command now is actually:

gmapper-cs -1 Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta -2 Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta $SCRATCH/human_hg19.fa -N 32 -p opp-in > Sample.sam 2> Logs/Sample.log

This seems to work.
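
For anyone scripting this across several samples, here is a minimal sketch of picking the mapper from the read file extension. The function name pick_mapper is made up for illustration, and gmapper-ls as the letter-space counterpart of gmapper-cs is an assumption worth checking against your Shrimp2 install. The script only prints the chosen binary; it does not run the aligner.

```shell
# Hypothetical helper (not part of Shrimp2): choose a mapper binary
# from the extension of the read file.
pick_mapper() {
    case "$1" in
        *.csfasta|*.csfastq)
            echo "gmapper-cs" ;;   # color-space reads (SOLiD)
        *)
            echo "gmapper-ls" ;;   # assumed name of the letter-space binary
    esac
}

reads_f3="Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta"
mapper=$(pick_mapper "$reads_f3")
echo "mapper for $reads_f3: $mapper"
```

The extension is only a hint, so it is still worth looking at the first few records of the file before launching a long run.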


Interesting, I thought the @ made the mapper work incorrectly. But right, you have to use it in color space mode. Though it is still strange that you have the @ symbol there.


I actually converted the @ to '>'. The other samples I have with me have '>', not '@'.

Even after the conversion, it did not work. So I looked at the examples they gave again and realized that for csfasta they used gmapper-cs. And voila it started working. I wish they had better documentation.

Though I'm not sure how this particular sample changed to @.
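
In case it helps someone else, this is roughly how the '@' to '>' conversion can be done. It is a sketch that assumes a plain csfasta file with only header lines and read lines (no quality strings, which could legitimately start with '@'). The function names are made up for illustration. As noted above, the conversion alone did not fix the run; switching to gmapper-cs was also needed.

```shell
# Hypothetical helpers; both read csfasta text on stdin.

# Count lines that start with '@' (FASTQ-style headers).
count_at_headers() {
    grep -c '^@'
}

# Rewrite a leading '@' to '>' so the headers are FASTA-style.
# Read lines start with a base such as 'T', so they are left untouched.
fix_headers() {
    sed 's/^@/>/'
}

# Small demo on two records taken from the post.
demo='@1_2_53_F3
T02.2031.2212
@1_2_193_F3
T12.0303.2132'

printf '%s\n' "$demo" | count_at_headers
printf '%s\n' "$demo" | fix_headers
```

On a real file this would be something like `fix_headers < in.csfasta > out.csfasta`, where both file names are placeholders.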

10.0 years ago

If you actually have 256GB of RAM, then that should not be an issue.

Wait, I think I see the problem: why do the records in your csfasta file start with @ symbols? That does not seem right. They should start with >.


I think your csfasta was turned into a csfastq at some point, then it was converted back to csfasta ... kind of crazy ...


Let me correct that. It's a bit weird how the csfasta format converted to csfastq.

I will re-run it and see what happens. Thanks!
