Salmon tool giving an error about the library type
5.9 years ago
Vasu ▴ 770

Hi,

I have human RNA-seq samples prepared with a ribo-depletion kit. I first checked the library type of the samples with RSeQC, which reported reverse-forward (RF). So, for the alignment with HISAT2 I used --rna-strandness RF, which corresponds to --library-type fr-firststrand in TopHat.

I'm now trying to run Salmon on the same samples with library type -l ISR, based on the library-type section of the Salmon manual.
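The correspondence used here (HISAT2 --rna-strandness RF to Salmon -l ISR) can be written down as a small lookup table. This is a minimal sketch, not part of either tool: the helper name `hisat2_to_salmon_libtype` is hypothetical, and the table assumes standard Illumina paired-end reads that face each other (Salmon's "I", inward), following the library-type descriptions in the Salmon documentation.

```python
from typing import Optional

# Hypothetical lookup tables: HISAT2 --rna-strandness value -> Salmon -l value.
# None stands for an unstranded library (no --rna-strandness given).
# Assumes Illumina-style paired-end reads facing each other ("I" = inward).
PAIRED_END = {"RF": "ISR", "FR": "ISF", None: "IU"}
SINGLE_END = {"R": "SR", "F": "SF", None: "U"}


def hisat2_to_salmon_libtype(strandness: Optional[str], paired: bool) -> str:
    """Translate a HISAT2 strandness setting into a Salmon library type."""
    table = PAIRED_END if paired else SINGLE_END
    key = strandness.upper() if strandness is not None else None
    if key not in table:
        raise ValueError(f"unexpected strandness {strandness!r} (paired={paired})")
    return table[key]


# The case from this post: paired-end, --rna-strandness RF.
print(hisat2_to_salmon_libtype("RF", paired=True))  # ISR
```

Keeping paired-end and single-end in separate tables means a single-end value such as "R" is rejected when paired=True instead of being silently mapped.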

This is the command I used:

salmon quant -i index/ -l ISR -1 AT.1.fastq.gz -2 AT.2.fastq.gz -o transcripts_quant

When I checked the output file with the mapping information, I saw the following at the end of the file:

[2018-05-23 23:02:18.809] [jointLog] [info] Computed 333657 rich equivalence classes for further processing
[2018-05-23 23:02:18.809] [jointLog] [info] Counted 27089612 total reads in the equivalence classes
[2018-05-23 23:02:18.823] [jointLog] [warning] 0.0175308% of fragments were shorter than the k used to build the index (31).
If this fraction is too large, consider re-building the index with a smaller k.
The minimum read size found was 20.

[2018-05-23 23:02:18.823] [jointLog] [info] Mapping rate = 28.9152%

[2018-05-23 23:02:18.823] [jointLog] [info] finished quantifyLibrary()
[2018-05-23 23:02:18.825] [jointLog] [info] Starting optimizer
[2018-05-23 23:02:24.405] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2018-05-23 23:02:24.423] [jointLog] [info] iteration = 0 | max rel diff. = 48.1542
[2018-05-23 23:02:25.913] [jointLog] [info] iteration = 100 | max rel diff. = 0.0934775
[2018-05-23 23:02:27.400] [jointLog] [info] iteration = 200 | max rel diff. = 0.0553936
[2018-05-23 23:02:28.846] [jointLog] [info] iteration = 300 | max rel diff. = 0.0348972
[2018-05-23 23:02:30.357] [jointLog] [info] iteration = 400 | max rel diff. = 0.0276639
[2018-05-23 23:02:31.834] [jointLog] [info] iteration = 500 | max rel diff. = 0.0228071
[2018-05-23 23:02:33.341] [jointLog] [info] iteration = 600 | max rel diff. = 0.0191266
[2018-05-23 23:02:34.779] [jointLog] [info] iteration = 700 | max rel diff. = 0.0171199
[2018-05-23 23:02:36.308] [jointLog] [info] iteration = 800 | max rel diff. = 0.0134323
[2018-05-23 23:02:37.754] [jointLog] [info] iteration = 900 | max rel diff. = 0.0129089
[2018-05-23 23:02:39.248] [jointLog] [info] iteration = 1000 | max rel diff. = 0.0108738
[2018-05-23 23:02:40.756] [jointLog] [info] iteration = 1100 | max rel diff. = 0.010454
[2018-05-23 23:02:41.058] [jointLog] [info] iteration = 1122 | max rel diff. = 0.00969727
[2018-05-23 23:02:41.080] [jointLog] [info] Finished optimizer
[2018-05-23 23:02:41.080] [jointLog] [info] writing output

[2018-05-23 23:02:41.518] [jointLog] [warning] NOTE: Read Lib [( AT.1.fastq.gz, AT.2.fastq.gz )] :

Greater than 5% of the fragments disagreed with the provided library type; check the file: transcripts_quant/lib_format_counts.json for details
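When the raw log is opened in a plain editor, Salmon's terminal colouring shows up as ANSI escape sequences (ESC[1m, ESC[33m, ESC[00m). A minimal sketch for stripping them and pulling out the "Mapping rate" line is below; the function names are hypothetical, and reading the log from disk is left out because the exact log file name is not given in the post.

```python
import re
from typing import Optional

# ANSI SGR sequences such as "\x1b[1m", "\x1b[33m", "\x1b[00m"; a raw view of
# the log shows them as ESC[1m ... ESC[00m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")
MAPPING_RATE = re.compile(r"Mapping rate = ([0-9.]+)%")


def strip_ansi(text: str) -> str:
    """Remove ANSI colour/bold escape sequences from log text."""
    return ANSI_SGR.sub("", text)


def mapping_rate(log_text: str) -> Optional[float]:
    """Return the last 'Mapping rate = X%' value in the log, in percent."""
    matches = MAPPING_RATE.findall(strip_ansi(log_text))
    return float(matches[-1]) if matches else None


# One line from the log above, with its escape codes restored.
sample = (
    "\x1b[00m\x1b[1m[2018-05-23 23:02:18.823] [jointLog] [info] "
    "Mapping rate = 28.9152%\n"
)
print(strip_ansi(sample).strip())
print(mapping_rate(sample))  # 28.9152
```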

As you can see, at the end it says that greater than 5% of the fragments disagreed with the provided library type, so I also looked at the lib_format_counts.json file.

This is what I saw in the .json file:

{
    "read_files": "( AT.1.fastq.gz, AT.2.fastq.gz )",
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.8183487087227385,
    "num_compatible_fragments": 22168749,
    "num_assigned_fragments": 27089612,
    "num_consistent_mappings": 83629771,
    "num_inconsistent_mappings": 11396640,
    "MSF": 0,
    "OSF": 27232,
    "ISF": 4075759,
    "MSR": 0,
    "OSR": 73061,
    "ISR": 83629771,
    "SF": 2794681,
    "SR": 4423463,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}
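The .json numbers can be checked directly: compatible_fragment_ratio is num_compatible_fragments / num_assigned_fragments (22168749 / 27089612 ≈ 0.818), so roughly 18% of assigned fragments disagreed with ISR, well above the 5% mentioned in the warning. The per-format fields appear to count mappings rather than fragments (the ISR count equals num_consistent_mappings). A minimal sketch that summarizes such a file is below; the function names are hypothetical and the share calculation is only one way to read the counts.

```python
import json

FORMAT_KEYS = ["MSF", "OSF", "ISF", "MSR", "OSR", "ISR",
               "SF", "SR", "MU", "OU", "IU", "U"]

# Values copied from the lib_format_counts.json shown above.
SAMPLE_COUNTS = {
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.8183487087227385,
    "MSF": 0, "OSF": 27232, "ISF": 4075759,
    "MSR": 0, "OSR": 73061, "ISR": 83629771,
    "SF": 2794681, "SR": 4423463,
    "MU": 0, "OU": 0, "IU": 0, "U": 0,
}


def load_counts(path: str) -> dict:
    """Read a lib_format_counts.json file written by salmon quant."""
    with open(path) as fh:
        return json.load(fh)


def summarize_lib_format(counts: dict) -> dict:
    """Fraction of disagreeing fragments plus each format's share of mappings."""
    disagree = 1.0 - counts["compatible_fragment_ratio"]
    observed = {k: counts.get(k, 0) for k in FORMAT_KEYS}
    total = sum(observed.values())
    # Drop formats never observed; sort the rest from most to least common.
    shares = {k: v / total for k, v in observed.items() if v > 0}
    return {
        "expected_format": counts["expected_format"],
        "disagree_fraction": disagree,
        "over_5_percent": disagree > 0.05,  # threshold named in the warning
        "format_shares": dict(sorted(shares.items(),
                                     key=lambda kv: kv[1], reverse=True)),
    }


summary = summarize_lib_format(SAMPLE_COUNTS)
print(f"expected {summary['expected_format']}, "
      f"{summary['disagree_fraction']:.1%} of fragments disagreed")
for fmt, share in summary["format_shares"].items():
    print(f"  {fmt}: {share:.2%}")
```

To run it on real output, pass load_counts("transcripts_quant/lib_format_counts.json") to summarize_lib_format instead of SAMPLE_COUNTS.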

1) What is the problem with the library type here?

2) The overall alignment rate for this sample with HISAT2 is 91%, but Salmon reports a mapping rate of about 28.9%. Why is there such a difference?
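As a rough sanity check, the 28.9% figure can be turned back into a fragment count. This assumes (it is not stated in the post) that Salmon's mapping rate is the number of fragments assigned to transcripts divided by the number of fragments processed; the assigned count, 27,089,612, appears both in the log and as num_assigned_fragments in the .json file.

```latex
\text{mapping rate} = \frac{N_{\text{assigned}}}{N_{\text{processed}}}
\quad\Longrightarrow\quad
N_{\text{processed}} \approx \frac{27\,089\,612}{0.289152} \approx 9.37 \times 10^{7}
```

Under that assumption, roughly 93.7 million fragments were processed and about 27.1 million were assigned to transcripts.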

5.9 years ago
GenoMax 141k

See @Rob's answer to the related question "SALMON's warning library type".
