Salmon transcriptome generation
5.1 years ago
Morris_Chair ▴ 350

Dear all,

I want to count the reads from RNA-seq using the salmon tool. I was told that the first thing to do is to create a FASTA file with the transcript sequences (built from the reference genome FASTA plus the GTF annotation file), such as:

(salmon) [@ws7910 RNAseq]$ gffread -w transcripts.fa -g Homo_sapiens.GRCh38.cdna.all.fa Homo_sapiens.GRCh37.75.gtf  

I get this result

No fasta index found for Homo_sapiens.GRCh38.cdna.all.fa. 
Rebuilding, please wait..
Fasta index rebuilt.
Error creating file: annotation/transcript/transcripts.fa

It creates a file named Homo_sapiens.GRCh38.cdna.all.fa.fai

On the salmon website it seems that this step is not necessary, so can you please tell me what I am doing wrong, and what is the best way to start using salmon?

I'd prefer to do the counting with BAM files already aligned using TopHat2.

Thank you

RNA-Seq • 7.4k views
5.1 years ago
Rob 6.5k

Hi Morris,

The easiest way to start using salmon is to download the transcriptome file directly (e.g. from Gencode --- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.transcripts.fa.gz). Then, you can create a salmon index via:

salmon index -t gencode.v29.transcripts.fa.gz -i gencode_v29_idx

and then you can quantify using gencode_v29_idx as the salmon index.
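As a rough sketch of that quantification step, assuming paired-end FASTQ files: the sample names and the fastq/ and quants/ paths below are placeholders, not anything from this thread. Each sample gets its own `salmon quant` run and its own output directory. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: quantify each sample separately against the index built above.
# Sample names and directory layout are hypothetical placeholders.
INDEX=gencode_v29_idx
SAMPLES="control_1 control_2 treated_1 treated_2"

# Print the salmon command for one paired-end sample.
# -l A asks salmon to infer the library type automatically.
# --validateMappings enables selective alignment (already the default
# in salmon 1.0 and later, where the flag is accepted but not needed).
quant_cmd() {
    sample=$1
    echo "salmon quant -i $INDEX -l A -1 fastq/${sample}_1.fastq.gz -2 fastq/${sample}_2.fastq.gz -p 4 --validateMappings -o quants/${sample}"
}

for s in $SAMPLES; do
    cmd=$(quant_cmd "$s")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"      # dry run: show what would be executed
    else
        $cmd             # real run: requires salmon on PATH
    fi
done
```

Run it once as-is to check the printed commands, then again with `DRY_RUN=0` to actually quantify. Bias-correction flags such as `--seqBias` and `--gcBias` can be appended to the command if wanted.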

Hi Rob, thank you for your answer. Why did you omit the k-mer size (-k) for the index generation?

thanks

Hi Morris,

The -k argument has a default value (31) that is used if -k is not provided. If you wish to use the default, you don't have to pass that option explicitly. If you want to use another value of k, you can pass it to the index command.

As a rule of thumb, in my experience the only really crucial thing is that the k-mer length is shorter than or equal to the read length. Beyond that, k-mer length has only a minor influence on the mapping rate; see "Salmon Quantification for RNA-seq Read Pairs with Different Lengths" to get a rough idea. Given a high-quality library without contamination, the most important factor is the read length, which ultimately comes down to proper experimental design. When indexing GENCODE files, also remember to pass the --gencode option to salmon.
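To make those two points concrete, here is a minimal sketch of an index command with an explicit k and the --gencode option. The value 23 and the index directory name are only illustrative assumptions for a short-read library; for reads of roughly 75 bp or longer the default of 31 is usually left alone.

```shell
#!/bin/sh
# Sketch: build a salmon index from the GENCODE transcriptome with an
# explicit k-mer size. K=23 and the index name are hypothetical choices.
TRANSCRIPTS=gencode.v29.transcripts.fa.gz

# Print the index command for a given output directory and k.
# --gencode trims the '|'-separated GENCODE headers down to the
# transcript ID so names stay short in the output.
index_cmd() {
    idx=$1
    k=$2
    echo "salmon index -t $TRANSCRIPTS -i $idx -k $k --gencode"
}

cmd=$(index_cmd gencode_v29_k23_idx 23)
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"      # dry run: show what would be executed
else
    $cmd             # real run: requires salmon on PATH
fi
```

Set `DRY_RUN=0` to build the index for real. The same index directory is then passed to `salmon quant` with `-i`.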

I created the salmon index as you describe and it's OK. I'll tell you in advance that I'm new to this, and I'm lost on how to write the command to quantify the reads using gencode_v29_idx.

I have 4 BAM files to quantify and I don't know how to formulate the command. First, can I run multiple BAM files at once (two treated and two control, for instance), or do I have to do them one by one? I thought the structure of the command should be like this, but I don't know how to insert gencode_v29_idx:

Salmon quant  --libType A -p 4 --validate mapping --seqBias --gcBias -a 1file.bam 2fie.bam 3.file.bam 4fie.bam -o salmon_quant

thank you Rob

It's probably more convenient to use FASTQ files, because then no other biases are introduced.
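If the BAM route is still wanted, a minimal sketch of salmon's alignment-based mode follows. It rests on two assumptions worth stating loudly: the BAM files must contain alignments to the transcriptome (genome alignments such as typical TopHat2 output do not fit this mode), and the transcript FASTA is given with -t instead of an index. File names and directories are placeholders. Each sample is run on its own, since BAM files passed together to one `-a` are quantified together as a single sample rather than as separate ones.

```shell
#!/bin/sh
# Sketch: alignment-based salmon quantification, one run per sample.
# Assumes transcriptome-aligned BAMs; all paths are hypothetical.
TRANSCRIPTS=gencode.v29.transcripts.fa
SAMPLES="control_1 control_2 treated_1 treated_2"

# Print the salmon command for one BAM file.
# No -i here: alignment mode takes the transcript sequences via -t.
aln_quant_cmd() {
    sample=$1
    echo "salmon quant -t $TRANSCRIPTS -l A -a bam/${sample}.bam -p 4 -o quants_aln/${sample}"
}

for s in $SAMPLES; do
    cmd=$(aln_quant_cmd "$s")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"      # dry run: show what would be executed
    else
        $cmd             # real run: requires salmon on PATH
    fi
done
```

The transcript names in the BAM header have to match those in the FASTA given to -t, so the same transcriptome file should be used for alignment and quantification.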
