Help for finding the right FASTA file for kallisto
7.0 years ago
swamyvinny ▴ 20

Hi, I am an undergrad who recently started working in a lab and I'm pretty new to this, so sorry if I sound like I have no idea what I'm talking about. I've been tasked with using kallisto to quantify transcript abundance from our human RNA-seq data. The reference FASTA files I've been using, which I found on Ensembl (ftp://ftp.ensembl.org/pub/release-88/fasta/homo_sapiens/cdna/), have multiple transcript variants for each gene, so kallisto calculates the abundance of each variant separately. My PI wants the abundance for each gene as a whole, with all the variants falling under a single gene, so I was wondering if anyone knows where I can get a human exome FASTA file with a single sequence for each gene. My PI says he was able to get the abundance per gene with the old software he was using (Partek Genomics Suite), so I feel like it should be possible. If there is another program I should use or a better method, I would love to hear it.

TL;DR: looking for a human exome FASTA file without transcript variants.

Thanks in advance for any help

RNA-Seq kallisto FASTA • 3.6k views
7.0 years ago
h.mon 35k

Use tximport to summarize the transcript-level estimates to gene level.
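To illustrate what gene-level summarization means here, the following is a minimal Python sketch that sums kallisto's transcript-level `est_counts` and `tpm` for every transcript belonging to the same gene. It is not tximport: tximport is an R/Bioconductor package and does more than this (for example, it also tracks transcript lengths for downstream tools). The function name `summarize_to_gene`, the `tx2gene` dictionary, and the toy IDs are made up for illustration; only the column names follow kallisto's `abundance.tsv` layout.

```python
import csv
import io
from collections import defaultdict


def summarize_to_gene(abundance_lines, tx2gene):
    """Sum transcript-level est_counts and tpm per gene.

    abundance_lines: iterable of text lines in kallisto's abundance.tsv
                     layout (target_id, length, eff_length, est_counts, tpm).
    tx2gene:         dict mapping transcript ID -> gene ID
                     (hypothetical; the caller has to build it).
    """
    genes = defaultdict(lambda: {"est_counts": 0.0, "tpm": 0.0})
    reader = csv.DictReader(abundance_lines, delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            # Transcript is absent from the map; this sketch simply skips it.
            continue
        genes[gene]["est_counts"] += float(row["est_counts"])
        genes[gene]["tpm"] += float(row["tpm"])
    return dict(genes)


# Toy data with made-up IDs and values, standing in for a real abundance.tsv.
demo_abundance = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TX_A1\t1000\t800\t10\t2.5\n"
    "TX_A2\t500\t300\t5\t1.5\n"
    "TX_B1\t700\t500\t4\t1.0\n"
)
demo_map = {"TX_A1": "GENE_A", "TX_A2": "GENE_A", "TX_B1": "GENE_B"}

for gene_id, totals in summarize_to_gene(demo_abundance, demo_map).items():
    print(gene_id, totals["est_counts"], totals["tpm"])
```

For a real run, an open file handle on `abundance.tsv` can be passed in place of the `io.StringIO` object. For actual differential expression work, tximport remains the safer route.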

6.9 years ago

Hi swamyvinny,

I am not aware of whether or how this is possible with kallisto, but it is relatively simple with salmon (a very similar tool). If you provide salmon with a .gtf or tab-delimited file that maps each transcript to a gene, it will report not only counts for each transcript but also summarized counts for each gene. I would not recommend using a FASTA file with a single sequence for each gene, since keeping only one transcript per gene results in a big loss of information. If you only wish to measure gene abundance, you can also align your samples and use a tool like featureCounts to get gene counts, but this demands more computational resources, and evidence suggests that it is also less accurate.
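The transcript-to-gene map mentioned above can often be derived from the reference itself. The sketch below assumes that the Ensembl cDNA FASTA headers start with the transcript ID and carry a `gene:` field somewhere on the same line; check a few headers of your own file before relying on this. The function name `tx2gene_from_fasta` and the example IDs are hypothetical.

```python
def tx2gene_from_fasta(fasta_lines):
    """Build a transcript ID -> gene ID dict from FASTA header lines.

    Assumes headers shaped like:
        >TRANSCRIPT_ID cdna ... gene:GENE_ID gene_biotype:... ...
    Headers without a 'gene:' field are skipped.
    """
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # bare '>' with no ID
        transcript_id = fields[0]
        # 'gene:' does not match 'gene_biotype:' or 'gene_symbol:'.
        gene_id = next(
            (f[len("gene:"):] for f in fields[1:] if f.startswith("gene:")),
            None,
        )
        if gene_id is not None:
            mapping[transcript_id] = gene_id
    return mapping


# Made-up headers and sequences for illustration only.
demo_fasta = [
    ">TX_A1.1 cdna chromosome:XX:1:1:100:1 gene:GENE_A.2 gene_biotype:protein_coding\n",
    "ACGTACGT\n",
    ">TX_A2.1 cdna chromosome:XX:1:1:100:1 gene:GENE_A.2 gene_symbol:FOO\n",
    "ACGT\n",
]

# Print one transcript<TAB>gene pair per line.
for transcript_id, gene_id in tx2gene_from_fasta(demo_fasta).items():
    print(f"{transcript_id}\t{gene_id}")
```

The transcript IDs are kept exactly as they appear in the header (including any version suffix), so they should match the names the quantifier reports for an index built from the same FASTA file.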

Hope this helps!

Stefan

6.9 years ago

I agree with h.mon. I use tximport after kallisto to get summarized counts for each gene. You can also use the tximport output for downstream differential expression analysis: https://bioconductor.org/packages/release/bioc/html/tximport.html

Thanks
