Hi,
Maybe a basic question. I have RNA-Seq data from a tissue taken at different time points. The reads are in the SRA database (For example, here: SRR6314256).
For a gene with many alternative splice variants, I want to get the relative read ratios between them. Say, I want to know whether splice variant A is more expressed than splice variant B or C.
How do I do this?
I have tried using magic-blast against an artificial transcriptome database containing only the gene of interest, with all of its splice variants, to count the number of reads in each splice variant. But I don't know how to infer the contribution of each variant, especially when most exons are common to at least 2 variants.
Thanks!
Thanks h.mon. Do you know whether Salmon or kallisto require downloading all of the SRA reads? The total size of the whole experiment is 100Gb. I used magicblast because it can work with SRA accession numbers without downloading the complete set.
thanks
Meaning you used remote magicBLAST (sorry, I never used magicBLAST)?
There are tricks to stream the data from SRA (or ENA) directly into kallisto / Salmon, which means you don't have to save the data on disk, but they have to be transferred to the local machine nonetheless. The tricks:
http://www.nxn.se/valent/streaming-rna-seq-data-from-ena
https://standage.github.io/streaming-data-from-the-sra-with-fastq-dump.html
http://genomespot.blogspot.com.br/2015/01/sra-toolkit-tips-and-workarounds.html
fastq-dump allows streaming, so you may be able to feed Salmon / kallisto directly with it: https://github.com/ncbi/sra-tools/issues/57
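To make the streaming idea concrete, here is a rough Python sketch that pipes fastq-dump output straight into kallisto without writing FASTQ files to disk. It is only an illustration under several assumptions: the run is treated as single-end, the fragment length (200) and standard deviation (20) are placeholder values, the index name gene_variants.idx and the helper names build_stream_commands / stream_quant are made up for this example, and it assumes kallisto will read FASTQ from /dev/stdin on your system, which you should verify first. The reads are still transferred over the network.

```python
import subprocess


def build_stream_commands(accession, index_path, out_dir, frag_len=200, frag_sd=20):
    """Build the two command lines for a fastq-dump -> kallisto pipe.

    Hypothetical helper. Assumes single-end reads; frag_len and frag_sd
    are placeholder values that should be replaced with real estimates.
    """
    # -Z writes the FASTQ records to stdout instead of to a file.
    dump_cmd = ["fastq-dump", "-Z", accession]
    # Assumption: kallisto can read the piped FASTQ through /dev/stdin.
    quant_cmd = [
        "kallisto", "quant",
        "-i", index_path,
        "-o", out_dir,
        "--single", "-l", str(frag_len), "-s", str(frag_sd),
        "/dev/stdin",
    ]
    return dump_cmd, quant_cmd


def stream_quant(accession, index_path, out_dir):
    """Run the pipe and return the exit codes of (fastq-dump, kallisto)."""
    dump_cmd, quant_cmd = build_stream_commands(accession, index_path, out_dir)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    quant = subprocess.Popen(quant_cmd, stdin=dump.stdout)
    # Close our copy of the pipe so fastq-dump sees a broken pipe
    # if kallisto exits early.
    dump.stdout.close()
    quant_rc = quant.wait()
    dump_rc = dump.wait()
    return dump_rc, quant_rc
```

Something like stream_quant("SRR6314256", "gene_variants.idx", "out_SRR6314256") would then process one run; for paired-end data the two mates would need to be separated before quantification, which this sketch does not attempt.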
Some packages to facilitate SRA data streaming (I never tested them):
https://github.com/jdidion/ngstream
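Coming back to the original question about exons shared between variants: tools such as Salmon and kallisto deal with this by assigning ambiguous reads to transcripts probabilistically, using expectation-maximization (EM) style inference. The toy Python sketch below shows the general idea only; it is not the actual implementation of either tool. The function name, the input layout (read counts grouped by the set of variants each read is compatible with) and the example numbers are all assumptions made up for illustration.

```python
def em_read_fractions(compat_counts, eff_lengths, n_iter=1000, tol=1e-9):
    """Estimate the fraction of reads coming from each transcript.

    compat_counts: dict mapping a tuple of transcript names to the number
        of reads compatible with exactly that set of transcripts.
    eff_lengths: dict mapping transcript name to its effective length.
    Hypothetical helper illustrating a basic EM update.
    """
    names = sorted(eff_lengths)
    total = sum(compat_counts.values())
    # Start from a uniform guess.
    alpha = {t: 1.0 / len(names) for t in names}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in names}
        # E-step: split each group of reads among its compatible
        # transcripts in proportion to alpha / effective length.
        for tset, n_reads in compat_counts.items():
            weights = {t: alpha[t] / eff_lengths[t] for t in tset}
            norm = sum(weights.values())
            if norm == 0:
                continue
            for t, w in weights.items():
                expected[t] += n_reads * w / norm
        # M-step: new fractions are the expected counts over the total.
        new_alpha = {t: expected[t] / total for t in names}
        change = max(abs(new_alpha[t] - alpha[t]) for t in names)
        alpha = new_alpha
        if change < tol:
            break
    return alpha


# Made-up example: 30 reads hit an exon unique to A, 10 an exon unique
# to B, and 60 hit exons shared by both; equal effective lengths.
example_counts = {("A",): 30, ("B",): 10, ("A", "B"): 60}
example_lengths = {"A": 1000.0, "B": 1000.0}
example_fractions = em_read_fractions(example_counts, example_lengths)
```

In this made-up example the estimate settles at about 0.75 for A and 0.25 for B: the shared reads end up split 3:1, following the ratio seen in the unique exons. Dividing each fraction by the effective length and renormalizing gives relative transcript abundances, which is what you would compare between variants.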