Question

Specific sequence abundance using RNA-Seq

0

6.8 years ago

user230613 ▴ 360

Hi,

I would like to measure the expression (abundance) of a specific sequence, a 9-mer, from RNA-Seq data. The 9-mer is not unique in the genome and is not specific to a single gene: it is present in more than one isoform of gene A and also in gene B. How can I get a final expression value (TPM, FPKM) for this 9-mer?

I hope the question is clear :)

expression RNA-Seq • 1.4k views

ADD COMMENT • link updated 6.8 years ago by Brian Bushnell 20k • written 6.8 years ago by user230613 ▴ 360

score 3 · Answer 1 · 2017-06-27

3

6.8 years ago

Brian Bushnell 20k

To find the expression of a specific 9-mer, "ACGTACGTA", using BBMap:

kmercountexact.sh in=reads.fq out=kmers.fa k=9
bbduk.sh in=kmers.fa outm=filtered.fa k=9 mm=f literal=ACGTACGTA

"filtered.fa" should contain exactly one entry, something like:

>6721
ACGTACGTA

...though it might come out reverse-complemented. The number in the header is the number of times the k-mer occurred in the file.
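For intuition, here is a minimal Python sketch of what such a count represents. It is not how BBMap works internally. It assumes that the k-mer and its reverse complement are tallied together, which is why the entry may come out reverse-complemented. The function names (`reverse_complement`, `count_overlapping`, `count_kmer`) and the example reads are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_kmer(reads, kmer):
    """Count a k-mer across reads, on either strand (assumed behaviour)."""
    kmer = kmer.upper()
    rc = reverse_complement(kmer)
    total = 0
    for read in reads:
        read = read.upper()
        total += count_overlapping(read, kmer)
        if rc != kmer:  # avoid double counting a palindromic k-mer
            total += count_overlapping(read, rc)
    return total


# Hypothetical in-memory reads; real data would be parsed from reads.fq.
example_reads = ["AACAAC", "GTT", "CCC"]
print(count_kmer(example_reads, "AAC"))
```

The result is a raw occurrence count in the reads, not a length-normalized value such as TPM or FPKM.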

ADD COMMENT • link 6.8 years ago by Brian Bushnell 20k

0

@Brian, I want to measure the expression of that sequence using RNA-Seq data; I don't want to extract the k-mer sequence from my reads.

ADD REPLY • link 6.8 years ago by user230613 ▴ 360

0

We might be miscommunicating... In my view, the number resulting from this method is the expression of that sequence in the RNA-Seq data. I'm not sure that it makes much sense to translate it to FPKM, though.

ADD REPLY • link 6.8 years ago by Brian Bushnell 20k

score 0 · Answer 2 · 2017-06-27

0

6.8 years ago

Istvan Albert 100k

The closest you could get is to replace the lengths that appear in the formula above with the number of times the k-mer appears in each transcript, then apply the formula as usual.
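One literal reading of this suggestion can be sketched in Python. The formula being referred to is not shown in this thread, so the sketch assumes the usual TPM definition (reads divided by length, rescaled to sum to one million) and substitutes the k-mer occurrence count for the length. The function name, the input dictionaries, and the choice to skip transcripts that lack the k-mer are all assumptions made for illustration.

```python
def tpm_with_kmer_counts(read_counts, kmer_occurrences):
    """TPM-style values with transcript length replaced by k-mer occurrences.

    read_counts: dict mapping transcript id -> reads assigned to it
    kmer_occurrences: dict mapping transcript id -> times the k-mer appears in it
    Transcripts with zero occurrences are skipped (assumption: a zero
    denominator makes the rate undefined for them).
    """
    rates = {}
    for transcript_id, reads in read_counts.items():
        n = kmer_occurrences.get(transcript_id, 0)
        if n > 0:
            rates[transcript_id] = reads / n

    total = sum(rates.values())
    if total == 0:
        return {transcript_id: 0.0 for transcript_id in rates}

    # Rescale so the values sum to one million, as in the standard TPM formula.
    return {tid: rate / total * 1e6 for tid, rate in rates.items()}


# Hypothetical numbers: two isoforms of gene A carry the k-mer, gene C does not.
counts = {"A.1": 100, "A.2": 50, "C": 50}
occurrences = {"A.1": 1, "A.2": 2, "C": 0}
print(tpm_with_kmer_counts(counts, occurrences))
```

Whether the resulting values are biologically meaningful is a separate question; the sketch only shows the mechanical substitution.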

ADD COMMENT • link 6.8 years ago by Istvan Albert 100k