Calculate FPKM value from bam files
4
0
Entering edit mode
8.7 years ago
EileenXu • 0

I have some bam files of RNAseq data, and I want to calculate the FPKM values of each gene. However, I couldn't know how the bam files are generated. I tried to use cufflinks to do this, but the corresponding values are weird. Is there any suggestions to me? I'm sorry that I'm new to this field.

RNA-Seq software error • 13k views
ADD COMMENT
0
Entering edit mode

First, why do you want to calculate FPKM if you are looking at gene level data? Can you post the command that you used for cufflinks? And why do you think the corresponding values are weird?

ADD REPLY
0
Entering edit mode

Thanks for your reply.

The command I used for cufflinks is:

cufflinks -p 8 --library-type fr-firststrand -G ./hg19.gtf $file

I think the FPKM values I got are weird because there are some transcripts got values like 1.01111e-310. Besides, I used bedcoverage to look at the number of reads at each position, and saw some transcripts with nonzero number of reads got zero FPKM value, and some transcripts with zero number of reads got nonzero FPKM value.

My question now is: If I simply want to get the expression level of each gene, could I just calculate it myself using the definition of RPKM? Is cufflinks using some specific technique?

ADD REPLY
0
Entering edit mode
8.7 years ago
seidel 11k

Are you sure you can not determine how the BAM files were generated? One of the conventions of a BAM file is that it contains the command used to generate it. The command is stored in the file header, and can be seen with:

samtools view -H yourFile.bam

One of the last lines of the header should llok something like:

@PG ID:TopHat VN:2.0.13 CL:/n/local/bin/tophat --GTF SacCer3.gtf --library-type fr-firststrand

...etc.

Occasionally some BAM files will not contain this information. You should confirm that your files do not have it.

Cufflinks should calculate FPKM values for you. But as Geek_y mentions, saying the values are "weird" is uninformative. You'll have to say exactly what you mean. You can try calculating some values by hand - you can use samtools to pull out reads covering a given gene (see samtools view), count them, and manually calculate the FPKM using the total number of mapped reads returned by samtools idxstats to see if things are making sense. You could also examine some of the read flags to see if the alignment was done using parameters you don't agree with (i.e. examine map quality, number of hits etc.).

Lastly, if you simply want to remap everything yourself, but all you have are the BAM files, you can convert them back to fastq, and map them to your liking:

bedtools bamtofastq -i input.bam -fq output.fq
ADD COMMENT
0
Entering edit mode

Thank you very much for your information.

I really can not get the information of how the bam files were generated even using the command you referred to.

As I replied to Geek_y, I think the FPKM values are weird based on two facts: 1. there are some values as low as 1.01111e-310; 2. some zero and nonzero values are conflicted with the number of reads I got through bedcoverage (I'm confident with this value after compare them with the results I got with samtools).

Do you know the technique used in the cufflinks? Could I simply calculate the RPKM value based on its definition?

Thank you again for your reply.

ADD REPLY
1
Entering edit mode

Read the cufflinks paper and weep. It's very complicated. Who said it was a good idea to use?

ADD REPLY
0
Entering edit mode
8.7 years ago

If you map to the transcriptome, you can get RPKM values (as well as raw read counts) directly from BBMap or Seal. This won't work for the genome, though, without an additional tool.

ADD COMMENT
0
Entering edit mode

dear Brian can we get FPKM directly d for paired end reads from bbmap or bbmap just give us RPKM?

ADD REPLY
0
Entering edit mode
8.7 years ago
TriS ★ 4.7k

You could try using RSEM which will also give you the TPM which it is considered to be more reliable/precise than RPKM

ADD COMMENT
0
Entering edit mode
8.7 years ago
alolex ▴ 950

I would not use cufflinks to do this. First you need to generate your count table, which you can do using HTseq-count, then take a look at this post (How To Calculate The Rpkm For The Count Tables Of Rna Seq Data) to calculate the FPKM values yourself using the .gtf file you listed above. One question though, if you don't know how the BAM file was created, then how do you know it was aligned to Hg19? It might be helpful to post the BAM file header here so others can see what your looking at.

ADD COMMENT

Login before adding your answer.

Traffic: 2406 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6