Question

Zero Counts for many genes in 1 replicate, why?

8.8 years ago

dec986 ▴ 370

Hello,

I ran cufflinks on 18 different replicates like this:

time cufflinks -p 8 accepted_hits.bam -g ../mm10genes.gtf

and then transcript counting with bedtools.

The first two columns are the transcript ID and gene name. The remaining columns are counts for 3 female control, 3 female condition 1, 3 female condition 2, 3 male control, 3 male condition 1, and 3 male condition 2 samples, in that order.

I am at a loss as to why I get the following:

con@ubuntu:~/RSAB-BPA/basic/Fastq/FGC1037_s_5_GTGAAA$ grep Xist ../gene.count
NR_001463    Xist    21993    11604    6711    14790    6150    0    5450    2974    43627    9    52    1    66    5    52    18    247    32
NR_001570    Xist    13192    6975    4059    8543    3464    0    3289    1773    26716    8    30    1    35    2    35    10    152    16
con@ubuntu:~/RSAB-BPA/basic/Fastq/FGC1037_s_5_GTGAAA$ grep Gapdh ../gene.count
NM_001289726    Gapdh    50279    4550    3184    22663    2130    0    7702    4332    37018    6114    4938    4493    5059    9287    3420    6147    9016    8596
NM_008084    Gapdh    50290    4551    3184    22662    2129    0    7703    4333    37030    6115    4937    4495    5059
con@ubuntu:~/RSAB-BPA/basic/Fastq/FGC1037_s_5_GTGAAA$ grep Actb ../gene.count
NM_007393    Actb    202625    20199    12112    76439    8610    0    49824    26691    133110    62113    17415    16526    23621    36352    17794    27867    36203    43491

but the gene expression looks fine in a genome browser. I don't understand what I could be doing wrong or what I should be looking for.

I have two questions:

Why do so many genes show 0 expression?
Why do some replicates show consistently higher expression of certain genes?

-DEC

cufflinks RNA-Seq • 2.7k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.8 years ago by dec986 ▴ 370

Given that the same sample has 0 counts for all of the genes you showed, have you checked whether it has problems? My guess is that this sample clusters far away from everything else and should probably just be excluded.
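A quick way to confirm that one sample is empty across the whole file is to sum every count column. The following is a minimal sketch, assuming the layout described in the question (two identifier columns followed by whitespace-separated integer counts). The function names are hypothetical, and the two rows are the Xist and Actb lines copied from the post; in practice you would pass the lines of `gene.count` instead.

```python
def column_totals(lines, n_id_cols=2):
    """Sum each count column over all rows (identifier columns are skipped)."""
    totals = []
    for line in lines:
        fields = line.split()
        if len(fields) <= n_id_cols:
            continue  # skip blank or identifier-only lines
        counts = [int(x) for x in fields[n_id_cols:]]
        # Grow the totals list if this row has more columns than seen so far,
        # so truncated rows do not crash the script.
        if len(counts) > len(totals):
            totals.extend([0] * (len(counts) - len(totals)))
        for i, c in enumerate(counts):
            totals[i] += c
    return totals


def zero_samples(totals):
    """Return the 1-based positions of samples whose total count is zero."""
    return [i + 1 for i, t in enumerate(totals) if t == 0]


# Rows copied from the question (Xist NR_001463 and Actb NM_007393).
rows = [
    "NR_001463 Xist 21993 11604 6711 14790 6150 0 5450 2974 43627 "
    "9 52 1 66 5 52 18 247 32",
    "NM_007393 Actb 202625 20199 12112 76439 8610 0 49824 26691 133110 "
    "62113 17415 16526 23621 36352 17794 27867 36203 43491",
]

totals = column_totals(rows)
print(zero_samples(totals))  # prints [6]
```

If the same column is still zero when summed over every line of the file, the problem lies with that sample's counting step or input BAM rather than with individual genes.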

ADD REPLY • link 8.8 years ago by Devon Ryan 104k

As Devon suggested, for studies that involve tens or hundreds of samples it is always better to cluster the samples and identify outliers before any further analysis. The discrepancy in read counts could be attributed to one of several factors: 1) differences in sequencing depth, or 2) differences in the complexity of the RNA-seq libraries. For your housekeeping genes, only one sample gives zero counts. There is considerable variation across the other samples for the housekeeping genes, but that could be purely due to differences in sequencing depth. I would try normalizing these samples using the TMM method and see whether it reduces the variation in expression for the housekeeping genes.
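As a rough stand-in for a full clustering, one can compute each sample's mean Pearson correlation with all other samples on log-transformed counts; a sample that correlates poorly, or has no variance at all, is a candidate outlier. This is a minimal sketch and not TMM or hierarchical clustering. The function names and the small count matrix (rows are genes, columns are samples) are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def mean_correlation_per_sample(matrix):
    """For each sample (column), average its correlation with the others."""
    logged = [[math.log2(c + 1) for c in row] for row in matrix]
    samples = list(zip(*logged))  # transpose: one tuple per sample
    result = []
    for i, s in enumerate(samples):
        cors = [pearson(s, t) for j, t in enumerate(samples) if j != i]
        valid = [c for c in cors if not math.isnan(c)]
        result.append(sum(valid) / len(valid) if valid else float("nan"))
    return result


# Hypothetical counts: 4 genes x 4 samples; the third sample is all zeros.
counts = [
    [100, 110, 0, 95],
    [10, 12, 0, 9],
    [1000, 900, 0, 1100],
    [50, 60, 0, 40],
]

for idx, r in enumerate(mean_correlation_per_sample(counts), start=1):
    flag = "  <- check this sample" if math.isnan(r) or r < 0.8 else ""
    print(f"sample {idx}: mean r = {r:.3f}{flag}")
```

The 0.8 cutoff is arbitrary. With real data it is more informative to look at the full correlation matrix or a dendrogram than at a single threshold.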

ADD REPLY • link 8.8 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2015-06-29

8.8 years ago

karl.stamm 4.1k

Xist is only used in females.

Edit: I just read that you used cufflinks and then "transcript counting with bedtools". Raw counts like that are going to be a poor match to the overall expression, because they are not normalized for sequencing depth: you've got more reads in total in the first sample. You should extract the gene expression in some other form. Cufflinks or cuffnorm can calculate FPKM (the fragment-based counterpart of RPKM), and other tools can calculate CPM or TPM.
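To illustrate why depth normalization matters, here is a minimal sketch of counts per million (CPM): each count is divided by its sample's total and multiplied by one million. The function name and the toy matrix are hypothetical. With real data the totals should come from all genes in the count file, not from a handful of rows.

```python
def counts_per_million(matrix):
    """Convert a genes x samples count matrix to CPM.

    Samples with a total of zero get NaN instead of raising a
    division-by-zero error.
    """
    totals = [sum(col) for col in zip(*matrix)]
    return [
        [c / t * 1e6 if t > 0 else float("nan") for c, t in zip(row, totals)]
        for row in matrix
    ]


# Hypothetical counts: the second sample was sequenced twice as deeply
# as the first, so its raw counts are doubled for every gene.
counts = [
    [10, 20],
    [30, 60],
    [60, 120],
]

for raw, norm in zip(counts, counts_per_million(counts)):
    print(raw, "->", [round(v) for v in norm])
```

After scaling, the two samples give the same value for each gene, even though the raw counts differ by a factor of two. CPM does not correct for transcript length or library composition; that is what FPKM, TPM, and TMM address.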

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.8 years ago by karl.stamm 4.1k

Hello karl.stamm

I know that Xist is only used in females. This is a female sample. Also housekeeping genes Gapdh and Actb have the same problem.

-Dave

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.8 years ago by dec986 ▴ 370

I didn't look hard at the data. Please post the column definitions / column header so these numbers make sense.

ADD REPLY • link 8.8 years ago by karl.stamm 4.1k