How to plot dendrogram based on sample names
4.9 years ago
Bioinfonext ▴ 460

Hi,

I have FPKM counts and would like to plot a dendrogram that clusters the samples.

I used the code below, but it generates a dendrogram based on gene IDs instead of sample IDs.

> countMatrix = read.table("Trinity_trans.counts.matrix.txt", header=T, sep='\t', check.names=F, row.names=1)
> dim(countMatrix)
[1] 142686      6
> head(countMatrix)
                          AS_0DAP AS_4DAP AS_8DAP NMK_0DAP NMK_4DAP NMK_8DAP
TRINITY_DN17944_c0_g1_i11   14.32   24.63    8.21     4.54       20     8.49
TRINITY_DN7591_c0_g1_i1      0.00    0.00    1.00     3.00        3     0.00
TRINITY_DN28918_c0_g1_i1     1.00    2.00    1.00     0.00        2     0.00
TRINITY_DN14082_c2_g2_i5     6.00    5.00    1.00     0.00        1     0.00
TRINITY_DN31994_c0_g1_i1     1.00    2.00    0.00     0.00        0     3.00
TRINITY_DN19560_c0_g1_i1     1.00    3.00    0.00     0.00        1     1.00
> rv <- rowVars(countMatrix)
> summary(rv)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.000e+00 1.000e+00 1.500e+01 3.570e+05 5.180e+02 4.122e+09 
> (q75 <- quantile(rowVars(countMatrix), .75))
     75% 
518.3202 
> m2 <- countMatrix[rv > q75, ]
> dim(m2)
[1] 35672     6
> summary(rowVars(m2))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
5.180e+02 1.670e+03 6.677e+03 1.428e+06 4.101e+04 4.122e+09 
> d <- dist(m2, method="euclidean")
> h <- hclust(d, method="complete")
> plot(h)

I will be thankful for your time and help.

Regards

nabiyogesh

R bioconducter • 1.2k views
4.9 years ago

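Hey, you may simply have to transpose your data at some point. dist() computes distances between the rows of its input, and your rows are genes, so the dendrogram leaves are gene IDs. Transposing puts the samples in the rows instead, and can be done via the t() function in R.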
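
A minimal sketch of that transposition, for illustration only. The toy matrix below is made up and merely stands in for the filtered matrix m2 from the question (genes in rows, the six sample names in columns); the plot title is also just a placeholder.

```r
# Toy stand-in for m2 (assumption: random values, not the poster's data).
set.seed(1)
m2 <- matrix(rexp(60, rate = 0.01), nrow = 10,
             dimnames = list(paste0("gene", 1:10),
                             c("AS_0DAP", "AS_4DAP", "AS_8DAP",
                               "NMK_0DAP", "NMK_4DAP", "NMK_8DAP")))

# dist() measures distances between rows, so transpose first
# to put the samples in the rows.
d <- dist(t(m2), method = "euclidean")
h <- hclust(d, method = "complete")

# The leaves of the dendrogram are now labelled with sample names.
plot(h, main = "Sample clustering", xlab = "", sub = "")
```

With the real data, only the dist() call changes relative to the code in the question: dist(t(m2), ...) instead of dist(m2, ...).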


Thanks a lot. Could you please also tell me what the value 518.3202 means here?

> (q75 <- quantile(rowVars(countMatrix), .75))
     75% 
518.3202

May I ask where you found that line of code?

I used genefilter to keep only the most variable genes:

https://support.bioconductor.org/p/61678/

Oh, I see. It is just filtering genes on a variance cut-off. Genes of low variance add little or no information to the differential expression test, so many people filter them out. The value 518.3202 is the 75th percentile of the per-gene variances. To put it another way: with that code, you are only retaining the genes whose variance is in the upper quartile, i.e. the most variable 25% of genes (35672 of 142686 in your output).
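
A small sketch of the same filter on made-up data, to show what the cut-off does. The matrix, its dimensions, and the variable names are assumptions for illustration; base R's apply(..., 1, var) is swapped in for rowVars() so that the snippet runs without extra packages.

```r
# Toy expression matrix (assumption: random values, 100 genes x 6 samples).
set.seed(42)
counts <- matrix(rexp(600, rate = 0.05), nrow = 100,
                 dimnames = list(paste0("gene", 1:100),
                                 paste0("sample", 1:6)))

# Per-gene variance across samples (base R stand-in for rowVars()).
rv <- apply(counts, 1, var)

# The 75th percentile of the variances is the cut-off.
q75 <- quantile(rv, 0.75)

# Keep only the genes whose variance exceeds the cut-off.
filtered <- counts[rv > q75, , drop = FALSE]

dim(counts)
dim(filtered)
```

Roughly a quarter of the rows survive the filter, which mirrors the drop from 142686 to 35672 rows in the original output.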

Thanks for all your help and time.
