I am a PhD student who inherited log2cpm expression data from bulk kidney tissue from a UUO (unilateral ureteral obstruction) experiment that tests a new drug. The sample material consists of:
- 6 x Ligated Kidney (Untreated)
- 6 x Ligated Kidney (Treated)
- 3 x Unligated Kidney (Untreated)
- 3 x Unligated Kidney (Treated)
The previous study showed that this drug was effective against fibrosis, so the aim of my study is to investigate how this drug affects inflammation and mitochondrial function in the diseased kidney.
I have filtered my original dataset based on the MitoCarta v2 DB to investigate how the drug affected mitochondrial function. I then ran a PCA on the log2cpm data, which revealed the following:
The following figure shows PC1 (84.51%) and PC2 (4.69%). Red (ligated, untreated), purple (ligated, treated), green (healthy kidney, untreated), orange (healthy kidney, treated).
I interpret this as showing that the difference between treated and untreated ligated kidneys is very small, but that there is one.
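As a minimal sketch of this kind of PCA, assuming a genes x samples log2cpm matrix: the code below uses NumPy with random placeholder data rather than the real dataset. The function name `pca_samples` and the simulated ligation effect are hypothetical and only there for illustration.

```python
import numpy as np

def pca_samples(log2cpm, n_components=2):
    """PCA of samples from a genes x samples log2cpm matrix (via SVD)."""
    X = np.asarray(log2cpm, dtype=float).T      # samples x genes
    X = X - X.mean(axis=0)                      # center each gene across samples
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    explained = (S ** 2) / np.sum(S ** 2)             # fraction of variance per PC
    loadings = Vt[:n_components].T                    # genes x components
    return scores, explained[:n_components], loadings

# Placeholder data (assumption): 200 genes x 18 samples,
# with the first 12 samples standing in for the ligated kidneys.
rng = np.random.default_rng(0)
log2cpm = rng.normal(5.0, 1.0, size=(200, 18))
log2cpm[:50, :12] += 3.0                        # simulated ligation effect

scores, explained, loadings = pca_samples(log2cpm)
print("Variance explained by PC1 and PC2:", explained)
```

With a layout like this, a dominant PC1 usually reflects the largest effect in the data (here the simulated ligation), so a smaller treatment effect would have to show up on later components.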
I have just started doing bioinformatics and am unsure which approach to take next. I have thought about running a highly variable genes (HVG) analysis (see link), isolating the 500 most variable genes, and plotting a heatmap with hierarchical clustering. I was also thinking of looking at the PC2 loadings and taking this further. The truth is that I am very unsure what the right approach is, and my research group no longer has a bioinformatician I can ask.
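To make the two ideas concrete, here is a rough sketch of selecting the most variable genes and ranking genes by their PC2 loading. It assumes a genes x samples log2cpm matrix and uses NumPy with random placeholder data. The helper names `top_variable_genes` and `top_loading_genes` and the gene IDs are hypothetical.

```python
import numpy as np

def top_variable_genes(log2cpm, gene_ids, n_top=500):
    """Return IDs of the n_top genes with the highest variance across samples."""
    X = np.asarray(log2cpm, dtype=float)
    variances = X.var(axis=1, ddof=1)               # per-gene sample variance
    order = np.argsort(variances)[::-1][:n_top]     # highest variance first
    return [gene_ids[i] for i in order]

def top_loading_genes(log2cpm, gene_ids, pc=2, n_top=20):
    """Return IDs of the genes with the largest absolute loading on one PC."""
    X = np.asarray(log2cpm, dtype=float).T          # samples x genes
    X = X - X.mean(axis=0)                          # center each gene
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    loading = Vt[pc - 1]                            # row pc-1 holds that PC's loadings
    order = np.argsort(np.abs(loading))[::-1][:n_top]
    return [gene_ids[i] for i in order]

# Placeholder data (assumption): 1000 genes x 18 samples,
# with the first 10 genes made far more variable than the rest.
rng = np.random.default_rng(1)
n_genes, n_samples = 1000, 18
log2cpm = rng.normal(5.0, 0.2, size=(n_genes, n_samples))
log2cpm[:10] += rng.normal(0.0, 3.0, size=(10, n_samples))
gene_ids = [f"gene_{i}" for i in range(n_genes)]

hvg = top_variable_genes(log2cpm, gene_ids, n_top=10)
pc2_genes = top_loading_genes(log2cpm, gene_ids, pc=2, n_top=5)
print(hvg)
print(pc2_genes)
```

The HVG subset could then be passed to a heatmap with hierarchical clustering, while the PC2 gene list gives candidates to inspect if PC2 is the component that separates treated from untreated samples.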
I have the following questions:
- What kind of analysis would you suggest I use to investigate further (hierarchical clustering, HVG, etc.)?
- Is there any alternative preprocessing I could use before I run the PCA, to make the analysis more accurate?
- Are there any packages that you can suggest?
- Any other tips?