Getting Index of normal/Diseased samples in R
6.0 years ago
David_emir ▴ 490

Hello All,

I have a data frame as follows:

> df
  A B C D E
1 1 2 3 4 5
2 4 5 6 7 7
3 7 8 9 8 9

Here, rows are genes and columns are sample IDs. I want to index the normal and diseased samples. For example, the normal samples are A and B, and the diseased samples are D and E. I have a phenotype file as follows:

> Pheno
      sample status
    1 A         Normal
    2 B         Normal
    3 C         Unknown
    4 D         Diseased
    5 E         Diseased

Now my question is: how do I index the samples in 'df' in R, say 0 for Normal and 1 for Diseased, based on the classification in the Pheno file? (That is, indexing samples into Normal and Diseased from an RNA-seq raw counts file.) I hope that is clear. It would be great if you could help me with this. Thanks a lot for your help.

Regards,

Have a great day,

Dave.

index r RNA-Seq • 1.1k views

See if this works:

test=read.csv("test.txt",sep="\t", stringsAsFactors = FALSE)
pheno=read.csv("pheno.txt",sep="\t", stringsAsFactors = FALSE)

> test[,t(pheno[pheno$status=="Normal",][1])]
  A B
1 1 2
2 4 5
3 7 8
> test[,t(pheno[pheno$status=="Diseased",][1])]
  D E
1 4 5
2 7 7
3 8 9

> test
  A B C D E
1 1 2 3 4 5
2 4 5 6 7 7
3 7 8 9 8 9

> pheno
  sample   status
1      A   Normal
2      B   Normal
3      C  Unknown
4      D Diseased
5      E Diseased

The following also works for selecting the normal and diseased samples:

test[,subset(pheno$sample, pheno[,2]=="Normal")]
test[,subset(pheno$sample, pheno[,2]=="Diseased")]

Essentially the same method as mine. Why use t()? Use double square brackets, [[1]], to get the column as a vector.
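
A minimal sketch of the [[1]] suggestion, assuming `test` and `pheno` hold the same values as the example above (rebuilt inline here so the snippet runs on its own). The names `normal_ids` and `diseased_ids` are made up for illustration:

```r
# Rebuild the example data inline (assumed to match the files read above)
test <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9),
                   D = c(4, 7, 8), E = c(5, 7, 9))
pheno <- data.frame(sample = c("A", "B", "C", "D", "E"),
                    status = c("Normal", "Normal", "Unknown",
                               "Diseased", "Diseased"),
                    stringsAsFactors = FALSE)

# [[1]] pulls the first column out as a plain character vector,
# so there is no one-column data frame to transpose with t()
normal_ids   <- pheno[pheno$status == "Normal", ][[1]]
diseased_ids <- pheno[pheno$status == "Diseased", ][[1]]

# Select the matching columns of the count table by name
test[, normal_ids]
test[, diseased_ids]
```

Selecting by name like this relies on every sample ID in `pheno` being present as a column in `test`.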

6.0 years ago
zx8754 11k

Not sure about your expected output, but here is my guess:

# example data 
df <- read.table(text = "
A B C D E
1 1 2 3 4 5
2 4 5 6 7 7
3 7 8 9 8 9", header = TRUE)
Pheno <- read.table(text = "
sample status
1 A         Normal
2 B         Normal
3 C         Unknown
4 D         Diseased
5 E         Diseased", header = TRUE)

# make logical index vectors
ix0 <- colnames(df) %in% Pheno[ Pheno$status == "Normal", "sample"]
ix1 <- colnames(df) %in% Pheno[ Pheno$status == "Diseased", "sample"]

Then use the index vectors to subset, for example the Normal samples:

df[, ix0]
#   A B
# 1 1 2
# 2 4 5
# 3 7 8
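
If the goal is a single 0/1 vector aligned to the columns of df, as the question seems to ask, something like the following might work. This is only a sketch: the names `status`, `code` and `group` are made up for illustration, and mapping everything that is neither Normal nor Diseased (here, sample C) to NA is my assumption.

```r
# Example data, same values as above
df <- read.table(text = "
A B C D E
1 1 2 3 4 5
2 4 5 6 7 7
3 7 8 9 8 9", header = TRUE)
Pheno <- read.table(text = "
sample status
1 A Normal
2 B Normal
3 C Unknown
4 D Diseased
5 E Diseased", header = TRUE, stringsAsFactors = FALSE)

# Look up each column of df in Pheno, so the result follows
# the column order of df rather than the row order of Pheno
status <- Pheno$status[match(colnames(df), Pheno$sample)]

# Assumed coding: 0 = Normal, 1 = Diseased, NA for anything else
code <- c(Normal = 0, Diseased = 1)
group <- unname(code[status])
names(group) <- colnames(df)

group

# which() drops the NA entries, leaving only the Diseased columns
df[, which(group == 1)]
```

With the example data, group is 0 for A and B, NA for C, and 1 for D and E. Using match() rather than assuming both tables are in the same order should keep the coding correct even if the Pheno rows are shuffled.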