Question

Getting Index of normal/Diseased samples in R


6.0 years ago

David_emir ▴ 490

Hello All,

I have a data frame like the following:

> df
  A B C D E
1 1 2 3 4 5
2 4 5 6 7 7
3 7 8 9 8 9

where rows are genes and columns are sample IDs. I want to index the normal and diseased samples; for example, the normal samples are A and B and the diseased samples are D and E. I have a phenotype file as follows:

> Pheno
  sample   status
1      A   Normal
2      B   Normal
3      C  Unknown
4      D Diseased
5      E Diseased

Now my question is: how can I index the samples in 'df', say 0 for Normal and 1 for Diseased, based on the classification in the Pheno file? (In other words, I want to split the samples of an RNA-seq raw counts table into Normal and Diseased.) I hope that is clear; it would be great if you could help me with this. Thanks a lot for your help.
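One way to get exactly the 0/1 coding asked for, sketched on the example data (the variable names `status` and `code` are my own, not from the thread): look up each column's status with match(), which keeps the column order of df, then map Normal to 0, Diseased to 1 and anything else (here the "Unknown" sample C) to NA.

```r
# Example data from the question (rows = genes, columns = samples)
df <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9),
                 D = c(4, 7, 8), E = c(5, 7, 9))
Pheno <- data.frame(sample = c("A", "B", "C", "D", "E"),
                    status = c("Normal", "Normal", "Unknown", "Diseased", "Diseased"),
                    stringsAsFactors = FALSE)

# Status of each column of df, in the column order of df
status <- Pheno$status[match(colnames(df), Pheno$sample)]

# Normal -> 0, Diseased -> 1, anything else (e.g. Unknown) -> NA
code <- ifelse(status == "Normal", 0L,
               ifelse(status == "Diseased", 1L, NA_integer_))
names(code) <- colnames(df)

# Column positions of each group (which() skips the NA for sample C)
normal_cols   <- which(code == 0)
diseased_cols <- which(code == 1)
```

Because match() is driven by colnames(df), the result stays aligned with the count table even if the rows of Pheno are in a different order.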

Regards,

Have a great day,

Dave.

Tags: index, r, RNA-Seq (1.1k views)

Written 6.0 years ago by David_emir ▴ 490; updated 6.0 years ago by zx8754 11k.

See if this works:

test <- read.csv("test.txt", sep = "\t", stringsAsFactors = FALSE)
pheno <- read.csv("pheno.txt", sep = "\t", stringsAsFactors = FALSE)

> test[,t(pheno[pheno$status=="Normal",][1])]
  A B
1 1 2
2 4 5
3 7 8
> test[,t(pheno[pheno$status=="Diseased",][1])]
  D E
1 4 5
2 7 7
3 8 9

> test
  A B C D E
1 1 2 3 4 5
2 4 5 6 7 7
3 7 8 9 8 9

> pheno
  sample   status
1      A   Normal
2      B   Normal
3      C  Unknown
4      D Diseased
5      E Diseased

The following also works for selecting the normal and diseased samples:

test[,subset(pheno$sample, pheno[,2]=="Normal")]
test[,subset(pheno$sample, pheno[,2]=="Diseased")]

Reply by cpad0112 21k, 6.0 years ago

Essentially the same method as mine. Why use t()? Use double square brackets, [[1]], to get a vector.
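A minimal sketch of the [[1]] suggestion, with the example data re-created inline so it runs on its own. It assumes `sample` is a character column (stringsAsFactors = FALSE, as in the reply above): if it were a factor, `[` would index by the factor's integer codes rather than by the sample names, so keeping it character is the safer choice.

```r
test <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9),
                   D = c(4, 7, 8), E = c(5, 7, 9))
pheno <- data.frame(sample = c("A", "B", "C", "D", "E"),
                    status = c("Normal", "Normal", "Unknown", "Diseased", "Diseased"),
                    stringsAsFactors = FALSE)

# [[1]] pulls the first column (sample) out as a plain character vector,
# so it can be used directly as column names without t()
normal   <- test[, pheno[pheno$status == "Normal", ][[1]]]
diseased <- test[, pheno[pheno$status == "Diseased", ][[1]]]
```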

Reply by zx8754 11k, 6.0 years ago

score 1 · Answer 1 · 2018-05-01

Not sure about your expected output, but here is a guess:

# example data 
df <- read.table(text = "
A B C D E
1 1 2 3 4 5
2 4 5 6 7 7
3 7 8 9 8 9", header = TRUE)
Pheno <- read.table(text = "
sample status
1 A         Normal
2 B         Normal
3 C         Unknown
4 D         Diseased
5 E         Diseased", header = TRUE)

# make logical index vectors
ix0 <- colnames(df) %in% Pheno[ Pheno$status == "Normal", "sample"]
ix1 <- colnames(df) %in% Pheno[ Pheno$status == "Diseased", "sample"]

Then use the index vectors to subset, for example the normal samples:

df[, ix0]
#   A B
# 1 1 2
# 2 4 5
# 3 7 8
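A possible next step, assuming (the question does not say so) that the goal is a count table restricted to samples with a known status plus a matching per-sample condition label, e.g. for a downstream differential expression tool. The names `keep`, `counts` and `condition` are illustrative only; ix0 and ix1 are built exactly as above.

```r
df <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9),
                 D = c(4, 7, 8), E = c(5, 7, 9))
Pheno <- data.frame(sample = c("A", "B", "C", "D", "E"),
                    status = c("Normal", "Normal", "Unknown", "Diseased", "Diseased"),
                    stringsAsFactors = FALSE)

# Logical index vectors, as in the answer
ix0 <- colnames(df) %in% Pheno[Pheno$status == "Normal", "sample"]
ix1 <- colnames(df) %in% Pheno[Pheno$status == "Diseased", "sample"]

# Keep only samples that are Normal or Diseased (drops the Unknown sample C)
keep <- ix0 | ix1
counts <- df[, keep]

# One label per kept column, in the same order as colnames(counts);
# Normal is the first level, i.e. the reference group
condition <- factor(ifelse(ix1[keep], "Diseased", "Normal"),
                    levels = c("Normal", "Diseased"))
```

Subsetting both objects with the same `keep` vector is what keeps the labels and the count columns in step.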