Extracting several columns from a file
7.1 years ago
zizigolu ★ 4.3k

Hi,

I have a file like below

> head(data[,1:4])
         ID GSM943243 GSM943244 GSM943245
1    EEF1A1 14.517466 14.591990 14.582881
2     GAPDH 11.925736 11.820686 11.719080
3 LOC643334  7.505173  7.494044  7.365844
4   SLC35E2  7.720945  7.642623  7.727642
5    DUSP22  7.348523  7.345953  7.385760
6 LOC642820  7.538024  7.582380  7.501941
> # watching the dimension of matrix
> dim(data)
[1] 25217   203
>

I have a list of accession numbers (for example GSM943243) corresponding to the control samples. How can I extract the control columns from my file?


Thank you. Inspired by the second link, I did this:

> list_of_accessions = t(list_of_accessions)
> extraction = data[, c(list_of_accessions)]

but the result has no row names, so I added them manually.
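
The row names are probably missing because the gene IDs sit in an ordinary column (`ID`) that the column subset leaves out. A minimal sketch of keeping them, assuming a toy data frame shaped like `head(data[,1:4])` and a hypothetical character vector `accessions`:

```r
# Toy data frame shaped like the one in the question: gene IDs are a regular column
data <- data.frame(
  ID = c("EEF1A1", "GAPDH", "DUSP22"),
  GSM943243 = c(14.517466, 11.925736, 7.348523),
  GSM943244 = c(14.591990, 11.820686, 7.345953),
  GSM943245 = c(14.582881, 11.719080, 7.385760),
  stringsAsFactors = FALSE
)

# Hypothetical list of control accessions, as a plain character vector
accessions <- c("GSM943243", "GSM943245")

# Select the ID column together with the controls so the gene names come along
extraction <- data[, c("ID", accessions)]

# Optionally promote the IDs to row names instead of typing them by hand
rownames(extraction) <- extraction$ID
print(extraction)
```

Selecting `"ID"` alongside the accessions avoids re-attaching gene names after the fact.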

First of all, you need to read the file into an object with read.table using row.names=1; that data frame can then be subsetted with the columns of your choice. Do you know the indexes of the columns that are controls? Then it is fairly simple, and your row names will stay intact. Adding row names manually is not a good approach. When you are working programmatically, do everything the same way.
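
A minimal sketch of this suggestion, assuming a whitespace-delimited file whose first column holds the gene IDs. The inline `txt` string stands in for the real file, and `controls` is a hypothetical vector of control accessions:

```r
# Toy stand-in for the expression file; in practice pass the file path instead of text =
txt <- "ID GSM943243 GSM943244 GSM943245
EEF1A1 14.517466 14.591990 14.582881
GAPDH 11.925736 11.820686 11.719080
DUSP22 7.348523 7.345953 7.385760"

# row.names = 1 turns the first column (gene IDs) into row names
expr <- read.table(text = txt, header = TRUE, row.names = 1)

# Hypothetical list of control accessions
controls <- c("GSM943243", "GSM943245")

# Subset by column name; drop = FALSE keeps a data frame even for a single column
ctrl_expr <- expr[, controls, drop = FALSE]
print(ctrl_expr)
```

Because the IDs are row names from the start, they survive any column subset without manual work.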


In another file of mine

> head(data1[,1:4])
          GSE35974_Biomat_17___BioAssayImplId.212266Name.DE37_111809_rep5
LINC01128                                                        8.789351
SAMD11                                                           7.059227
KLHL17                                                           7.453778
PLEKHN1                                                          7.546892
ISG15                                                            7.091302
AGRN                                                             7.505454
          GSE35974_Biomat_139___BioAssayImplId.212294Name.AC99_111809
LINC01128                                                    8.733914
SAMD11                                                       7.120576
KLHL17                                                       7.503455
PLEKHN1                                                      7.425533
ISG15                                                        6.788893
AGRN                                                         7.269030
          GSE35974_Biomat_137___BioAssayImplId.212295Name.AC96_111809
LINC01128                                                    8.914045
SAMD11                                                       7.232991
KLHL17                                                       7.472246
PLEKHN1                                                      7.352260
ISG15                                                        7.017254
AGRN                                                         7.436749
          GSE35974_Biomat_136___BioAssayImplId.212296Name.AC95_111809
LINC01128                                                    8.977482
SAMD11                                                       7.087887
KLHL17                                                       7.436269
PLEKHN1                                                      7.321820
ISG15                                                        6.922916
AGRN                                                         7.470006

and a list of sample names in another file

> head(data2[,1])

[1] GSE35974_Biomat_69___BioAssayImplId=212347Name=AC45_111109
[2] GSE35974_Biomat_67___BioAssayImplId=212345Name=AC47_111109
[3] GSE35974_Biomat_64___BioAssayImplId=212344Name=AC49_102309
[4] GSE35974_Biomat_74___BioAssayImplId=212351Name=AC41_111809
[5] GSE35974_Biomat_73___BioAssayImplId=212349Name=AC43_102309
[6] GSE35974_Biomat_68___BioAssayImplId=212348Name=AC44_111809
94 Levels: GSE35974_Biomat_10___BioAssayImplId=212273Name=DE35_102309 ...

>

I want to extract only these samples from my expression data file.

I did this:

> list_of_samples = t(data2)
> extraction = data1[, c(data2)]

but R reports

Error in `[.data.frame`(data1, , c(data2)) : undefined columns selected

I could not figure out why this fails when the previous case worked.


To be honest, this is a simple thing to do in awk, sed, or R. What I see here is that the header names in data1 are not the same as the names in data2. You need to show some entries from data2 that correspond to the columns you show in data1[,1:4]; I do not see any. The names are also very long strings with many attributes, which makes me think they are not actually identical. Say you have 10 columns in a data frame x and you want to extract only 6 of them, and you know their column indexes: 2, 4, and 7 through 10.

x <- read.table("file.txt", header = TRUE, row.names = 1)
x.1 <- x[, c(2, 4, 7:10)]
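
One visible difference between the two sets of names is that the data1 headers contain `.` where the data2 entries contain `=`. That is what read.table does by default (`check.names = TRUE` passes headers through `make.names`). The data2 column is also a factor, not a character vector. A minimal sketch of reconciling the two, assuming this is the only mismatch; the toy `data1` values are placeholders and `wanted` is a hypothetical stand-in for `data2[,1]`:

```r
# Sample names as they appear in the sample list (a factor, with '=' in the names)
wanted <- factor(c(
  "GSE35974_Biomat_69___BioAssayImplId=212347Name=AC45_111109",
  "GSE35974_Biomat_67___BioAssayImplId=212345Name=AC47_111109"
))

# Toy expression table; values are placeholders, headers mimic what read.table
# produces with its default check.names = TRUE ('=' becomes '.')
data1 <- data.frame(
  GSE35974_Biomat_69___BioAssayImplId.212347Name.AC45_111109 = c(1, 2),
  GSE35974_Biomat_139___BioAssayImplId.212294Name.AC99_111809 = c(3, 4),
  GSE35974_Biomat_67___BioAssayImplId.212345Name.AC47_111109 = c(5, 6),
  row.names = c("SAMD11", "ISG15")
)

# Convert the factor to character and apply the same name sanitising as read.table
wanted_names <- make.names(as.character(wanted))

# Report any names that still do not match before subsetting
not_found <- setdiff(wanted_names, colnames(data1))
if (length(not_found) > 0) warning("not found: ", paste(not_found, collapse = ", "))

# Keep only the columns that are present, in the order of the sample list
extraction <- data1[, intersect(wanted_names, colnames(data1)), drop = FALSE]
print(extraction)
```

An alternative is to read the expression file with `check.names = FALSE` so the original headers are kept and can be matched against `as.character(data2[,1])` directly.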

I would like to suggest editing the question title to include that you are trying to do this in R. For example: "Extracting several columns from a file using R." This will help other people looking for similar answers via search engines. As stated, the answer could be as simple as man cut.
