How to get the gene name with the array as headings in numpy-python?
5.5 years ago
S AR ▴ 80

I have a table like below:

    Gene name   4h      12h     24h     48h
    A2M         0.12    0.08    0.06    0.02
    FOS         0.01    0.07    0.11    0.09
    BRCA2       0.03    0.04    0.04    0.02
    CPOX        0.05    0.09    0.11    0.14

I made its array like this:

import numpy as np
genelst = np.array(["A2M", "FOS", "BRCA2","CPOX"])
a2m =np.array([[0.12,0.08,0.06,0.02]])
fos = np.array([[0.01,0.07,0.11,0.09]])
brca2 = np.array([[0.03,0.04,0.04,0.02]])
cpox = np.array([[0.05,0.09,0.11,0.14]])
comb_array = np.vstack([genelst, a2m,fos,brca2,cpox])

Now I want to find which gene has the maximum mean expression value, and to sort the gene names from high to low expression.

I did:

mean_a2m = np.mean(a2m)
mean_fos = np.mean(fos)
mean_brca2 = np.mean(brca2)
mean_cpox = np.mean(cpox)
mean_expression_gene = np.vstack([[mean_a2m,mean_fos,mean_brca2,mean_cpox]])
mean_expression_gene_array = np.vstack([[genelst], [mean_a2m,mean_fos,mean_brca2,mean_cpox]])
print ("The mean expression value for A2M is:" + str(mean_a2m))
print ("The mean expression value for FOS is:" + str(mean_fos))
print ("The mean expression value for BRCA2 is:" + str(mean_brca2))
print ("The mean expression value for CPOX is:" + str(mean_cpox))
mean_expression_gene.max()
mean_expression_gene.sort()

This gives me only the maximum value, 0.0975, not the gene name. How can I make the array treat the gene names as headers, so that they are not counted as values and I don't have to separate them out before calling .max()?

Secondly, instead of using a separate 1D array for each gene (a2m, fos, brca2, cpox) to calculate the average, is there a way to get the average of a row or a column of a 2D array, in this case comb_array?
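
On the second question, a minimal sketch: a NumPy array holds a single dtype, so it is usually simpler to keep the gene names in their own array and put only the numbers in a 2D array, then take the mean along an axis. The names `expr`, `mean_per_gene`, and `mean_per_time` are illustrative and not from the original post.

```python
import numpy as np

# Rows are genes (A2M, FOS, BRCA2, CPOX); columns are time points (4h, 12h, 24h, 48h).
expr = np.array([
    [0.12, 0.08, 0.06, 0.02],
    [0.01, 0.07, 0.11, 0.09],
    [0.03, 0.04, 0.04, 0.02],
    [0.05, 0.09, 0.11, 0.14],
])

mean_per_gene = expr.mean(axis=1)  # one mean per row (gene)
mean_per_time = expr.mean(axis=0)  # one mean per column (time point)

print(mean_per_gene)
print(mean_per_time)
```

`axis=1` collapses the columns and leaves one value per row; `axis=0` collapses the rows and leaves one value per column.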

numpy python array • 2.3k views

Don't deal with multiple arrays; check how to build a dataframe.

5.5 years ago
#import pandas
import pandas as pd
#Create your data
d = {'4h': [0.12,0.01,0.03,0.05], '12h': [0.08,0.07,0.04,0.09], '24h': [0.06,0.11,0.04,0.11], '48h':[0.02,0.09,0.02,0.14]}
#Generate a dataframe with your data and the index accordingly
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

#df
#        12h   24h   48h    4h
#A2M    0.08  0.06  0.02  0.12
#FOS    0.07  0.11  0.09  0.01
#BRCA2  0.04  0.04  0.02  0.03
#CPOX   0.09  0.11  0.14  0.05

#Create a new 'mean' column
df['mean'] = df.mean(axis=1)
#Sort your dataframe on this new column, with decreasing mean value (ascending=False)
df = df.sort_values(["mean"], ascending=False)

#df
#        12h   24h   48h    4h    mean
#CPOX   0.09  0.11  0.14  0.05  0.0975
#A2M    0.08  0.06  0.02  0.12  0.0700
#FOS    0.07  0.11  0.09  0.01  0.0700
#BRCA2  0.04  0.04  0.02  0.03  0.0325

#Read all df lines
for index, row in df.iterrows():
    print("The mean expression value for "+index+" is: "+str(row['mean']))

#The mean expression value for CPOX is: 0.0975
#The mean expression value for A2M is: 0.07
#The mean expression value for FOS is: 0.07
#The mean expression value for BRCA2 is: 0.0325

Wow, that's great. But can I do it without pandas, using just numpy? This is for my assignment and I can't use pandas right now. Can you give a solution in numpy?


You should have put the fact that this is an assignment in your initial post


You can also create a dictionary with the genes as keys, where each key maps to a numpy array.
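
A rough sketch of that dictionary idea, using the data from the question. The names `expression`, `means`, `top_gene`, and `ranked` are hypothetical, chosen for illustration only.

```python
import numpy as np

# Gene name -> expression values at 4h, 12h, 24h, 48h (data from the question).
expression = {
    "A2M": np.array([0.12, 0.08, 0.06, 0.02]),
    "FOS": np.array([0.01, 0.07, 0.11, 0.09]),
    "BRCA2": np.array([0.03, 0.04, 0.04, 0.02]),
    "CPOX": np.array([0.05, 0.09, 0.11, 0.14]),
}

# Mean expression per gene, still keyed by gene name.
means = {gene: float(values.mean()) for gene, values in expression.items()}

# Gene with the highest mean expression.
top_gene = max(means, key=means.get)

# Gene names ordered from highest to lowest mean expression.
ranked = sorted(means, key=means.get, reverse=True)

print(top_gene)
for gene in ranked:
    print("The mean expression value for " + gene + " is: " + str(means[gene]))
```

Because the name travels with the values as the dictionary key, there is no header row to strip out before taking the maximum.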

df['mean_exp_per_time'] = df.mean(axis=0)
df['mean_exp_per_gene'] = df.mean(axis=1)
df

When I calculate the column means as well, after the row means or vice versa, it gives me:

        4h    12h   24h   48h   mean_exp_per_interval  mean_exp_per_gene
A2M     0.12  0.08  0.06  0.02  NaN                    0.0700
FOS     0.01  0.07  0.11  0.09  NaN                    0.0700
BRCA2   0.03  0.04  0.04  0.02  NaN                    0.0325
CPOX    0.05  0.09  0.11  0.14  NaN                    0.0975

        4h    12h   24h   48h   mean_exp_per_gene  mean_exp_per_time
A2M     0.12  0.08  0.06  0.02  0.0700             NaN
FOS     0.01  0.07  0.11  0.09  0.0700             NaN
BRCA2   0.03  0.04  0.04  0.02  0.0325             NaN
CPOX    0.05  0.09  0.11  0.14  0.0975             NaN
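
A likely explanation for the NaN column, offered as a sketch rather than a definitive diagnosis: `df.mean(axis=0)` returns a Series indexed by the column labels (`4h`, `12h`, ...), and assigning a Series as a new column aligns it on the row index (the gene names). None of the labels match, so every cell is filled with NaN. One option is to keep the per-time means as a separate Series, or to add them as a new row instead of a new column. The names `time_cols` and `with_means` are illustrative.

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])
time_cols = list(df.columns)

# Per-time-point means: a Series indexed by column label, not by gene.
mean_exp_per_time = df[time_cols].mean(axis=0)

# Per-gene means: a Series indexed by gene, so it aligns as a new column.
with_means = df.copy()
with_means['mean_exp_per_gene'] = df[time_cols].mean(axis=1)

# Per-time means fit as a new row; reindex so the row covers every column,
# leaving NaN in the 'mean_exp_per_gene' cell of that row.
with_means.loc['mean_exp_per_time'] = mean_exp_per_time.reindex(with_means.columns)

print(with_means)
```

Computing both means from `time_cols` also avoids averaging a previously added mean column into the second calculation.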

Secondly, if I want to find which gene shows the maximum mean expression using .max(), it just shows the value, not the gene name.
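
In pandas, `.max()` returns the value while `.idxmax()` returns the index label of that value, which here is the gene name. A minimal sketch, assuming the dataframe is built as in the answer above; `mean_per_gene` is an illustrative name.

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

# Mean per gene: a Series indexed by gene name.
mean_per_gene = df.mean(axis=1)

print(mean_per_gene.max())     # the largest mean value
print(mean_per_gene.idxmax())  # the index label (gene name) holding that value
```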


Your gene name is contained only in your variable name, which cannot be printed. I don't know whether you can add an index to a numpy array; maybe, but it is not the best solution.
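
For a numpy-only route, one possible sketch is to keep the names and the values in two parallel arrays and use the positions returned by `np.argmax` and `np.argsort` to look up the names. `genelst` comes from the question; `expr`, `means`, `order`, `ranked_genes`, and `ranked_means` are illustrative names.

```python
import numpy as np

genelst = np.array(["A2M", "FOS", "BRCA2", "CPOX"])

# One row per gene, in the same order as genelst.
expr = np.array([
    [0.12, 0.08, 0.06, 0.02],
    [0.01, 0.07, 0.11, 0.09],
    [0.03, 0.04, 0.04, 0.02],
    [0.05, 0.09, 0.11, 0.14],
])

means = expr.mean(axis=1)

# argmax gives the position of the largest mean; use it to index the names.
top_gene = genelst[np.argmax(means)]

# argsort gives positions in ascending order; reverse for high to low.
order = np.argsort(means)[::-1]
ranked_genes = genelst[order]
ranked_means = means[order]

print(top_gene)
for gene, value in zip(ranked_genes, ranked_means):
    print("The mean expression value for " + str(gene) + " is: " + str(value))
```

This only works while the rows of `expr` stay in the same order as `genelst`, which is the bookkeeping a dataframe index would otherwise handle.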


And if I use a loop over the columns to get the mean:

for index, col in df.columns():
    print("The mean expression value for "+index+" is: "+str(col['mean_exp_per_time']))

It is giving the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-109-8ac821bb44df> in <module>()
----> 1 for index, col in df.columns():
      2     print("The mean expression value for "+index+" is: "+str(col['mean_exp_per_time']))

TypeError: 'Index' object is not callable
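
`df.columns` is an attribute holding an `Index` object, not a method, so calling it with parentheses raises this `TypeError`. It also yields only column labels, not (label, column) pairs. A hedged sketch of one alternative: compute the column means as a Series and loop over its `items()`. The variable name `time_point` is illustrative.

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

# Column means: a Series whose index holds the time-point labels.
mean_exp_per_time = df.mean(axis=0)

# items() yields (label, value) pairs, one per time point.
for time_point, value in mean_exp_per_time.items():
    print("The mean expression value for " + time_point + " is: " + str(value))
```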