Applying multidimensional models with a low number of samples? (What should I do about the train-test split?)
3.8 years ago

I've been working with huge datasets ever since I started in bioinformatics, but now I face a new dataset with very few samples.

I have 5 groups (Control and 4 diseases) whose frequencies vary across a set of 10 features corresponding to the $\log_2(1+2^{-\Delta C_T})$ gene expression values (I had to add a pseudocount to "nullify" my zeros, preventing them from becoming NA or -Inf).
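As a minimal sketch of the transform above (written in Python rather than R, and assuming, as a convention not stated in the question, that an undetected target is passed as `None` and treated as zero relative expression), the pseudocount maps those zeros to 0 instead of -Inf:

```python
import math

def dct_to_log_expr(delta_ct):
    """Map a qPCR Delta Ct value to log2(1 + 2^(-Delta Ct)).

    Hypothetical convention: None marks an undetected target, whose relative
    expression 2^(-Delta Ct) is taken as 0, so the result is log2(1) = 0.
    """
    rel = 0.0 if delta_ct is None else 2.0 ** (-delta_ct)
    return math.log2(1.0 + rel)
```

Note that this is not simply -ΔCt with a shift: it approaches -ΔCt for highly expressed targets (ΔCt = -3 gives log2(9), about 3.17) but compresses low expression toward 0 (ΔCt = 5 gives about 0.044), which changes the shape of the distributions that ANOVA-type tests see.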

Yet I only have a maximum of 20 entries per group (and a minimum of 3, because some features are full of NAs). My plan is to combine some of these features with clinical variables in order to fit a model; I have a complete dataset of 87 of them with very few NAs.
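With groups between 3 and 20 samples, a single train-test split leaves almost nothing of the smallest groups in the test set. A common alternative is repeated stratified k-fold cross-validation, with k no larger than the smallest group so that every test fold contains every class. The sketch below is a plain-Python illustration of that idea, not a reference implementation; in R, caret's `trainControl(method = "repeatedcv")` and, in Python, scikit-learn's `RepeatedStratifiedKFold` provide the same scheme.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs. Samples of each class are shuffled and
    dealt round-robin across the k test folds, so a class with at least k
    members appears in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running position across classes, so fold sizes differ by at most one
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)

def repeated_stratified_kfold(labels, k, repeats, seed=0):
    """Repeat the stratified split with a different shuffle each time."""
    for r in range(repeats):
        yield from stratified_kfold(labels, k, seed=seed + r)
```

For example, with hypothetical group sizes `labels = ["Control"] * 20 + ["D1"] * 12 + ["D2"] * 8 + ["D3"] * 5 + ["D4"] * 3`, `repeated_stratified_kfold(labels, k=3, repeats=10)` yields 30 train/test splits; the model is refit on each training part and its performance averaged over all test parts, which is far less noisy than one split.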

But I'm stuck on:

a) How do I split the data into training and test sets to fit my models with so little data?
b) How can I do feature selection on my first 8 gene features? I ran some ANOVAs, even though the data are not normal and contain many extreme outliers (detected by 'identify_outliers()' and easily visible in boxplots), and some MANOVAs on a few features that came out "significant" (even though the data don't meet the assumptions, such as normality).
c) Should I use a multinomial logistic regression? By the rule of thumb I need about 10 samples per feature, but I don't know any other multiclass models that assign class probabilities.
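For b), a rank-based test such as Kruskal-Wallis (base R's `kruskal.test()`, or `scipy.stats.kruskal` in Python) does not assume normality and is much less sensitive to extreme outliers than ANOVA. With so few samples, the ranking should be computed on the training rows of each fold only; selecting features on the full dataset before cross-validation leaks test information and inflates the estimated performance. A minimal sketch, including the tie correction, since values mapped to 0 by the pseudocount may all be tied:

```python
from collections import defaultdict

def average_ranks(values):
    """1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(values, labels):
    """Tie-corrected Kruskal-Wallis H statistic for one feature."""
    n = len(values)
    if n < 2:
        return 0.0
    rank_sums, counts = defaultdict(float), defaultdict(int)
    for r, y in zip(average_ranks(values), labels):
        rank_sums[y] += r
        counts[y] += 1
    h = 12.0 / (n * (n + 1)) * sum(rank_sums[y] ** 2 / counts[y] for y in counts) - 3 * (n + 1)
    ties = defaultdict(int)
    for v in values:
        ties[v] += 1
    c = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / c if c > 0 else 0.0  # c == 0 means every value is tied

def rank_features(X, labels, train_idx):
    """Feature indices ordered by decreasing H, using training rows only."""
    y = [labels[i] for i in train_idx]
    scores = [(kruskal_wallis_h([X[i][f] for i in train_idx], y), f) for f in range(len(X[0]))]
    return [f for _, f in sorted(scores, reverse=True)]
```

Inside each cross-validation split, one would call `rank_features(X, labels, train_idx)`, keep the top few features, and fit the model on those; the number kept is itself a choice to tune within the training data.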
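For c), multinomial logistic regression is a reasonable choice, but with K classes and p features it estimates (K - 1)(p + 1) coefficients, i.e. 44 for 5 classes and 10 features, far more than these group sizes support under the 10-per-feature rule of thumb. A penalized fit shrinks the coefficients, stays usable in this regime, and still returns class probabilities; in R, `glmnet` with `family = "multinomial"` does this. The sketch below is a plain gradient-descent ridge (L2) version written only to illustrate the idea, not an efficient or definitive implementation; `lam`, `lr` and `epochs` are hypothetical defaults that would normally be tuned by cross-validation.

```python
import math

def softmax(z):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_proba(W, b, x):
    """Class probabilities for one sample x (list of feature values)."""
    return softmax([b[k] + sum(w * v for w, v in zip(W[k], x)) for k in range(len(b))])

def fit_ridge_multinomial(X, y, classes, lam=0.1, lr=0.2, epochs=2000):
    """Full-batch gradient descent on mean cross-entropy + (lam / 2) * ||W||^2.
    Intercepts b are not penalized. Returns one weight row per class, plus b."""
    n, p, K = len(X), len(X[0]), len(classes)
    index = {c: k for k, c in enumerate(classes)}
    W = [[0.0] * p for _ in range(K)]
    b = [0.0] * K
    for _ in range(epochs):
        # Start from the gradient of the penalty term
        gW = [[lam * W[k][j] for j in range(p)] for k in range(K)]
        gb = [0.0] * K
        for xi, yi in zip(X, y):
            probs = predict_proba(W, b, xi)
            for k in range(K):
                # d(cross-entropy)/d(logit_k) = p_k - 1[y == k], averaged over n
                err = (probs[k] - (1.0 if index[yi] == k else 0.0)) / n
                gb[k] += err
                for j in range(p):
                    gW[k][j] += err * xi[j]
        for k in range(K):
            b[k] -= lr * gb[k]
            for j in range(p):
                W[k][j] -= lr * gW[k][j]
    return W, b
```

Because the penalty depends on the scale of each feature, the gene and clinical features would need to be standardized first, using means and standard deviations computed on the training fold only.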

Any recommendations?

r stats multidimensional logistic