Classification using publicly available RNA-seq dataset
5.3 years ago
bioinfraML ▴ 10

Does it make sense to train a machine learning model on publicly available RNA-seq data from GEO to classify subjects as cases or controls, and then use that model to predict cases and controls in a completely different dataset? For instance, say I have coronary artery disease RNA-seq data in which I want to classify samples as NASH vs. non-NASH. Can I train a Random Forest classifier on a NASH-related GEO dataset and test it on my coronary artery disease RNA-seq data?

RNA-Seq machine learning classification • 1.7k views
5.3 years ago

We have successfully used RNA-seq data from one large consortium to train a classifier, which we then used to classify samples from another consortium. This worked pretty well: where we had a good idea of which class a sample should fall into, it generally did, and where a sample didn't fall where we expected, we've leveraged that to identify novel biology.
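To make the train-on-one-cohort, classify-another workflow concrete, here is a minimal sketch. It uses a simple nearest-centroid classifier as a stand-in (the classifier type, the toy expression values, and the function names are all assumptions for illustration, not what was actually used), and it assumes both cohorts already have the same genes in the same column order.

```python
import math

def centroid(rows):
    """Mean expression vector over a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(samples, labels):
    """Return one centroid per class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_class.items()}

def predict(model, x):
    """Return the label of the nearest class centroid (Euclidean distance)."""
    return min(model, key=lambda y: math.dist(model[y], x))

# Made-up toy values: rows are samples, columns are the same genes in the same order.
train_x = [[5.0, 1.0], [6.0, 0.5], [1.0, 4.0], [0.5, 5.0]]
train_y = ["case", "case", "control", "control"]
model = train(train_x, train_y)

# Samples from the second cohort, with the class we expect each to fall in.
other_x = [[5.5, 1.2], [0.8, 4.4]]
expected = ["case", "control"]
predicted = [predict(model, x) for x in other_x]

# Samples that do not land where expected are the ones worth a closer look.
discordant = [i for i, (p, e) in enumerate(zip(predicted, expected)) if p != e]
```

The same structure applies with a Random Forest or any other classifier: fit on one cohort only, predict on the other, and inspect the discordant samples rather than discarding them.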

A word of warning, though: we had to reprocess the data from one of the sources to match the exact processing pipeline used for the data from the other source. For a large dataset this is not a trivial undertaking.
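As a rough illustration of the downstream part of that matching, the sketch below restricts two cohorts to their shared genes, fixes one gene order, and applies one identical normalization to both. It is only a sketch under stated assumptions: the gene names, the counts, and the choice of log2(CPM + 1) are hypothetical, and it does not replace rerunning alignment and quantification with the same tools and annotation.

```python
import math

def log_cpm(counts):
    """log2(counts per million + 1) for one sample given as {gene: raw_count}."""
    total = sum(counts.values())
    return {g: math.log2(c / total * 1e6 + 1) for g, c in counts.items()}

def harmonize(cohort_a, cohort_b):
    """Put both cohorts on shared genes, one gene order, one normalization."""
    # Genes present in every sample of both cohorts, in a fixed (sorted) order.
    shared = sorted(set.intersection(*(set(s) for s in cohort_a + cohort_b)))

    def to_matrix(cohort):
        rows = []
        for sample in cohort:
            # Subset first so library size is defined identically in both cohorts.
            norm = log_cpm({g: sample[g] for g in shared})
            rows.append([norm[g] for g in shared])
        return rows

    return shared, to_matrix(cohort_a), to_matrix(cohort_b)

# Hypothetical raw counts; gene names are placeholders.
cohort_a = [{"GENE1": 100, "GENE2": 300, "GENE3": 600}]
cohort_b = [{"GENE2": 30, "GENE3": 60, "GENE4": 10}]
genes, mat_a, mat_b = harmonize(cohort_a, cohort_b)
```

Here `genes` is the column order shared by `mat_a` and `mat_b`, so a model trained on one matrix sees the same features when applied to the other.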


Instead of reprocessing, why not use one of the reprocessed data sources such as recount (others also exist)? https://jhubiostatistics.shinyapps.io/recount/


These generally only exist for things in GEO or public SRA of course.
