How to preprocess miRNA-seq read counts?
8.0 years ago
acc.inpro321 ▴ 40

I am new to bioinformatics, and to learn more I am starting with a project. I downloaded miRNA-seq data from TCGA. There is one text file per sample, and each file has these columns:

miRNA_ID    read_count  reads_per_million_miRNA_mapped  cross-mapped

Here is a sample of one file's contents:

miRNA_ID    read_count  reads_per_million_miRNA_mapped  cross-mapped
hsa-let-7a-1    55243   9869.306676 N
hsa-let-7a-2    110572  19753.97748 Y
hsa-let-7a-3    55555   9925.046293 N
hsa-let-7b  94076   16806.92386 N
hsa-let-7c  11209   2002.517215 Y
hsa-let-7d  1843    329.256778  N
hsa-let-7e  7786    1390.989298 N
hsa-let-7f-1    166 29.656335   N
hsa-let-7f-2    66277   11840.55968 N
hsa-let-7g  4192    748.911782  N
hsa-let-7i  3617    646.186526  N
hsa-mir-1-1 0   0   N
hsa-mir-1-2 266 47.521597   N

... (trimmed)
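Before any scaling, the per-sample files have to be combined into one samples x miRNAs table. The sketch below is one way to do that with the standard library only, assuming each file has the header shown above followed by whitespace-separated columns in the order miRNA_ID, read_count, reads_per_million_miRNA_mapped, cross-mapped. The function names `parse_mirna_file` and `build_matrix` are hypothetical helpers, not part of any TCGA tool.

```python
import io
from typing import Dict, List, TextIO, Tuple

EXPECTED = ["miRNA_ID", "read_count", "reads_per_million_miRNA_mapped"]


def parse_mirna_file(handle: TextIO) -> Dict[str, Tuple[int, float]]:
    """Parse one per-sample file into {miRNA_ID: (read_count, rpm)}.

    Assumes a header line, then whitespace-separated columns in the order
    miRNA_ID, read_count, reads_per_million_miRNA_mapped, cross-mapped.
    """
    lines = iter(handle)
    header = next(lines).split()
    if header[:3] != EXPECTED:
        raise ValueError(f"unexpected header: {header}")
    result = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        result[fields[0]] = (int(fields[1]), float(fields[2]))
    return result


def build_matrix(
    samples: Dict[str, Dict[str, Tuple[int, float]]], use_rpm: bool = True
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Stack per-sample dicts into a matrix with one row per sample.

    Column order is the sorted union of miRNA IDs; an ID missing from a
    sample is filled with 0.
    """
    sample_ids = sorted(samples)
    mirna_ids = sorted({m for s in samples.values() for m in s})
    col = 1 if use_rpm else 0  # tuple index: 0 = read_count, 1 = rpm
    matrix = [
        [float(samples[s].get(m, (0, 0.0))[col]) for m in mirna_ids]
        for s in sample_ids
    ]
    return sample_ids, mirna_ids, matrix


# Usage: an in-memory copy of two rows from the sample file above.
demo = io.StringIO(
    "miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\n"
    "hsa-let-7a-1\t55243\t9869.306676\tN\n"
    "hsa-mir-1-1\t0\t0\tN\n"
)
parsed = parse_mirna_file(demo)
```

In practice you would open each sample's file, call `parse_mirna_file` on it, key the results by a sample identifier, and pass that dict to `build_matrix`.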

How should I preprocess the data? I am not sure how to bring the read counts into the range between 0 and 1 for classification. Should I rescale each value with min-max normalization, like this?

$$ \text{value}'_i = \frac{\text{value}_i - \text{value}_{\min}}{\text{value}_{\max} - \text{value}_{\min}} $$
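Min-max scaling maps a value x to (x − min) / (max − min). For classification it is usually applied per feature, i.e. per miRNA column across samples, with the minimum and maximum taken from the training samples only and then reused on the test samples so that no test information leaks into training. A minimal sketch of that, assuming the data is already a list of rows (one row per sample); `fit_min_max` and `apply_min_max` are illustrative names, not a library API:

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fit_min_max(train: Matrix) -> Tuple[List[float], List[float]]:
    """Return per-column minima and maxima computed from the training rows."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows: Matrix, mins: List[float], maxs: List[float]) -> Matrix:
    """Scale each value to (x - min) / (max - min) using training min/max.

    Constant columns (max == min) are mapped to 0.0 to avoid division by
    zero. Test values outside the training range will fall outside [0, 1].
    """
    return [
        [0.0 if hi == lo else (x - lo) / (hi - lo) for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Usage: column 0 varies across samples, column 1 is constant.
train = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
mins, maxs = fit_min_max(train)
scaled_train = apply_min_max(train, mins, maxs)
```

Because miRNA expression is strongly skewed (the sample above ranges from 0 to 110572 reads), one sample with an extreme value can squeeze all other samples in that column toward 0. Applying a log transform such as log2(x + 1) before min-max scaling is a common way to reduce that effect; whether it improves this particular classifier would need to be checked.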

Which column is better suited for machine learning: reads per million or read count?
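As its name says, reads_per_million_miRNA_mapped is the read count divided by the number of miRNA-mapped reads in that sample, times 10^6. In the sample above, 55243 reads corresponding to 9869.306676 RPM implies a denominator of roughly 5.6 million reads. Raw counts scale with sequencing depth, so the same miRNA can look very different between two samples simply because one was sequenced more deeply; RPM removes that effect. The sketch below recomputes RPM from counts to illustrate this. When no library size is given it uses the sum of the supplied counts, which is an assumption: the exact denominator TCGA used may differ. `counts_to_rpm` is a hypothetical helper.

```python
from typing import Dict, Optional


def counts_to_rpm(counts: Dict[str, int], total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Convert raw read counts to reads per million (RPM).

    RPM_i = count_i / total_mapped * 1e6. If total_mapped is None, the sum
    of the supplied counts is used as the library size (an assumption).
    """
    total = total_mapped if total_mapped is not None else sum(counts.values())
    if total <= 0:
        raise ValueError("library size must be positive")
    return {mirna: c / total * 1e6 for mirna, c in counts.items()}


# Two hypothetical samples with the same composition but 10x different depth.
shallow = counts_to_rpm({"mir-a": 1, "mir-b": 3})
deep = counts_to_rpm({"mir-a": 10, "mir-b": 30})
```

The raw counts differ tenfold while the RPM values are identical, which is why a depth-normalized column is usually the safer feature for comparing samples. Raw counts remain the expected input for count-based differential-expression tools such as DESeq2 or edgeR, which apply their own normalization.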

Thanks in advance.

P.S. This isn't a lazy question: I tried a bunch of things and the results did not come out as expected. :p


What is the biological question you are trying to answer?
