Subsampling a taxa abundance matrix
5.0 years ago
bioinfo • 700


I have a taxonomic abundance matrix from a large number of shotgun metagenomes whose sequencing depth ranges from 5 million to 99 million sequences per sample. Here are the raw abundance data for 4 test samples.

Sample_ID total_sequences Escherichia Pseudomonas Bacillus Salmonella   Yersinia  Klebsiella
sample1   13,000,000 8    13   6    13   32    0     28
sample2   60,000,000 31  25   0      0   25   19      0
sample3    5,000,000 0    0   9     51    0     0    40
sample4   99,000,000 27   19  0     0    22   32      0

I want to subsample this raw abundance matrix to 5 million reads per sample and get a new, subsampled abundance matrix. I considered taking the first 5 million reads, or 5 million randomly selected reads, from each metagenome using Heng Li's seqtk and then rerunning the taxonomic profiling on those reads. But rerunning so many metagenomes is time-consuming, so I would rather avoid it. Can I instead calculate a revised taxonomic abundance at 5 million reads for each sample from the matrix I already have, using this simple calculation?

revised count = raw count/total sequences * 5,000,000
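
To make the calculation concrete, here is a minimal Python sketch of the formula applied to the four test samples. The function name `scale_counts` and the constant `TARGET_DEPTH` are made up for illustration, and the counts are kept positional rather than keyed by taxon name. Note that this is plain proportional scaling: it gives fractional expected counts, not the integer counts that randomly drawing 5 million reads would give.

```python
# Proportional scaling of raw counts to a common depth.
# This only implements: revised count = raw count / total sequences * 5,000,000
# It is NOT a random subsampling of reads.

TARGET_DEPTH = 5_000_000  # hypothetical name for the common depth


def scale_counts(raw_counts, total_sequences, target_depth=TARGET_DEPTH):
    """Return the raw counts rescaled to target_depth (floats, not integers)."""
    if total_sequences <= 0:
        raise ValueError("total_sequences must be positive")
    factor = target_depth / total_sequences
    return [count * factor for count in raw_counts]


# (total_sequences, raw counts) copied from the test matrix above
samples = {
    "sample1": (13_000_000, [8, 13, 6, 13, 32, 0, 28]),
    "sample2": (60_000_000, [31, 25, 0, 0, 25, 19, 0]),
    "sample3": (5_000_000, [0, 0, 9, 51, 0, 0, 40]),
    "sample4": (99_000_000, [27, 19, 0, 0, 22, 32, 0]),
}

for name, (total, counts) in samples.items():
    revised = scale_counts(counts, total)
    print(name, [round(value, 2) for value in revised])
```

sample3 is already at 5 million, so its counts come back unchanged, while every count in the deeper samples shrinks by the same factor within that sample.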

Latest taxa subsampling • 768 views
