Subsampling a taxa abundance matrix
8.2 years ago
bioinfo ▴ 830

Hi

I have a taxonomic abundance matrix from a large number of shotgun metagenomes whose sequencing depth ranges from 5 million to 99 million sequences. Here is some raw abundance test data for 4 samples.

Sample_ID total_sequences Escherichia Pseudomonas Bacillus Salmonella   Yersinia  Klebsiella
sample1   13,000,000 8    13   6    13   32    0     28
sample2   60,000,000 31  25   0      0   25   19      0
sample3    5,000,000 0    0   9     51    0     0    40
sample4   99,000,000 27   19  0     0    22   32      0

I want to subsample this raw abundance matrix to 5 million reads and get a new subsampled abundance matrix. I thought about taking the first 5 million reads, or 5 million randomly selected reads, using Heng Li's seqtk and then rerunning the taxonomic abundance analysis on those reads. But rerunning so many metagenomes is time-consuming, so I would rather not do that. Can I instead calculate a revised taxonomic abundance at 5 million reads for each sample from the matrix I already have, using this simple calculation?

revised count = raw count/total sequences * 5,000,000
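
For concreteness, the calculation above can be written as a small function. This is only a minimal sketch of the proportional scaling in the formula, not a random subsampling of reads: the function name `scale_counts`, the placeholder taxon names, and the example counts are assumptions for illustration. Note that the results are generally fractional expected values rather than the integer counts a real random draw would give.

```python
TARGET_DEPTH = 5_000_000  # depth to normalise every sample to


def scale_counts(raw_counts, total_sequences, target_depth=TARGET_DEPTH):
    """Apply: revised count = raw count / total sequences * target depth.

    raw_counts      -- dict mapping taxon name to raw count (hypothetical layout)
    total_sequences -- number of sequences in the sample
    """
    if total_sequences <= 0:
        raise ValueError("total_sequences must be positive")
    factor = target_depth / total_sequences
    # Each count is scaled by the same factor, so proportions are preserved.
    return {taxon: count * factor for taxon, count in raw_counts.items()}


# Illustrative values only: a sample with 13,000,000 sequences.
example = scale_counts({"taxon_a": 8, "taxon_b": 13}, 13_000_000)
for taxon, revised in example.items():
    print(taxon, round(revised, 2))
```

A sample that already has 5,000,000 sequences is left unchanged by this scaling, since the factor is 1.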
subsampling taxa • 1.6k views