Dividing Taxonomy Table in R
4.2 years ago

Hi,

I'm trying to divide a taxonomy table, which has a few hundred rows and which looks something like this:

d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)

I'm trying to make a new column for each rank prefix (e.g., "d", "p", "c", etc. as column names) and put Bacteria, Firmicutes, Clostridia, etc. in their respective columns, with the associated values in brackets retained.

I've tried using a variety of methods:

colsplit(taxa, split=",", names) (where names is a vector of the col names I want and taxa is the input data frame) 
split <- strsplit(as.character(taxa, ",", fixed=TRUE)

and a variety of other split methods, but it keeps returning errors like "argument "split" is missing, with no default".

Any suggestions on how I might achieve this?

Thanks for any help!

R taxonomy-table • 1.4k views
23 months ago
Erik Wright • 360

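It looks like you have a parentheses problem: the call to as.character( is never closed, so "," and fixed=TRUE are passed to as.character() instead of strsplit(), and strsplit() never receives its split argument. Try this: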

split <- strsplit(as.character(taxa), ",", fixed=TRUE)
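One caveat, offered as a sketch under assumptions: the question describes taxa as a data frame, and calling as.character() on a whole data frame does not give one string per row. It is usually safer to split the column that holds the strings. The column name taxonomy below is hypothetical, and the two example rows are the ones quoted elsewhere in this thread.

```r
# Hypothetical input: a data frame with one taxonomy string per row
taxa <- data.frame(
  taxonomy = c(
    "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)",
    "d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2),o:Bacillales(59.2),f:Bacillaceae(53.3),g:Bacillus(38.4)"
  ),
  stringsAsFactors = FALSE
)

# Split the column (not the whole data frame) on literal commas
pieces <- strsplit(as.character(taxa$taxonomy), ",", fixed = TRUE)

# pieces is a list holding one character vector per input row
lengths(pieces)
pieces[[1]][1:2]
```

Each element of pieces still carries its rank prefix (for example "d:"), so a second split on ":" is needed to get column names.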
16 months ago
Brice Sarver ♦ 2.6k
United States

It's not entirely clear what you want as the end result, but this ought to get you 95% of the way there, and then you can tweak it yourself.

a <- "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"

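# Split the string into one "rank:Name(score)" piece per taxonomic level
b <- strsplit(a, ",", fixed=TRUE)[[1]]

# The part before the colon is the column name, the part after is the value
parts <- strsplit(b, ":", fixed=TRUE)
col_names <- sapply(parts, "[[", 1L)
vals <- sapply(parts, "[[", 2L)

# Build a one-row data frame (assigned to final) and label its columns
final <- data.frame(t(vals), stringsAsFactors=FALSE)
colnames(final) <- col_names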

> final (I've truncated the results because the formatting on Biostars can be tricky for generating tables this way)

               d                p                c                   o
1 Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2)
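Since the real table has a few hundred rows, the same idea can be wrapped in a function and applied to every string. The following is only a sketch: parse_taxonomy is a hypothetical helper name, and the second input row is a made-up truncated example used to show that ranks missing from a string end up as NA, because values are looked up by rank letter rather than by position.

```r
# Turn one taxonomy string into a named vector, e.g. c(d = "Bacteria(96.3)", ...)
parse_taxonomy <- function(s) {
  b <- strsplit(s, ",", fixed = TRUE)[[1]]
  ranks <- sub(":.*$", "", b)      # text before the first colon
  vals  <- sub("^[^:]*:", "", b)   # text after the first colon
  setNames(vals, ranks)
}

ranks <- c("d", "p", "c", "o", "f", "g")
taxa <- c(
  "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)",
  "d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2)"   # made-up truncated row
)

# Index each parsed vector by rank name; ranks that are absent become NA
rows <- lapply(taxa, function(s) unname(parse_taxonomy(s)[ranks]))

# Stack the rows into a character matrix, then convert to a data frame
final <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(final) <- ranks
```

With a data frame as input, the vector taxa would be replaced by the relevant column.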
20 months ago
Chris S. • 290
United States

If you have a table, try using the tidyr package:

x <- read.csv(text='id,taxa
1,"d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"
2,"d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2),o:Bacillales(59.2),f:Bacillaceae(53.3),g:Bacillus(38.4)"')

library(tidyr)
x %>% separate(taxa, c("domain", "phylum", "class", "order", "family", "genus"), ",[a-z]:")

  id           domain           phylum            class               order                family           genus
1  1 d:Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2) Lachnospiraceae(63.3) Roseburia(48.4)
2  2 d:Bacteria(93.3) Firmicutes(60.8)    Bacilli(59.2)    Bacillales(59.2)     Bacillaceae(53.3)  Bacillus(38.4)
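Note that the separator ",[a-z]:" only matches prefixes that follow a comma, so the first column keeps its "d:" prefix in the output above. A minimal base R sketch for removing it, using the two domain values shown in that output (the variable names domain and cleaned are just for illustration):

```r
# Values as they appear in the domain column of the output above
domain <- c("d:Bacteria(96.3)", "d:Bacteria(93.3)")

# Remove a leading single lowercase letter followed by a colon
cleaned <- sub("^[a-z]:", "", domain)
cleaned
```

Applied to the separated data frame, the same sub() call would be run on its domain column and the result assigned back to that column.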