Question

Dividing Taxonomy Table in R


7.8 years ago

fionnuala.mm • 0

Hi,

I'm trying to split a taxonomy table of a few hundred rows, where each row looks something like this:

d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)

I'd like to make a new column for each rank (using "d", "p", "c", etc. as the column names) and put Bacteria, Firmicutes, Clostridia, etc. in their respective columns, keeping the associated values in brackets.

I've tried using a variety of methods:

colsplit(taxa, split=",", names)   # names: a vector of the column names I want; taxa: the input data frame
split <- strsplit(as.character(taxa, ",", fixed=TRUE)

as well as other splitting approaches, but I keep getting errors such as "argument "split" is missing, with no default".

Any suggestions on how I might achieve this?

Thanks for any help!

R taxonomy-table • 3.9k views

Written 7.8 years ago by fionnuala.mm; last updated by Chris S.

score 0 · Answer 1 · 2016-07-11


Erik Wright ▴ 420

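It looks like you have a misplaced parenthesis: the closing parenthesis of as.character() comes after fixed=TRUE, so "," and fixed=TRUE are passed to as.character() and strsplit() never receives its split argument. Try this: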

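split <- strsplit(as.character(taxa[[1]]), ",", fixed=TRUE)   # taxa[[1]]: the column holding the taxonomy strings; as.character() on a whole data frame does not return them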
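Once the parentheses are fixed, strsplit() returns a list with one character vector per row, which still has to be turned into columns. A minimal sketch of that step, assuming every row has the same six ranks in the same order (the two example strings come from this thread; `taxa_strings`, `pieces`, and `taxa_df` are illustrative names):

```r
# Two example rows in the format from the question
taxa_strings <- c(
  "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)",
  "d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2),o:Bacillales(59.2),f:Bacillaceae(53.3),g:Bacillus(38.4)"
)

# One list element per row, each a vector of "rank:Name(value)" pieces
pieces <- strsplit(taxa_strings, ",", fixed = TRUE)

# Stack into a matrix: one row per input string, one column per rank.
# Assumes every row has exactly the same number of ranks.
m <- do.call(rbind, pieces)

# Use the rank letters from the first row ("d", "p", ...) as column names,
# then drop the "x:" prefix from every cell, keeping the bracketed value
colnames(m) <- sub(":.*$", "", m[1, ])
m[] <- sub("^[a-z]:", "", m)

taxa_df <- as.data.frame(m, stringsAsFactors = FALSE)
print(taxa_df)
```

If some rows are classified to fewer ranks, do.call(rbind, ...) will recycle values into the wrong columns, so matching by rank letter (as in the other answers) is safer in that case.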


score 0 · Answer 2 · 2016-07-11

It isn't clear exactly what end result you want, but this should get you most of the way there, and you can adapt it from here.

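a <- "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"

b <- strsplit(a, ",", fixed = TRUE)[[1]]       # one piece per rank, e.g. "d:Bacteria(96.3)"

parts <- strsplit(b, ":", fixed = TRUE)        # split each piece into rank letter and name

ranks <- sapply(parts, "[[", 1L)               # "d", "p", "c", ...

vals <- sapply(parts, "[[", 2L)                # "Bacteria(96.3)", "Firmicutes(70.8)", ...

final <- data.frame(t(vals), stringsAsFactors = FALSE)   # one row, one column per rank

colnames(final) <- ranks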

Output (truncated to the first four columns, because wide tables are hard to format on Biostars):

> final
               d                p                c                   o
1 Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2)
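The code above handles a single string. For a whole table, a sketch that applies the same split to every row and matches pieces by rank letter rather than by position, so a row classified to fewer ranks gets NA in the missing columns. `parse_taxon` is a hypothetical helper, and the second example string is a hypothetical row classified only down to class:

```r
# Hypothetical helper: parse one taxonomy string into a vector named by rank letter
parse_taxon <- function(s) {
  pieces <- strsplit(s, ",", fixed = TRUE)[[1]]
  rank  <- sub(":.*$", "", pieces)     # text before the ":", e.g. "d"
  value <- sub("^[^:]*:", "", pieces)  # text after it, e.g. "Bacteria(96.3)"
  setNames(value, rank)
}

ranks <- c("d", "p", "c", "o", "f", "g")

taxa_strings <- c(
  "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)",
  "d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2)"  # hypothetical, classified only to class
)

# Index each parsed vector by rank name; ranks a row lacks come back as NA
rows <- lapply(taxa_strings, function(s) unname(parse_taxon(s)[ranks]))
final <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(final) <- ranks
print(final)
```

With a data frame like the one in the question, you would pass its taxonomy column (for example `taxa[[1]]`, assuming that is where the strings are) instead of `taxa_strings`.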

score 0 · Answer 3 · 2016-07-11

If your data are already in a data frame, try separate() from the tidyr package:

x <- read.csv(text='id,taxa
1,"d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"
2,"d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2),o:Bacillales(59.2),f:Bacillaceae(53.3),g:Bacillus(38.4)"')

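library(tidyr)   # provides separate() and re-exports the %>% pipe
# Split on a comma followed by a rank letter and colon (e.g. ",p:");
# the first piece keeps its "d:" prefix because no comma precedes it
x %>% separate(taxa, into = c("domain", "phylum", "class", "order", "family", "genus"), sep = ",[a-z]:")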

  id           domain           phylum            class               order                family           genus
1  1 d:Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2) Lachnospiraceae(63.3) Roseburia(48.4)
2  2 d:Bacteria(93.3) Firmicutes(60.8)    Bacilli(59.2)    Bacillales(59.2)     Bacillaceae(53.3)  Bacillus(38.4)
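As the output shows, the domain column keeps its "d:" prefix. A base-R sketch of two possible follow-up steps: stripping that prefix, and optionally moving each bracketed value into its own numeric column. The small data frame below copies the first two rank columns of the output above; `taxa_sep` and the helper `split_conf` are hypothetical names, and the `_conf` column suffix is an arbitrary choice:

```r
# A data frame shaped like the separate() output above (first two ranks only)
taxa_sep <- data.frame(
  id     = c(1, 2),
  domain = c("d:Bacteria(96.3)", "d:Bacteria(93.3)"),
  phylum = c("Firmicutes(70.8)", "Firmicutes(60.8)"),
  stringsAsFactors = FALSE
)

# Drop the leftover "d:" prefix that the ",[a-z]:" separator cannot match
taxa_sep$domain <- sub("^d:", "", taxa_sep$domain)

# Hypothetical helper: split "Name(value)" into a name and a numeric value
split_conf <- function(v) {
  data.frame(
    name = sub("\\(.*$", "", v),
    conf = as.numeric(sub("^.*\\(([0-9.]+)\\)$", "\\1", v)),
    stringsAsFactors = FALSE
  )
}

# Replace each rank column with the bare name and add a numeric "_conf" column
for (col in c("domain", "phylum")) {
  parts <- split_conf(taxa_sep[[col]])
  taxa_sep[[col]] <- parts$name
  taxa_sep[[paste0(col, "_conf")]] <- parts$conf
}
print(taxa_sep)
```

If you want to keep the values in brackets, as in the question, run only the prefix-stripping line and skip the loop.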