Dividing Taxonomy Table in R
7.8 years ago

Hi,

I'm trying to split a taxonomy table that has a few hundred rows, each of which looks something like this:

d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)

I'd like to create a new column for each rank prefix (i.e., "d", "p", "c", etc. as column names) and put Bacteria, Firmicutes, Clostridia, etc. in their respective columns, keeping the associated values in parentheses.

I've tried using a variety of methods:

colsplit(taxa, split=",", names) (where names is a vector of the col names I want and taxa is the input data frame) 
split <- strsplit(as.character(taxa, ",", fixed=TRUE)

as well as several other splitting approaches, but I keep getting errors such as: argument "split" is missing, with no default.

Any suggestions on how I might achieve this?

Thanks for any help!

Tags: R, taxonomy-table
7.8 years ago
Erik Wright ▴ 420

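It looks like you have a parentheses problem: the closing parenthesis of as.character() is misplaced, so "," and fixed=TRUE end up being passed to as.character() instead of strsplit(), which is why R complains that split is missing. Try this:

# taxa is a data frame, so pass its text column (assumed here to be the first) rather than the whole data frame
split <- strsplit(as.character(taxa[[1]]), ",", fixed=TRUE)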
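Once the strings are split, the resulting list still has to be turned into a table with one column per rank. Below is a minimal base-R sketch of one way to do that, assuming every row lists the same ranks in the same order. The column name taxonomy and the two example strings (taken from the rows used elsewhere in this thread) are illustrative assumptions, not your actual data:

```r
# Hypothetical input: a data frame whose "taxonomy" column holds the strings
taxa <- data.frame(taxonomy = c(
  "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)",
  "d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2),o:Bacillales(59.2),f:Bacillaceae(53.3),g:Bacillus(38.4)"
), stringsAsFactors = FALSE)

# One character vector of "rank:Name(value)" pieces per row
split <- strsplit(as.character(taxa$taxonomy), ",", fixed = TRUE)

# Stack the pieces into a character matrix, one row per taxonomy string
# (assumes every row has the same ranks in the same order)
m <- do.call(rbind, split)

# Column names come from the prefix before ":" in the first row
colnames(m) <- sub(":.*$", "", m[1, ])

# Drop the "d:", "p:", ... prefixes, keeping "Name(value)"
m[] <- sub("^[a-z]:", "", m)

result <- as.data.frame(m, stringsAsFactors = FALSE)
```

If some rows have fewer ranks, rbind() recycles the shorter vectors (without a warning when the lengths divide evenly), so it's worth checking lengths(split) before stacking.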
7.8 years ago
Brice Sarver ★ 3.8k

It's not entirely clear what you want the end result to look like, but this should get you 95% of the way there; you can tweak it from there.

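a <- "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"

# split the string into its "rank:Name(value)" pieces
b <- strsplit(a, ",", fixed=TRUE)[[1]]

# the part before ":" becomes the column name, the part after it the value
ranks <- sapply(strsplit(b, ":", fixed=TRUE), "[[", 1L)
vals <- sapply(strsplit(b, ":", fixed=TRUE), "[[", 2L)

# one-row data frame with one column per rank
final <- data.frame(t(vals), stringsAsFactors=FALSE)
colnames(final) <- ranks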

Printing final gives the following (I've truncated the output because wide tables are tricky to format on Biostars):

> final
               d                p                c                   o
1 Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2)
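To apply the same idea to a few hundred rows, one option is to wrap the per-string logic in a small function that returns a named vector, then look each row up by rank name so that rows with missing ranks get NA instead of shifting columns. This is only a sketch: parse_taxonomy() is a hypothetical helper name, and the second, shortened string is made up to illustrate the missing-rank case.

```r
# Hypothetical helper: turn one taxonomy string into a named character vector,
# e.g. c(d = "Bacteria(96.3)", p = "Firmicutes(70.8)", ...)
parse_taxonomy <- function(s) {
  pieces <- strsplit(s, ",", fixed = TRUE)[[1]]
  kv <- strsplit(pieces, ":", fixed = TRUE)
  vals <- vapply(kv, `[[`, character(1), 2L)
  names(vals) <- vapply(kv, `[[`, character(1), 1L)
  vals
}

strings <- c(
  "d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)",
  # made-up shortened string to show what happens when ranks are missing
  "d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2)"
)

ranks <- c("d", "p", "c", "o", "f", "g")

# Indexing by rank name returns NA for any rank a string lacks
rows <- lapply(strings, function(s) parse_taxonomy(s)[ranks])

m <- do.call(rbind, rows)
colnames(m) <- ranks  # set explicitly so NA names from incomplete rows don't leak in
final <- as.data.frame(m, stringsAsFactors = FALSE)
```

Looking values up by name rather than by position is the design choice here: it keeps every value in its own rank's column even when a string is incomplete.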
7.8 years ago
Chris S. ▴ 320

If you have a table, try the separate() function from the tidyr package:

x <- read.csv(text='id,taxa
1,"d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"
2,"d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2),o:Bacillales(59.2),f:Bacillaceae(53.3),g:Bacillus(38.4)"')

library(tidyr)
x %>% separate(taxa, c("domain", "phylum", "class", "order", "family", "genus"), ",[a-z]:")

  id           domain           phylum            class               order                family           genus
1  1 d:Bacteria(96.3) Firmicutes(70.8) Clostridia(69.2) Clostridiales(69.2) Lachnospiraceae(63.3) Roseburia(48.4)
2  2 d:Bacteria(93.3) Firmicutes(60.8)    Bacilli(59.2)    Bacillales(59.2)     Bacillaceae(53.3)  Bacillus(38.4)
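Note that the domain column keeps its d: prefix, because the separator ",[a-z]:" only matches prefixes that follow a comma. If you'd rather avoid that (or the tidyr dependency), one base-R alternative is to extract every field that follows a one-letter prefix using a look-behind regular expression. This is a sketch assuming each string contains all six ranks in order; it reuses the example input from above:

```r
x <- read.csv(text='id,taxa
1,"d:Bacteria(96.3),p:Firmicutes(70.8),c:Clostridia(69.2),o:Clostridiales(69.2),f:Lachnospiraceae(63.3),g:Roseburia(48.4)"
2,"d:Bacteria(93.3),p:Firmicutes(60.8),c:Bacilli(59.2),o:Bacillales(59.2),f:Bacillaceae(53.3),g:Bacillus(38.4)"',
  stringsAsFactors = FALSE)

# Match each run of non-comma characters preceded by "<letter>:";
# the look-behind (?<=...) requires perl = TRUE
vals <- regmatches(x$taxa, gregexpr("(?<=[a-z]:)[^,]+", x$taxa, perl = TRUE))

# One row per input string (assumes six matches in every row)
m <- do.call(rbind, vals)
colnames(m) <- c("domain", "phylum", "class", "order", "family", "genus")

out <- data.frame(id = x$id, m, stringsAsFactors = FALSE)
```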