Hi,
I am having difficulty splitting my taxonomy column into separate ranks, i.e. "domain", "phylum", "class", "order", "family", "genus".
The biggest problem is that the format of my taxonomy column is not uniform. Some rows have all taxonomy levels, while others have only some of them, e.g. only the "domain", "phylum", and "genus" levels.
My data has a few thousand rows and looks something like this:
OTUID Taxonomy
OTU1 d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:Pseudomonadales,f:Pseudomonadaceae,g:Pseudomonas
OTU20 d:Archaea,p:"Thaumarchaeota",o:Nitrososphaerales,f:Nitrososphaeraceae,g:Nitrososphaera
OTU774 d:Bacteria,p:"Armatimonadetes",g:Armatimonadetes_gp4
I'm not familiar with R, so I spent a whole day searching the Internet for relevant solutions, and I tried the separate function in the tidyr package, like this:
library(tidyr)
x <- read.csv("annotation.csv")
y <- x %>% separate(Taxonomy, c("domain", "phylum", "class", "order", "family", "genus"), ",[a-z]:")
write.csv(y,"tax_split.csv",row.names = TRUE)
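But the result is not what I need. The values are assigned to columns by position rather than by their rank prefix, so rows with missing ranks are shifted to the left (for example, the order of OTU20 ends up in the class column, and the genus of OTU774 also ends up in the class column):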
OTUID domain phylum class order family genus
OTU1 d:Bacteria "Proteobacteria" Gammaproteobacteria Pseudomonadales Pseudomonadaceae Pseudomonas
OTU20 d:Archaea "Thaumarchaeota" Nitrososphaerales Nitrososphaeraceae Nitrososphaera NA
OTU774 d:Bacteria "Armatimonadetes" Armatimonadetes_gp4 NA NA NA
In the end, I had to use Excel's filtering function to deal with this, but that method is very time-consuming (╯︵╰)
So I would still like to ask: is there an elegant way to solve this problem in R?
Thanks for your help!
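
For reference, here is a minimal sketch of one possible approach in base R. It assigns each value by its one-letter prefix (d, p, c, o, f, g) instead of by position. It assumes that every entry has the form prefix:value, that values contain no commas, and that the quotes around the phylum names can be dropped. The helper name parse_taxonomy and the lookup vector ranks are made up for this illustration, and the example data frame only repeats the three rows shown above; with the real file, x would come from read.csv("annotation.csv").

```r
# Example rows copied from the question (stand-in for read.csv("annotation.csv"))
x <- data.frame(
  OTUID = c("OTU1", "OTU20", "OTU774"),
  Taxonomy = c(
    'd:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:Pseudomonadales,f:Pseudomonadaceae,g:Pseudomonas',
    'd:Archaea,p:"Thaumarchaeota",o:Nitrososphaerales,f:Nitrososphaeraceae,g:Nitrososphaera',
    'd:Bacteria,p:"Armatimonadetes",g:Armatimonadetes_gp4'
  ),
  stringsAsFactors = FALSE
)

# Lookup from one-letter prefix to rank name (assumed prefix set: d, p, c, o, f, g)
ranks <- c(d = "domain", p = "phylum", c = "class",
           o = "order", f = "family", g = "genus")

# Hypothetical helper: turn one taxonomy string into a named vector
# with one slot per rank; ranks that are absent stay NA
parse_taxonomy <- function(tax) {
  parts  <- strsplit(tax, ",", fixed = TRUE)[[1]]
  keys   <- sub(":.*$", "", parts)               # text before the first colon
  values <- sub("^[^:]*:", "", parts)            # text after the first colon
  values <- gsub('"', "", values, fixed = TRUE)  # drop the quotes around names
  out <- setNames(rep(NA_character_, length(ranks)), ranks)
  known <- keys %in% names(ranks)                # ignore unexpected prefixes
  out[ranks[keys[known]]] <- values[known]
  out
}

# One row per OTU, one column per rank
tax_table <- do.call(rbind, lapply(x$Taxonomy, parse_taxonomy))
colnames(tax_table) <- unname(ranks)

y <- data.frame(OTUID = x$OTUID, tax_table, stringsAsFactors = FALSE)
print(y)
```

With this layout, the missing class of OTU20 should stay NA while its order lands in the order column. The result could then be saved with write.csv(y, "tax_split.csv", row.names = FALSE).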