Hi there,

I have a dataframe of gene expression (one row per gene).
I would like to only plot certain subsets of genes (e.g. by tissue type).
I currently have these subsets defined a priori in another dataframe/list.

library(dplyr)   # for %>%, filter(), select()
library(ggplot2) # for ggplot() and the geoms
library(ggforce) # for paginated plotting

# a df of 'all' genes and their expression at two time points
# (10 genes a-j, 3 replicates each, at timepoints '1' and '2')
gene_expression <- data.frame(
  gene = rep(rep(letters[1:10], each = 3), times = 2),
  timepoint = rep(c('1', '2'), each = 30),
  expression = as.numeric(sample(20:90, size = 60)))

#a dataframe of subsets (tissues) that contains subsets together with the 'genes of interest' for each subset
interesting_genes <- data.frame(
  tissue = as.character(c('heart', 'heart','heart', 'kidney','kidney', 'liver','liver','intestine', 'intestine', 'intestine')),
  gene = as.character(c('a', 'b', 'c', 'a', 'b', 'f', 'g', 'a', 'f', 'g')))

So far, I can only 'manually plot' each subset individually through subsetting prior to plotting.
However, I would prefer a loop or mapping to plot all subsets (here tissue types) one after the next.

The desired output would be one PDF file per subset (i.e. one PDF each for heart, kidney, liver and intestine), containing only the genes relevant for that tissue type. It would also be great if each PDF's title corresponded to the tissue type.

So far I have managed the code below (which gives me only one subset at a time).
I thought that adding another loop could work, but I don't have the coding knowledge to achieve that.
Any help would be appreciated.

heart_list <- interesting_genes %>% filter(tissue == 'heart') %>% pull(gene) # creates the heart subset
n_pages <- ceiling(length(heart_list) / 4) # number of pages to plot (2x2 panels per page)

pdf(file = 'heart_genes.pdf', width = 10, height = 7) # write to pdf
for (i in seq_len(n_pages)) {
  print(ggplot(gene_expression %>% filter(gene %in% heart_list), aes(x = timepoint, y = expression, group = interaction(timepoint, gene), fill = timepoint)) + # for grouping on multiple columns I use "interaction"
      geom_boxplot() +
      geom_point(position = position_jitterdodge(), aes(group = interaction(timepoint, gene))) +
      scale_x_discrete(limits = c("1", "2")) +
      facet_wrap_paginate(~ gene, scales = "free", ncol = 2, nrow = 2, page = i)) # use facet wrap to plot pdfs of 2x2 panels, maybe there are better alternatives to facet_wrap_paginate?
}
dev.off() # close the pdf device so the file is actually written
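
For reference, the nested loop I have in mind might look roughly like the sketch below. It assumes the two data frames have the columns shown earlier; the names tis, subset_df and panels_per_page are only illustrative, and the page count is derived from the number of genes per tissue rather than fixed. This is a sketch under those assumptions, not a definitive solution.

```r
library(dplyr)
library(ggplot2)
library(ggforce)

# Toy data with the same structure as the example in the post
gene_expression <- data.frame(
  gene       = rep(rep(letters[1:10], each = 3), times = 2),
  timepoint  = rep(c("1", "2"), each = 30),
  expression = as.numeric(sample(20:90, size = 60))
)
interesting_genes <- data.frame(
  tissue = c("heart", "heart", "heart", "kidney", "kidney",
             "liver", "liver", "intestine", "intestine", "intestine"),
  gene   = c("a", "b", "c", "a", "b", "f", "g", "a", "f", "g")
)

panels_per_page <- 4  # assumption: 2 x 2 facets per page

# Outer loop: one PDF per tissue
for (tis in unique(interesting_genes$tissue)) {
  genes     <- interesting_genes$gene[interesting_genes$tissue == tis]
  subset_df <- filter(gene_expression, gene %in% genes)
  n_pages   <- ceiling(length(unique(subset_df$gene)) / panels_per_page)

  pdf(file = paste0(tis, "_genes.pdf"), width = 10, height = 7)
  # Inner loop: one page per group of up to 4 genes
  for (i in seq_len(n_pages)) {
    p <- ggplot(subset_df, aes(x = timepoint, y = expression, fill = timepoint)) +
      geom_boxplot() +
      geom_point(position = position_jitterdodge()) +
      scale_x_discrete(limits = c("1", "2")) +
      facet_wrap_paginate(~ gene, scales = "free", ncol = 2, nrow = 2, page = i) +
      ggtitle(tis)  # tissue name as the plot title on every page
    print(p)
  }
  dev.off()  # close the device so the file for this tissue is written
}
```

With the example data this should write heart_genes.pdf, kidney_genes.pdf, liver_genes.pdf and intestine_genes.pdf to the working directory, each titled with its tissue.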

