Posted 2 hours ago by Sebastian Hesse (Germany / Munich / Dr. von Hauner Children's Hospital)

In a dataset of protein expression values (log2-transformed), we found 3 clusters using k-means.
We would now like to run an ANOVA followed by a post-hoc test (e.g. Tukey's HSD) to test each pair of clusters for differentially expressed proteins.

Unfortunately, none of the resources I found discuss this rather simple case; they only cover much more complicated designs with multiple group combinations (e.g. two treatments).

With the HybridMTest package, the ANOVA went fine, and I now have the FDR for differentially expressed proteins across the 3 groups (6 samples per group). But I'm stuck on how to calculate the post-hoc test for every protein (= rows, n = 3878) between each pair of the 3 groups.

I could not find an appropriate package or function. Maybe one of you could offer a hint on how to solve this? As a result, I would like to obtain a data frame with the protein_id, the comparison group, the FDR, and the log fold change.

Many thanks! (And sorry for describing the data only in words. I don't know how to create example expression data, but will look into it.)


Data description:

expression_df: rownames(expression_df) = protein_id, colnames(expression_df) = sample_id

pheno_df: rownames(pheno_df) = sample_id, pheno_df$cluster = cluster group (1, 2, or 3)

anova_results: rownames(anova_results) = protein_id, anova_results$comparison = e.g. "cluster 1 vs 2", anova_results$FDR = FDR-controlled ANOVA result, anova_results$logFC = logFC
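
To make the description above concrete, here is a minimal sketch in base R (no extra packages) of toy data with this layout, followed by one possible way to get pairwise post-hoc results per protein with `aov()` and `TukeyHSD()`. Everything here is an assumption for illustration: the values are simulated, `tukey_one` and `posthoc_results` are hypothetical names, and the Benjamini-Hochberg step on top of the Tukey-adjusted p-values is just one choice of multiple-testing correction, not an established recommendation. Because each protein yields three comparisons, the sketch stores `protein_id` as a column rather than as row names.

```r
# Toy data in the layout described above (all values are simulated)
set.seed(1)
n_proteins <- 50
sample_ids <- paste0("S", 1:18)
pheno_df <- data.frame(cluster = factor(rep(1:3, each = 6)),
                       row.names = sample_ids)
expression_df <- as.data.frame(
  matrix(rnorm(n_proteins * 18, mean = 20), nrow = n_proteins,
         dimnames = list(paste0("P", seq_len(n_proteins)), sample_ids))
)
# Shift the first five proteins in cluster 3 so there is something to detect
in_c3 <- pheno_df$cluster == 3
expression_df[1:5, in_c3] <- expression_df[1:5, in_c3] + 3

# Samples must be in the same order in both objects
stopifnot(identical(colnames(expression_df), rownames(pheno_df)))

# One-way ANOVA plus Tukey's HSD for a single protein (hypothetical helper)
tukey_one <- function(protein_id) {
  d <- data.frame(
    y = unlist(expression_df[protein_id, ], use.names = FALSE),
    cluster = pheno_df$cluster
  )
  fit <- aov(y ~ cluster, data = d)
  tk <- TukeyHSD(fit, "cluster")$cluster  # rows are named "2-1", "3-1", "3-2"
  data.frame(
    protein_id = protein_id,
    comparison = paste("cluster", sub("-", " vs ", rownames(tk))),
    logFC = tk[, "diff"],     # difference of log2 means = log2 fold change
    p_tukey = tk[, "p adj"],  # adjusted across the 3 pairs within one protein
    row.names = NULL
  )
}

posthoc_results <- do.call(rbind, lapply(rownames(expression_df), tukey_one))
# Assumption: BH across all proteins and comparisons to obtain an FDR column
posthoc_results$FDR <- p.adjust(posthoc_results$p_tukey, method = "BH")
head(posthoc_results)
```

With the real data, only the part from the `stopifnot()` check onward would be needed. A comparison labelled "cluster 2 vs 1" reports the mean of cluster 2 minus the mean of cluster 1. It may also make sense to run the post-hoc step only on proteins that passed the ANOVA FDR threshold.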
