I am attempting to understand which genes contribute the most to PC2. As you can see from the PCA plot below, made with the DESeq2 plotPCA() function, the triangle samples appear to be separated on PC2 (they all have the same disease).

[Image: PCA plot from DESeq2 plotPCA()]

My main question is: what do I need in order to start working with the PCAtools pca() function?

Is it appropriate to use the rlog-transformed data from DESeq2, as below?

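    library(DESeq2)
    library(PCAtools)

    # gse: gene-level SummarizedExperiment of counts
    dds.sm <- DESeqDataSet(gse, design = ~ batch + diagnosis)
    dds.sm <- estimateSizeFactors(dds.sm)

    # rlog transform; blind = FALSE lets the design inform the dispersion trend
    rld.sm <- rlog(dds.sm, blind = FALSE)

    # genes in rows, samples in columns: the layout pca() expects by default
    rld.sm.output <- assay(rld.sm)

    pca.project <- pca(rld.sm.output)

    # Loadings plot for PC1 and PC2; rangeRetain = 0.01 keeps genes in the
    # top/bottom 1% of each PC's loadings range
    plotloadings(pca.project,
        components = getComponents(pca.project, c(1, 2)),
        rangeRetain = 0.01,
        labSize = 3.0,
        shapeSizeRange = c(3, 3),
        title = "Loadings plot",
        subtitle = "PC1, PC2",
        caption = "Top 1% variables",
        shape = 24,
        col = c("limegreen", "black", "red3"),
        drawConnectors = TRUE)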
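A minimal sketch of the PC2 ranking itself, in base R so it runs without Bioconductor. It assumes a simulated genes-by-samples matrix (a hypothetical stand-in for rld.sm.output) in which one gene group separates the samples on PC1 and a weaker group separates them on PC2. It uses prcomp() on the transposed matrix with centering and no scaling, which I believe matches the pca() defaults (center = TRUE, scale = FALSE). The rotation matrix from prcomp() plays the role of pca.project$loadings, and because the sign of a PC is arbitrary, genes are ranked by absolute loading. The helper top_genes() is hypothetical.

```r
# Hypothetical stand-in for rld.sm.output: 200 genes x 8 samples of
# log-scale values with standard-normal noise.
set.seed(1)
n_genes <- 200
n_samples <- 8
mat <- matrix(rnorm(n_genes * n_samples, mean = 8), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))

# Strong effect: gene1-gene5 shifted up in samples 1-4 (should drive PC1).
mat[1:5, 1:4] <- mat[1:5, 1:4] + 12
# Weaker, orthogonal effect: gene6-gene10 up in samples 1, 2, 5, 6 (PC2).
mat[6:10, c(1, 2, 5, 6)] <- mat[6:10, c(1, 2, 5, 6)] + 8

# prcomp() wants samples in rows, so transpose; centre genes, no scaling.
pc <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Rotation is genes x PCs; rank genes by absolute loading on one PC.
top_genes <- function(rotation, component, n = 10) {
  w <- rotation[, component]
  w[order(abs(w), decreasing = TRUE)][seq_len(min(n, length(w)))]
}

top_pc2 <- top_genes(pc$rotation, "PC2", n = 10)
print(round(top_pc2, 3))
```

On the real object the same ranking would be applied to pca.project$loadings[, "PC2"], with gene IDs taken from rownames(pca.project$loadings). If the samples should also be coloured by diagnosis, pca() accepts a metadata argument, for example pca(rld.sm.output, metadata = as.data.frame(colData(dds.sm))), as long as the metadata rownames match colnames(rld.sm.output).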

And how can I add the gene symbols to the plot?

[Image: the resulting loadings plot]
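A sketch for the gene-symbol question. It assumes the rownames of rld.sm.output are Ensembl gene IDs (possibly with version suffixes) from human samples; the post does not say which. The loadings-plot labels appear to come from the rownames of the matrix passed to pca(), so one approach is to swap those rownames for symbols before calling pca(). The lookup vector could come from something like AnnotationDbi::mapIds(org.Hs.eg.db, keys = sub("\\.[0-9]+$", "", rownames(rld.sm.output)), keytype = "ENSEMBL", column = "SYMBOL"). The helper relabel_with_symbols() and the toy IDs below are illustrative only.

```r
# Hypothetical helper: replace gene IDs in rownames with gene symbols,
# keeping the original ID wherever no symbol is available.
relabel_with_symbols <- function(mat, id2symbol) {
  ids <- rownames(mat)
  # Strip a trailing version suffix (e.g. ".1") before the lookup.
  base_ids <- sub("\\.[0-9]+$", "", ids)
  sym <- unname(id2symbol[base_ids])
  # Fall back to the original ID for missing (NA) or empty symbols.
  no_symbol <- is.na(sym) | sym == ""
  sym[no_symbol] <- ids[no_symbol]
  # Several IDs can share a symbol; make.unique() keeps rownames unique.
  rownames(mat) <- make.unique(sym)
  mat
}

# Toy example: three human Ensembl IDs (TP53, BRCA1, GAPDH) with
# illustrative version suffixes, plus one made-up ID that has no symbol.
toy <- matrix(1:8, nrow = 4,
              dimnames = list(c("ENSG00000141510.1", "ENSG00000012048.1",
                                "ENSG99999999999.1", "ENSG00000111640.1"),
                              c("s1", "s2")))
id2symbol <- c(ENSG00000141510 = "TP53", ENSG00000012048 = "BRCA1",
               ENSG00000111640 = "GAPDH")

print(rownames(relabel_with_symbols(toy, id2symbol)))
```

After relabelling rld.sm.output this way, rerunning pca() and plotloadings() should show symbols instead of IDs. Falling back to the ID keeps genes without a symbol in the plot, and make.unique() avoids duplicate rownames when two IDs share a symbol.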

Thanks for any help!

