Hi all,

I am a bit confused about how I should transform my data for use with GSEA. I have tried two different approaches and am not sure which one is better.

First, I created a DGEList, filtered out low-count genes, and transformed the data with voom. I then fed these data to GSEA.

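# Build the DGEList from the featureCounts output and drop low-count genes
y <- DGEList(counts = fc_mydata$counts, genes = fc_mydata$annotation[, c("GeneID", "Length")])
keep <- filterByExpr(y)
y <- y[keep, ]
y <- calcNormFactors(y, method = "TMM")
y_voom <- voom(y)
GSEA_table <- y_voom$E
# Label each column as Treatment_DaysToRebound_SampleID, looked up by sample name
idx <- match(colnames(GSEA_table), mydata$Sample)
colnames(GSEA_table) <- paste(mydata$Treatment[idx], mydata$DaysToRebound[idx], mydata$SampleID[idx], sep = "_")
GSEA_table <- GSEA_table[, order(colnames(GSEA_table))]
symb <- annotation$Symbol[match(rownames(GSEA_table), annotation$GeneID)]
# Tab-separated output with gene symbols in a NAME column
write.table(cbind("NAME" = symb, GSEA_table), file = "GSEA_table_final.txt", row.names = FALSE, quote = FALSE, sep = "\t")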

For the second approach, I ran DESeq2 on the raw counts, applied the variance-stabilizing transformation (vst), and fed that to GSEA.

dds <- DESeqDataSetFromMatrix(countData = fc_mydata$counts, colData = mydata, design = ~Treatment)
dds <- DESeq(dds)
res <- results(dds)
# Sort by adjusted p-value, smallest first
res_ordered <- res[order(res$padj), ]
rownames(res_ordered) <- make.names(annotation$Symbol[match(rownames(res_ordered), annotation$GeneID)], unique = TRUE)
head(res_ordered)
write.table(as.data.frame(res_ordered), "res_ordered", sep = "\t")

# Store the result as vsd so the vst() function is not shadowed
vsd <- vst(dds, blind = FALSE)
normalised_vst <- assay(vsd)
rownames(normalised_vst) <- make.names(annotation$Symbol[match(rownames(normalised_vst), annotation$GeneID)], unique = TRUE)
normalised_vst <- normalised_vst[, order(colnames(normalised_vst))]
write.table(cbind("NAME" = rownames(normalised_vst), normalised_vst), file = "normalised_vst.txt", row.names = FALSE, quote = FALSE, sep = "\t")

Which of the two approaches do you think is more correct, and what can I do to improve it?

Thanks a lot!


