I have calculated ssGSEA scores for the EMT gene signature in a dataset of 34 samples using GenePattern. I first converted the raw read counts from featureCounts to RPKM using this Perl script:
perl rpkm_script_beta.pl ~/organoidscounts.txt 7:40 6 > RPKM_organoids
where 7:40 are the positions of the columns holding the raw counts for the 34 samples, and 6 is the "Length" column of the featureCounts output table (I am assuming this is the gene length information).
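For reference, the conversion can be sketched in Python. This is a hypothetical helper, not the contents of rpkm_script_beta.pl; it assumes the usual formula RPKM = count × 10^9 / (gene length in bp × library size) and takes the library size to be the column sum of the featureCounts table (reads assigned to genes). A script that uses total mapped reads instead would give different absolute values. The 1-based column arguments mirror the 7:40 and 6 used in the command.

```python
# Hypothetical sketch, not the rpkm_script_beta.pl from the post.
# RPKM = count * 1e9 / (gene_length_bp * library_size)
# Assumption: library_size is the column sum of the featureCounts table
# (reads assigned to genes); other scripts may use total mapped reads.

def rpkm(counts, lengths):
    """Convert one sample's raw counts to RPKM, given gene lengths in bp."""
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("sample has no assigned reads")
    return [c * 1e9 / (length * library_size) for c, length in zip(counts, lengths)]


def rpkm_from_featurecounts(path, first_col=7, last_col=40, length_col=6):
    """Read a featureCounts table; column numbers are 1-based, as in the command."""
    genes, lengths, rows, samples = [], [], [], []
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # featureCounts writes a leading comment line
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "Geneid":  # header row holds the sample names
                samples = fields[first_col - 1:last_col]
                continue
            genes.append(fields[0])
            lengths.append(int(fields[length_col - 1]))
            rows.append([float(x) for x in fields[first_col - 1:last_col]])
    # Normalise each sample (column) separately, then restore the gene-by-sample layout
    per_sample = [rpkm(list(col), lengths) for col in zip(*rows)]
    table = [list(values) for values in zip(*per_sample)]
    return genes, samples, table
```

Note that Python's open() does not expand "~", so a path like ~/organoidscounts.txt would need os.path.expanduser() first.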
I then converted the RPKM table to a .GCT file and downloaded the Hallmark EMT gene signature as a .GMT file. I used both files as inputs to the ssGSEA module in GenePattern, which produced a .GCT file with an ssGSEA score for each sample.
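The two file formats involved are simple tab-separated text, so a minimal sketch may help with checking them. The helper names below are hypothetical. A GCT (version 1.2) file starts with a "#1.2" line, then a line with the number of rows and sample columns, then a "Name / Description / samples" header; a GMT file has one gene set per line (name, description, then member genes). One thing worth checking, stated here as an assumption about your annotation: the Name column of the GCT must use the same identifiers as the GMT (Hallmark GMT files are commonly downloaded with gene symbols), so Ensembl or Entrez IDs in the featureCounts Geneid column would need converting first.

```python
# Hypothetical helpers illustrating the GCT (#1.2) and GMT text formats.

def to_gct(genes, samples, table):
    """Return a GCT v1.2 string: version line, dimensions, header, then data rows."""
    lines = [
        "#1.2",
        f"{len(genes)}\t{len(samples)}",
        "\t".join(["Name", "Description"] + list(samples)),
    ]
    for gene, row in zip(genes, table):
        # The Description column is required; "na" is used here as a placeholder
        lines.append("\t".join([gene, "na"] + [f"{v:.4f}" for v in row]))
    return "\n".join(lines) + "\n"


def read_gmt(path):
    """Parse a GMT file: set name, description, then member genes, tab-separated."""
    gene_sets = {}
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets
```

Comparing set(read_gmt(...)[set_name]) with the gene names in the GCT is a quick way to see how many signature genes actually matched.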
Does this sound okay? I wish I could attach screenshots of the files, but I do not know how to do that. This is my first time doing this kind of analysis and I am a little worried.
Also, how should I interpret the ssGSEA scores? Does a higher score mean a sample is more mesenchymal? Is there a cutoff value I could use to stratify samples into epithelial and mesenchymal groups based on these scores?
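Because ssGSEA scores are not on an absolute scale, one exploratory option (an assumption for illustration, not an established threshold) is to stratify samples relative to the rest of the cohort, for example by a median split. The sketch below standardises the scores across samples and labels them with hypothetical group names; these labels are relative rankings within the 34 samples, not validated epithelial or mesenchymal calls.

```python
# Illustrative only: ssGSEA scores have no universal cutoff, so this sketch
# uses a relative (within-cohort) median split. "EMT-high"/"EMT-low" are
# hypothetical labels, not validated epithelial/mesenchymal calls.
from statistics import mean, median, pstdev


def zscore(scores):
    """Standardise scores across samples so they are comparable within the cohort."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {s: ((v - mu) / sd if sd else 0.0) for s, v in scores.items()}


def median_split(scores):
    """Label samples above the cohort median 'EMT-high', the rest 'EMT-low'."""
    cut = median(list(scores.values()))
    return {s: ("EMT-high" if v > cut else "EMT-low") for s, v in scores.items()}
```

Here scores would be a dict mapping each sample name to its score from the output .GCT file; a median split always yields two groups, whether or not the data are genuinely bimodal.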
Does ssGSEA take into account the direction of expression (up- or downregulation) of the genes in the signature?
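To see what drives the score, here is a simplified single-sample enrichment score in the spirit of ssGSEA (Barbie et al., 2009). It is not GenePattern's exact implementation (there is no normalisation step, and alpha is an assumed weighting exponent; the module's own weighting parameter and default may differ). It shows that the score for one gene set depends only on where that set's members fall in a single sample's expression ranking, with no per-gene sign. If a signature comes as separate "up" and "down" gene lists, one option (an assumption, and the function names are hypothetical) is to score each list separately and subtract.

```python
# Simplified single-sample enrichment score in the spirit of ssGSEA.
# NOT GenePattern's exact implementation (no normalisation); it only shows how
# the score depends on where a gene set's members sit in one sample's ranking.

def ssgsea_like(expr, gene_set, alpha=0.25):
    """expr: dict gene -> expression in ONE sample; gene_set: iterable of genes."""
    ranked = sorted(expr, key=expr.get, reverse=True)  # highest expression first
    n = len(ranked)
    members = set(gene_set) & set(ranked)
    k = len(members)
    if k == 0 or k == n:
        raise ValueError("gene set must overlap some, but not all, genes")
    # Rank weight: top gene gets n, bottom gene gets 1, raised to alpha
    weights = {g: (n - i) ** alpha for i, g in enumerate(ranked)}
    hit_total = sum(weights[g] for g in members)
    p_hit = p_miss = 0.0
    es = 0.0
    for g in ranked:
        if g in members:
            p_hit += weights[g] / hit_total  # weighted step for set members
        else:
            p_miss += 1.0 / (n - k)  # uniform step for non-members
        es += p_hit - p_miss  # the walk is summed rather than taking its maximum
    return es


def up_minus_down(expr, up_genes, down_genes, alpha=0.25):
    """Score genes expected to rise and to fall separately, then subtract."""
    return ssgsea_like(expr, up_genes, alpha) - ssgsea_like(expr, down_genes, alpha)
```

A set whose genes are among the most highly expressed in a sample gets a positive score and a set concentrated at the bottom of the ranking gets a negative one, so genes expected to go down in EMT, if mixed into the same set, would pull the score in the opposite direction.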
Thank you for your time and help.