I have an RNA-Seq data set stored in an Excel file. The file contains gene expression values for two groups, control and disease. Given this data, is it possible to categorize future genes as either control or disease based on their expression levels and p-values?

I am new to machine learning algorithms and have not yet delved deeply into any software or languages, although I'm considering learning R and/or Octave to help me with my data sets.
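
To make the kind of task concrete, here is a minimal sketch of a nearest-centroid classifier, one of the simplest supervised methods for sorting new items into two labeled groups. It is written in Python purely for illustration (the same idea carries over to R or Octave). The numbers are invented toy values, not taken from the Excel file, and the choice of two numeric features per item, as well as the helper names `fit_centroids` and `predict`, are assumptions made for the example.

```python
from statistics import mean

def fit_centroids(rows):
    """Compute the average feature vector (centroid) of each label.

    rows: list of (features, label) pairs, where features is a list of numbers.
    Returns a dict mapping each label to its centroid.
    """
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*members)]
        for label, members in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))

# Invented toy data: two numeric features per item (hypothetical values).
training = [
    ([1.0, 1.2], "control"),
    ([0.9, 1.1], "control"),
    ([3.0, 3.3], "disease"),
    ([3.2, 2.9], "disease"),
]

centroids = fit_centroids(training)
print(predict(centroids, [2.9, 3.1]))
```

With real data, the features would usually need to be put on comparable scales first, and the classifier should be evaluated on items held out from training before trusting its predictions.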
