I have raw SNP data for 350 postmenopausal women, covering 57 genes and their corresponding SNPs.
I want to apply an SVM for classification.
The values are genotypes such as AA, CT, or GA, one in each of the 57 columns. Since the major and minor alleles differ from SNP to SNP depending on the allele frequencies, how do I encode the bi-allelic SNPs for all the genes? I am very confused.
How do I encode the entire dataset in a consistent manner?
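To make the question concrete, here is a minimal sketch of one scheme I have come across, additive (0/1/2) coding, where each genotype is replaced by the number of copies of the minor allele it carries. This is only an assumption about what might be appropriate, not something I know to be the right approach for an SVM. The function name `encode_additive` and the example column are hypothetical, and the minor allele is taken to be whichever allele is less frequent within that column.

```python
from collections import Counter


def encode_additive(genotypes):
    """Encode one bi-allelic SNP column as minor-allele counts (0, 1 or 2).

    genotypes: list of two-letter strings such as "AA", "CT", "GA".
    Entries that are not two-letter strings are treated as missing (None).
    """
    # Keep only well-formed genotypes when estimating allele frequencies.
    valid = [g for g in genotypes if isinstance(g, str) and len(g) == 2]
    counts = Counter("".join(valid))

    if len(counts) > 2:
        raise ValueError("column is not bi-allelic: %s" % sorted(counts))

    # Assumption: the minor allele is the less frequent allele in this column.
    # Ties are broken alphabetically so the result is reproducible.
    # With fewer than two observed alleles there is no minor allele.
    minor = min(counts, key=lambda a: (counts[a], a)) if len(counts) == 2 else None

    encoded = []
    for g in genotypes:
        if not (isinstance(g, str) and len(g) == 2):
            encoded.append(None)            # missing genotype stays missing
        elif minor is None:
            encoded.append(0)               # monomorphic column: no minor allele
        else:
            encoded.append(g.count(minor))  # "AG" and "GA" both give 1
    return encoded


# Hypothetical column for a single SNP; G is the minor allele here.
column = ["AA", "AG", "GA", "GG", "AA", None]
print(encode_additive(column))  # [0, 1, 1, 2, 0, None]
```

Applied column by column, this would treat every SNP the same way regardless of which letters happen to be its major and minor alleles, and the order of the two letters in a genotype would not matter.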
Please help me!!!