I have been trying to consider how current methods for studying gene interactions address multiallelic SNPs and am struggling to find explicit published discussion of the issue. I believe the joint distribution for the genotypic data following interaction between two SNPs can be expressed with the table:
SNP1 \ SNP2   BB     Bb     bb
AA            AABB   AABb   AAbb
Aa            AaBB   AaBb   Aabb
aa            aaBB   aaBb   aabb
To give a concrete example, consider two biallelic SNPs: rs1200 (A and G variants) and rs801 (C and G variants). The joint distribution for these SNPs is therefore:
rs1200 \ rs801   CC     CG     GG
AA               AACC   AACG   AAGG
AG               AGCC   AGCG   AGGG
GG               GGCC   GGCG   GGGG
Now suppose we seek to compare rs1200 to a triallelic SNP, rs1029256, with variants A, C and T. I believe the following joint distribution is required for unphased genotypes:
rs1200 \ rs1029256   AA     AC     AT     CC     CT     TT
AA                   AAAA   AAAC   AAAT   AACC   AACT   AATT
AG                   AGAA   AGAC   AGAT   AGCC   AGCT   AGTT
GG                   GGAA   GGAC   GGAT   GGCC   GGCT   GGTT
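To check that I have not missed any cells, here is a minimal Python sketch of how I enumerated the tables above. The function names are my own, and it assumes unphased genotypes, so each genotype is an unordered pair of alleles and a SNP with n alleles has n(n+1)/2 genotypes.

```python
from itertools import combinations_with_replacement


def unphased_genotypes(alleles):
    """Return all unordered allele pairs, e.g. 'ACT' -> AA, AC, AT, CC, CT, TT."""
    return ["".join(pair) for pair in combinations_with_replacement(sorted(alleles), 2)]


def joint_table(alleles1, alleles2):
    """Rows are genotypes of the first SNP, columns are genotypes of the second."""
    rows = unphased_genotypes(alleles1)
    cols = unphased_genotypes(alleles2)
    return [[g1 + g2 for g2 in cols] for g1 in rows]


# rs1200 (A/G) against rs801 (C/G): 3 x 3 = 9 cells
biallelic = joint_table("AG", "CG")

# rs1200 (A/G) against the triallelic rs1029256 (A/C/T): 3 x 6 = 18 cells
triallelic = joint_table("AG", "ACT")

for row in triallelic:
    print(" ".join(row))
```

With a fourth allele the second SNP would have 10 genotypes, giving 30 cells against a biallelic SNP.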
The number of possible combinations grows quickly with the number of alleles (18 cells here versus 9 for two biallelic SNPs), and I imagine many methods cannot handle multiallelic SNPs in the same way as biallelic ones. Are such SNPs generally dropped from the analysis, or re-coded so that all minor alleles are grouped together?
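To be clear about what I mean by grouping, here is a rough sketch of the re-coding I have in mind. It is only an illustration: the function name and the placeholder label "X" are my own, and treating A as the major allele of rs1029256 is an assumption made purely for the example, not something I have checked against frequency data.

```python
def collapse_minor_alleles(genotype, major, other="X"):
    """Recode an unphased genotype so that every non-major allele becomes `other`."""
    recoded = [a if a == major else other for a in genotype]
    # Put the major allele first so that e.g. 'CA' and 'AC' get the same label.
    recoded.sort(key=lambda a: a != major)
    return "".join(recoded)


# The six unphased genotypes of a triallelic SNP with alleles A, C and T
six = ["AA", "AC", "AT", "CC", "CT", "TT"]

# Hypothetical choice: A is assumed to be the major allele
recoded = {g: collapse_minor_alleles(g, major="A") for g in six}
collapsed = sorted(set(recoded.values()))

print(recoded)
print(collapsed)
```

Under this re-coding the six genotypes fall back into three classes, so the joint table with rs1200 is 3 x 3 again, at the cost of no longer distinguishing C from T.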
Thanks for any help you can provide.