I have been trying to work out how current methods for studying gene-gene interactions handle multiallelic SNPs, and I am struggling to find any explicit published discussion of the issue. I believe the joint distribution of unphased genotypes at two interacting biallelic SNPs can be laid out over the cells of this table:

            SNP2
            BB      Bb      bb
SNP1  AA    AABB    AABb    AAbb
      Aa    AaBB    AaBb    Aabb
      aa    aaBB    aaBb    aabb

To give a concrete example, consider two biallelic SNPs, rs1200 (alleles A and G) and rs801 (alleles C and G). The joint distribution for these SNPs is therefore:

            rs801
            CC      CG      GG
rs1200  AA  AACC    AACG    AAGG
        AG  AGCC    AGCG    AGGG
        GG  GGCC    GGCG    GGGG

Now assume we want to compare rs1200 with a triallelic SNP, rs1029256, which has alleles A, C and T. I believe the following joint distribution is required for unphased genotypes:

            rs1029256  
            AA      AC      AT      CC      CT      TT
rs1200  AA  AAAA    AAAC    AAAT    AACC    AACT    AATT
        AG  AGAA    AGAC    AGAT    AGCC    AGCT    AGTT
        GG  GGAA    GGAC    GGAT    GGCC    GGCT    GGTT
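
To make the counting explicit, here is a minimal Python sketch of how I am enumerating the cells. A SNP with k alleles has k(k+1)/2 unphased genotypes, so a biallelic SNP paired with a triallelic one gives 3 x 6 = 18 joint classes. The function names `unphased_genotypes` and `joint_genotypes` are my own, made up for illustration, and are not taken from any interaction-analysis package.

```python
from itertools import combinations_with_replacement, product


def unphased_genotypes(alleles):
    """Return all unordered allele pairs (unphased diploid genotypes) at one locus."""
    return ["".join(pair) for pair in combinations_with_replacement(sorted(alleles), 2)]


def joint_genotypes(alleles_1, alleles_2):
    """Return every pairing of a genotype at locus 1 with a genotype at locus 2."""
    return [
        g1 + g2
        for g1, g2 in product(unphased_genotypes(alleles_1), unphased_genotypes(alleles_2))
    ]


# Allele sets taken from the example in this post.
rs1200_alleles = ["A", "G"]
rs1029256_alleles = ["A", "C", "T"]

table = joint_genotypes(rs1200_alleles, rs1029256_alleles)
print(len(unphased_genotypes(rs1029256_alleles)))  # 6 genotypes at the triallelic SNP
print(len(table))                                  # 18 joint genotype classes
```

With a second triallelic SNP in place of rs1200 the same enumeration would give 6 x 6 = 36 cells, which is the growth I am worried about.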

The number of possible genotype combinations grows quickly with the number of alleles, and I imagine many methods cannot handle multiallelic SNPs the way they handle biallelic ones. Are multiallelic SNPs generally dropped from the analysis, or are they re-coded so that all minor alleles are grouped together?
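
By "re-coded" I mean something like the sketch below, which collapses every non-reference allele into a single pseudo-allele `x` so the triallelic SNP looks biallelic again. This is only my guess at what such a re-coding might look like, not a method I have seen documented. The helper name `collapse_to_biallelic` is hypothetical, and treating A as the major allele of rs1029256 is purely an assumption for the example.

```python
def collapse_to_biallelic(genotype, reference_allele):
    """Recode an unphased genotype so every non-reference allele becomes 'x'."""
    recoded = [a if a == reference_allele else "x" for a in genotype]
    # Put the reference allele first so 'xA' and 'Ax' map to the same genotype.
    recoded.sort(key=lambda a: a == "x")
    return "".join(recoded)


# The six unphased genotypes of the triallelic SNP from the table above.
genotypes = ["AA", "AC", "AT", "CC", "CT", "TT"]

# ASSUMPTION: 'A' is treated as the major allele, chosen only for illustration.
recoded = {g: collapse_to_biallelic(g, "A") for g in genotypes}
print(recoded)
print(sorted(set(recoded.values())))  # three classes remain after collapsing
```

This brings the six genotypes back down to three classes, but it also discards the distinction between the C and T alleles, which is why I am unsure whether it is accepted practice.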

Thanks for any help you can provide.

written 2 hours ago by angus.gane


