Most of your questions revolve around the entire scientific field of variant interpretation. Using a monolithic term such as "mutated gene" glosses over the inherent complexity of variant interpretation. The impact of a specific variant on protein function lies on a spectrum. You might want to read some of the saturation mutagenesis papers, which functionally assess all possible variants within a protein (e.g. PMIDs: 30209399 and 29706350). That being said, loss-of-function variants across a protein are generally considerably more common than gain-of-function variants.

Q: Is it common to see variation in association beta among SNPs within the same gene?

A: I assume you are talking about protein-coding variants. While most pLOF variants may have a similarly large impact on protein function (with caveats such as being near the C-terminus, etc.), missense variants can have profoundly different impacts, ranging from virtually no impact to highly pathogenic. If the missense variant is truly causal, then the beta should reflect a combination of two factors: the degree of functional impact on the protein and the relevance of the protein for the disease. I've previously noted for somatic mutations in cancer that the frequency of particular oncogenic variants in PTEN is directly related to the extent to which those variants reduce phosphatase activity (i.e. functional impact, PMID: 31202631, Fig 5). Thus, I would find it stranger if all the variants had the same beta.

Q: Is it common for the p-values of these SNPs to also vary a lot?

A: P-values are affected by the allele frequency of the SNP you are looking at, which in turn can reflect chance historical/demographic effects. Imagine you study SNP A in population X, where it has an allele frequency of 0.02; you may get a very low p-value, e.g. p=1e-20. In a second population Y, the allele frequency may be only 0.00001, so far fewer carriers are observed and the same SNP A could come out as not significant. Differences in allele frequencies between populations are, in part, why people do GWAS, etc. in diverse populations.
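To make the allele frequency point concrete, here is a rough sketch of the p-value you would expect for the same effect size at the two frequencies mentioned above. It is only a back-of-the-envelope approximation: I am assuming a quantitative trait with unit variance, an additive model, Hardy-Weinberg equilibrium, and the same sample size and beta in both populations. The function name, sample size and beta are made up for illustration.

```python
import math


def expected_p_value(n_samples, allele_freq, beta):
    """Two-sided p-value at the expected z-score of an additive test.

    Assumptions (illustrative only): unit-variance quantitative trait and
    Hardy-Weinberg equilibrium, so the genotype variance is 2 * f * (1 - f)
    and the expected z-score is roughly beta * sqrt(n * 2 * f * (1 - f)).
    """
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    z = beta * math.sqrt(n_samples * genotype_variance)
    # For a standard normal statistic, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical study design: same sample size and same true effect.
N_SAMPLES = 100_000
BETA = 0.2

p_population_x = expected_p_value(N_SAMPLES, 0.02, BETA)     # common in X
p_population_y = expected_p_value(N_SAMPLES, 0.00001, BETA)  # rare in Y

print(f"Population X (AF=0.02):    p ~ {p_population_x:.2e}")
print(f"Population Y (AF=0.00001): p ~ {p_population_y:.2e}")
```

With identical beta and sample size, only the allele frequency changes, yet the SNP goes from far below the usual genome-wide threshold in X to nowhere near significance in Y.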

Q: Is there a way to objectively (and legitimately) filter the SNPs to be included in the analysis?

A: Yes, the most common way is to use machine learning (ML) models trained to predict either "pathogenic" or "functionally damaging" variants. Conceptually, one could filter by thresholding the score from such ML methods, or use the score directly as a continuous variable that weights each variant by its likelihood of functional impact. If you are concerned about "objectivity", then you should look at benchmarking studies for these methods and use them appropriately (e.g. Cancer: PMID 32079540).
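The two options above (hard threshold vs. continuous weight) could look roughly like this. This is a minimal sketch, not any particular tool's method: the variant IDs, scores, the 0.5 cutoff and the helper names are all hypothetical, and I am assuming a score between 0 and 1 where higher means more likely damaging.

```python
# Hypothetical per-variant scores from some pathogenicity predictor
# (0 = likely benign, 1 = likely damaging). Values are made up.
variants = [
    {"id": "var1", "score": 0.95},
    {"id": "var2", "score": 0.40},
    {"id": "var3", "score": 0.05},
]


def filter_by_threshold(variants, threshold):
    """Option 1: keep only variants scoring at or above the cutoff."""
    return [v for v in variants if v["score"] >= threshold]


def weighted_burden(allele_counts, variants):
    """Option 2: sum one sample's allele counts, weighted by each score."""
    return sum(allele_counts.get(v["id"], 0) * v["score"] for v in variants)


# Option 1: a hard filter with an arbitrary example cutoff of 0.5.
kept = filter_by_threshold(variants, 0.5)
print([v["id"] for v in kept])

# Option 2: one hypothetical sample carrying 1 copy of var1, 2 of var2.
sample_counts = {"var1": 1, "var2": 2}
burden = weighted_burden(sample_counts, variants)
print(burden)
```

The threshold version is simpler to explain, but the weighted version avoids throwing away information from variants that sit just below an arbitrary cutoff. Either way, the cutoff or weighting should come from a benchmark relevant to your disease area rather than being picked by hand.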


