We're analyzing RNA-seq data with a pipeline consisting of Salmon, tximeta, and DESeq2.
We have a multi-factorial experimental design, and the experiment was performed on cell lines.
One thing that surprised us is that, in the results output, many gene symbols appear more than once (what look to us like gene polymorphisms).
For example, gene NLRP2 has multiple entries, each with its own unique Ensembl ID: ENSG00000022556, ENSG00000275082, ENSG00000275843, etc.
ensembl_id       baseMean     log2FoldChange  pvalue       padj         gene   CTRL_1     CTRL_2     A_1        A_2        B_1        B_2        A+B_1      A+B_2
ENSG00000022556  559.2711127  -1.709470173    5.51E-09     2.16E-07     NLRP2  33.063154  17.498608  23.790824  28.562371  6.421092   6.755627   29.858583  23.977158
ENSG00000275082  349.6580809  2.406888875     0.592471935  0.817837758  NLRP2  0          7.920205   10.814798  0          18.640884  18.543885  0          3.545411
My questions are: how should we interpret data like this, and how should we deal with this kind of situation? Can we add or average the different entries associated with the same gene?
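
To make the last question concrete, here is a minimal sketch of what we mean by "adding" entries: summing the per-sample values of all Ensembl IDs that share a gene symbol. It is written in plain Python purely for illustration (our actual pipeline is in R), the `collapse_by_symbol` helper is hypothetical, and the numbers are just the two NLRP2 rows from the output above. Whether collapsing like this is statistically appropriate is exactly what we are unsure about.

```python
# Sample column names, in the same order as in the results table.
samples = ["CTRL_1", "CTRL_2", "A_1", "A_2", "B_1", "B_2", "A+B_1", "A+B_2"]

# Ensembl ID -> (gene symbol, per-sample values), copied from the two rows above.
rows = {
    "ENSG00000022556": ("NLRP2", [33.063154, 17.498608, 23.790824, 28.562371,
                                  6.421092, 6.755627, 29.858583, 23.977158]),
    "ENSG00000275082": ("NLRP2", [0, 7.920205, 10.814798, 0,
                                  18.640884, 18.543885, 0, 3.545411]),
}


def collapse_by_symbol(rows, n_samples):
    """Sum per-sample values over all Ensembl IDs that share a gene symbol.

    Hypothetical helper for illustration only; not part of any library.
    """
    totals = {}
    for symbol, values in rows.values():
        # Start each symbol at zero, then accumulate sample by sample.
        acc = totals.setdefault(symbol, [0.0] * n_samples)
        for i, value in enumerate(values):
            acc[i] += value
    return totals


collapsed = collapse_by_symbol(rows, len(samples))
for symbol, values in collapsed.items():
    print(symbol, dict(zip(samples, (round(v, 6) for v in values))))
```

Averaging instead of adding would only mean dividing each summed value by the number of entries for that symbol.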