Hi,

I am working with an RNA-Seq dataset and have the raw counts file. I noticed that there are 58785 genes in the "Gene Symbol" column and that some gene symbols appear twice (shown below). In this scenario, what is the best practice for handling such duplicated genes? Should we simply average or sum the duplicated rows before using them in downstream analysis?

dput(head(Counts, 5))
structure(list(symbol = c("BM", "A2GGG", "A2GGG", "P1P", 
"P1P"), Sample_A = c(0L, 0L, 82L, 46L, 6L), Sample_B = c(1L, 
0L, 64L, 49L, 5L), Sample_C = c(2L, 0L, 96L, 44L, 6L), Sample_D = c(5L, 
0L, 85L, 38L, 3L), Sample_E = c(1L, 0L, 80L, 48L, 6L), Sample_F = c(1L, 
0L, 77L, 49L, 4L)), row.names = c(NA, 5L), class = "data.frame")

Average

(A2GGG + A2GGG)/2 = A2GGG

Sum

A2GGG + A2GGG = A2GGG
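
To make the two options concrete, here is a minimal sketch in base R using the five rows from the dput() output above. The names `summed` and `averaged` are only illustrative, and I am assuming that `aggregate()` is a reasonable way to collapse the rows; I am not sure which of the two (if either) is appropriate for downstream analysis.

```r
# Toy data copied from the dput() output above
Counts <- data.frame(
  symbol   = c("BM", "A2GGG", "A2GGG", "P1P", "P1P"),
  Sample_A = c(0L, 0L, 82L, 46L, 6L),
  Sample_B = c(1L, 0L, 64L, 49L, 5L),
  Sample_C = c(2L, 0L, 96L, 44L, 6L),
  Sample_D = c(5L, 0L, 85L, 38L, 3L),
  Sample_E = c(1L, 0L, 80L, 48L, 6L),
  Sample_F = c(1L, 0L, 77L, 49L, 4L)
)

# Option 1 (Sum): add up the counts of rows sharing the same symbol
summed <- aggregate(. ~ symbol, data = Counts, FUN = sum)

# Option 2 (Average): take the mean of rows sharing the same symbol
# (note: this can produce non-integer values)
averaged <- aggregate(. ~ symbol, data = Counts, FUN = mean)

print(summed)
print(averaged)
```

Both calls return one row per unique symbol, so the five rows above collapse to three (A2GGG, BM, P1P).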

Thank you,

Toufiq


