I am working with an RNA-Seq dataset and have a raw counts file. I noticed that there are 58785 genes in the "Gene Symbol" column, and some gene symbols appear twice with different counts (shown below). In this scenario, what is the best practice for handling such duplicated genes? Do we simply average the duplicate rows, or sum them, before using them in downstream analysis?
dput(head(Counts, 5))
structure(list(
  symbol = c("BM", "A2GGG", "A2GGG", "P1P", "P1P"),
  Sample_A = c(0L, 0L, 82L, 46L, 6L),
  Sample_B = c(1L, 0L, 64L, 49L, 5L),
  Sample_C = c(2L, 0L, 96L, 44L, 6L),
  Sample_D = c(5L, 0L, 85L, 38L, 3L),
  Sample_E = c(1L, 0L, 80L, 48L, 6L),
  Sample_F = c(1L, 0L, 77L, 49L, 4L)
), row.names = c(NA, 5L), class = "data.frame")
Option 1 (average): (A2GGG + A2GGG)/2 = A2GGG
Option 2 (sum): A2GGG + A2GGG = A2GGG
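
To make the two options concrete, here is a minimal sketch in base R using `aggregate()` on the five example rows. It only illustrates what I mean by averaging versus summing and is not a claim that either is the correct approach. The object names `summed` and `averaged` are just for illustration.

```r
# Example rows copied from the dput() output in this post
Counts <- data.frame(
  symbol   = c("BM", "A2GGG", "A2GGG", "P1P", "P1P"),
  Sample_A = c(0L, 0L, 82L, 46L, 6L),
  Sample_B = c(1L, 0L, 64L, 49L, 5L),
  Sample_C = c(2L, 0L, 96L, 44L, 6L),
  Sample_D = c(5L, 0L, 85L, 38L, 3L),
  Sample_E = c(1L, 0L, 80L, 48L, 6L),
  Sample_F = c(1L, 0L, 77L, 49L, 4L)
)

# Option 2 (sum): add up the counts of all rows sharing a symbol
summed <- aggregate(. ~ symbol, data = Counts, FUN = sum)

# Option 1 (average): take the per-sample mean of rows sharing a symbol
averaged <- aggregate(. ~ symbol, data = Counts, FUN = mean)

print(summed)
print(averaged)
```

Both calls collapse the table to one row per symbol. With averaging, some values are no longer integers (for example, A2GGG in Sample_D becomes 42.5), which may matter for tools that expect integer counts.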