I performed Patch-seq on two sets of neurons and then used DESeq2 to look for transcriptomic differences between the groups. One group consists of 7 neurons and the other of 9 neurons.

Genes with an absolute log2 fold change (L2FC) greater than 1.5 and an adjusted p-value below 0.01 are considered differentially expressed (DE). With these thresholds I get a total of 123 DE genes for my dataset.
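To make the DE criteria explicit, here is a minimal Python sketch of the filter. The helper name `is_de` is hypothetical, and applying the cutoff to the absolute value of L2FC is an assumption (negative fold changes are counted as DE later in this post). The two calls use the Gene 1 statistics reported further down.

```python
# Thresholds as stated in the post.
L2FC_CUTOFF = 1.5
PADJ_CUTOFF = 0.01


def is_de(l2fc, padj):
    """Hypothetical helper: True if a gene passes both cutoffs.

    Assumption: the L2FC cutoff applies to the absolute value.
    """
    if padj is None:  # a missing adjusted p-value is treated as "not DE"
        return False
    return abs(l2fc) > L2FC_CUTOFF and padj < PADJ_CUTOFF


# Gene 1 statistics reported further down in this post.
print(is_de(-7.84, 0.13))      # original counts
print(is_de(-7.44, 1.35e-06))  # counts + 1
```

With the original counts the gene fails only on the adjusted p-value; the fold change clears the cutoff in both cases.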

I notice that some genes that are visibly different between the groups are not picked as DE genes by DESeq2. A plausible reason is that DESeq2 is treating the zero counts as dropouts. If I add a constant value of 1 to all the gene counts, which gets rid of the zeroes, the visibly different gene is picked as a DE gene, and the total number of DE genes rises to 953. To me this suggests that DESeq2 was misled by the zeroes and treated some gene counts as false dropouts.

Here is an example of such a gene (the counts for each neuron are separated by commas within each group):

**Gene 1:-**
Group-1: 400,6,0,118,644,0,4738
Group-2: 0,34,0,0,0,0,0,0,0
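As a quick sanity check on these numbers, the sketch below computes the raw group means, the number of zeros per group, and a naive log2 ratio, with and without a pseudocount of 1. It works on the raw counts only: no size-factor normalization and no model fitting, so it is not what DESeq2 computes. The function name `summarize` and the Group-2 over Group-1 direction of the ratio are assumptions for illustration.

```python
import math

# Raw counts for Gene 1, copied from the post.
group1 = [400, 6, 0, 118, 644, 0, 4738]
group2 = [0, 34, 0, 0, 0, 0, 0, 0, 0]


def summarize(g1, g2, pseudocount=0):
    """Naive summary on raw counts (hypothetical helper, not DESeq2)."""
    a = [c + pseudocount for c in g1]
    b = [c + pseudocount for c in g2]
    mean1 = sum(a) / len(a)
    mean2 = sum(b) / len(b)
    return {
        "mean1": mean1,
        "mean2": mean2,
        "zeros1": sum(1 for c in a if c == 0),
        "zeros2": sum(1 for c in b if c == 0),
        # Assumed direction: Group-2 relative to Group-1.
        "log2_ratio": math.log2(mean2 / mean1),
    }


for pc in (0, 1):
    s = summarize(group1, group2, pseudocount=pc)
    print(f"pseudocount={pc}: mean1={s['mean1']:.1f}, mean2={s['mean2']:.2f}, "
          f"zeros={s['zeros1']}/{s['zeros2']}, log2 ratio={s['log2_ratio']:.2f}")
```

The naive log2 ratio comes out at roughly -7.8 without the pseudocount and roughly -7.5 with it, so the size of the effect barely moves when 1 is added.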

The DESeq2 statistics for this gene are:

base mean = 399.68; L2FC = -7.84; lfcSE = 2.56; **adjusted p-value = 0.13**

If I add a constant value (=1) to all the gene counts, now the DESeq2 statistics are:

base mean = 370.11; L2FC = -7.44; lfcSE = 1.26; **adjusted p-value = 1.35e-06**
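Comparing the two sets of statistics, the L2FC barely changes while lfcSE roughly halves. The sketch below makes that concrete by forming a Wald-style z statistic (L2FC divided by lfcSE) and a two-sided normal p-value from the reported numbers. It assumes these values come from the default Wald test without shrinkage. The p-values it prints are unadjusted, so they are not expected to match the adjusted p-values above, which also depend on all the other genes tested.

```python
import math


def wald(l2fc, lfc_se):
    """Wald-style z statistic and two-sided normal p-value (illustrative)."""
    z = l2fc / lfc_se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values copied from the post.
z_orig, p_orig = wald(-7.84, 2.56)  # original counts
z_plus, p_plus = wald(-7.44, 1.26)  # counts + 1

print(f"original: z = {z_orig:.2f}, unadjusted p = {p_orig:.2e}")
print(f"plus one: z = {z_plus:.2f}, unadjusted p = {p_plus:.2e}")
```

Under these assumptions, almost all of the change in significance comes from the smaller standard error rather than from the fold change itself.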

Notice the large difference in the adjusted p-values (in bold above) between the original counts and the counts after adding 1. Can someone please explain why this is happening? At what step in DESeq2 are the zeroes in the gene counts misleading the conclusions?

