Hi All,

I am new to computational analysis of biological data. I have 30 samples, each 3 to 8 GB in size, obtained from a study of embryonic development of cells in small and large groups. I am doing batch correction as the first step of my pipeline and am stuck at this step.
I am planning to use kBET and MNNcorrect for batch correction and would like to know how to generate the input matrix [rows: cells, columns: genes]. My assumption is that I copy all the file names into the rows, which would be my cells, and put the genes in the columns. What I am unsure about is which genes to use as columns.
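
To make the layout I have in mind concrete, here is a toy sketch in plain Python. Everything in it is made up for illustration: the cell names, the gene names, the counts, the batch labels, and the `build_matrix` helper are all hypothetical, and I am only assuming that each cell comes with a table of per-gene counts. It is not meant as the actual input format for kBET or MNNcorrect.

```python
# Toy per-cell gene counts. Cell names, gene names, and counts are all made up.
counts_per_cell = {
    "cell_01": {"geneA": 5, "geneB": 0, "geneC": 2},
    "cell_02": {"geneA": 1, "geneC": 7},
    "cell_03": {"geneB": 3, "geneC": 4},
}

# Hypothetical batch label for each cell (here: which group it came from).
batch_of_cell = {"cell_01": "small", "cell_02": "small", "cell_03": "large"}


def build_matrix(counts):
    """Return (cells, genes, matrix) with one row per cell, one column per gene."""
    cells = sorted(counts)
    # Columns are the union of all genes seen in any cell.
    genes = sorted({gene for per_gene in counts.values() for gene in per_gene})
    # A gene missing from a cell's table is filled in as a zero count.
    matrix = [[counts[cell].get(gene, 0) for gene in genes] for cell in cells]
    return cells, genes, matrix


cells, genes, matrix = build_matrix(counts_per_cell)
# Batch labels in the same order as the matrix rows.
batches = [batch_of_cell[cell] for cell in cells]

print("genes:", genes)
for cell, batch, row in zip(cells, batches, matrix):
    print(cell, batch, row)
```

So each row would be one cell, each column one gene, and there would be a separate list of batch labels in the same order as the rows.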

  1. Can you please confirm whether my understanding is right?
  2. Which genes should I choose, and where can I obtain the gene list?
  3. Now I am starting to wonder whether my pipeline is correct. Has anyone come across a pipeline that performs batch correction before QC or alignment to a reference?

Thanks a lot for your support!!


