I have two groups, WT and KO, with roughly 950 cells in the WT sample and 1,700 cells in the KO sample, as I mentioned in this post: C: How many PCs should be considered for downstream analyses?
Steps followed for the analyses:
- Individual Seurat object for WT and KO
- Merged WT_KO
- Filtered_WT_KO
- Split_cond_WT_KO (Normalize, cell cycle scoring and SCTransform)
- SelectIntegrationFeatures
- PrepSCTIntegration
- FindIntegrationAnchors and SCT Normalization
- IntegrateData
- Run PCA
- FindNeighbors
- FindClusters
- Run UMAP
One of my collaborators wants to downsample so that WT and KO are shown with an equal number of cells on the UMAP cluster plot, and also so that specific markers can be compared between the WT and KO samples.
Which approach is the appropriate one?
Approach 1: Get a list of 950 random KO barcode IDs from the 1,700 cells in the MTX files (cellranger's matrix.mtx file), then follow the above steps for both the 950 actual WT cells and the 950 random KO cells (from the 1,700 actual cells).
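A minimal sketch of the random barcode selection described in Approach 1, written in Python purely for illustration (the workflow above is in Seurat). The barcode strings, the `sample_barcodes` helper, and the seed value are hypothetical placeholders; in practice the barcodes would come from the KO sample's own barcode list. Fixing the seed makes the same 950 cells come out on every run.

```python
import random


def sample_barcodes(barcodes, n, seed=0):
    """Draw n barcodes without replacement, reproducibly for a given seed."""
    if n > len(barcodes):
        raise ValueError("cannot sample more barcodes than are available")
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return sorted(rng.sample(barcodes, n))


# Hypothetical stand-in for the 1,700 KO barcodes (not real data).
ko_barcodes = [f"KO_CELL_{i:04d}-1" for i in range(1700)]

# Keep 950 KO cells to match the WT sample size.
ko_subset = sample_barcodes(ko_barcodes, 950, seed=42)
print(len(ko_subset))  # 950
```

The resulting list would then be used to subset the KO count matrix before the steps above are run.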
Approach 2: Is it OK to just downsample the KO cells from 1,700 to 950 on the UMAP plot without rerunning all the prior steps, so that WT and KO each have only 950 cells on the plot?
What are the pros and cons of Approach 1 and Approach 2?