I have eight 10x datasets (samples). 4 samples facs sorted for cell type a and four samples for cell type b. I want all the data to be analysed together. Just merging them works but I think there are some sample-to-sample technical artifacts. Integrating all of them also seems to be giving strange results. So I decided I would integrate all sample for cell A (dataset x) and separately for cell type B (dataset y) and then merge both datasets.

lst <- SplitObject(sf,split.by="sample")

# integration for cell type a samples
microlist <- lst[1:4]
for (i in 1:length(microlist)) {
    microlist[[i]] <- SCTransform(microlist[[i]],return.only.var.genes=F)
}
sf.features <- SelectIntegrationFeatures(microlist,nfeatures=5000)
microlist <- Seurat::PrepSCTIntegration(microlist,anchor.features=sf.features)
sf.anchors <- FindIntegrationAnchors(object.list=microlist,normalization.method="SCT",anchor.features=sf.features)
x <- IntegrateData(anchorset=sf.anchors,normalization.method="SCT")

# integration for cell type b samples
monolist <- lst[5:8]
for (i in 1:length(monolist)) {
    monolist[[i]] <- SCTransform(monolist[[i]],return.only.var.genes=F)
}
sf.features <- SelectIntegrationFeatures(monolist,nfeatures=5000)
monolist <- Seurat::PrepSCTIntegration(monolist,anchor.features=sf.features)
sf.anchors <- FindIntegrationAnchors(object.list=monolist,normalization.method="SCT",anchor.features=sf.features)
y <- IntegrateData(anchorset=sf.anchors,normalization.method="SCT")

My two integrated datasets look like this:

> x
An object of class Seurat 
36300 features across 18928 samples within 3 assays 
Active assay: integrated (5000 features, 5000 variable features)
 2 other assays present: RNA, SCT
 2 dimensional reductions calculated: pca, tsne

> y
An object of class Seurat 
35419 features across 20499 samples within 3 assays 
Active assay: integrated (4354 features, 4354 variable features)
 2 other assays present: RNA, SCT
 2 dimensional reductions calculated: pca, tsne

And then I merge x and y

z <- merge(x,y)
z
An object of class Seurat 
39382 features across 39427 samples within 3 assays 
Active assay: integrated (7050 features, 0 variable features)
 2 other assays present: RNA, SCT

And then I want to run a PCA, but I can't. Note that the number of variable features has become zero after merging. And then I tried to run FindVariableFeatures()

> FindVariableFeatures(z)
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  : 
  invalid 'x'

Running NormalizeData/ScaleData doesnt make a difference. I am not sure it even makes much sense to run FindVariableFeatures after scTransform and integration and merging.

What are the thoughts on merging two separately integrated datasets? Are their other tools/methods that actually return corrected raw count values that can be used as the starting point for Seurat?

R 4.0.2
Seurat 3.2.3

Update: Seurat developers have said that they do not support FindVariableFeatures() on integrated datasets. This is their response:

At the moment we don't support running VariableFeaturePlot on
integrated datasets, but you can run this function on individual
datasets prior to integration. You can however use
SelectIntegrationFeatures to identify features that are consistently
variable across datasets. You could also take the union of variable
features if you wish. Also, merging two integrated objects is not
recommended because you are only normalizing within cell types here.



Source link