Batch correction proteomics dataset


I have a proteomics dataset with 72 samples (N = 36 healthy vs. 36 diseased) that were analyzed in 8 batches. I have the data matrix with both normalized and non-normalized values. The dataset for the final analysis was filtered to retain rows with 70% valid values in each group, so there is missingness in my data (501 rows). I am using BatchQC and following the steps from its example on a real protein expression dataset to correct the batch effect, but it only considered the rows that had all expression values.

I have the following concerns:

  • Should I use normalized or non-normalized values, and should they be log2-transformed?
  • How should I handle the missing values? (I don't want to impute.) I tried replacing missing values with zero, but it did not help. BatchQC only corrected the rows that had all the expression values.
  • Do I have to account for biological variables such as age, cancer stage and marker values?

Thank you in advance,
Santosh



Should I use normalized or non-normalized values, and should they be log2-transformed?

That depends on the batch-correction method, so check the relevant documentation. If using limma::removeBatchEffect(), please use log2-transformed values.
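
To make the idea concrete, here is a minimal Python sketch of log2-transforming one protein's intensities and then centering each batch on its own mean, while leaving missing values untouched. It is not what limma or BatchQC does internally: the function names `log2_transform` and `center_by_batch` and the example numbers are made up for illustration, and simple centering ignores the healthy/diseased design, which a model-based method such as removeBatchEffect() can account for through its design argument.

```python
import math

def log2_transform(row):
    """Log2-transform one protein's intensities.

    None marks a missing value. Zeros are also returned as None here,
    because log2(0) is undefined (an assumption of this sketch).
    """
    return [math.log2(v) if v is not None and v > 0 else None for v in row]

def center_by_batch(row, batches):
    """Subtract each batch's mean of observed values, then add back the
    overall mean so the protein stays on its original scale."""
    observed = [v for v in row if v is not None]
    if not observed:
        return list(row)  # nothing observed, nothing to correct
    overall = sum(observed) / len(observed)
    batch_means = {}
    for b in set(batches):
        vals = [v for v, bb in zip(row, batches) if bb == b and v is not None]
        if vals:
            batch_means[b] = sum(vals) / len(vals)
    # Missing values stay missing; they are neither imputed nor dropped.
    return [None if v is None else v - batch_means[b] + overall
            for v, b in zip(row, batches)]

# Hypothetical intensities for one protein measured in two batches.
intensities = [1024.0, 2048.0, None, 4096.0, 8192.0, 4096.0]
batches = ["A", "A", "A", "B", "B", "B"]

logged = log2_transform(intensities)
corrected = center_by_batch(logged, batches)
```

Because only observed values enter the batch means, a row with some missing values can still be corrected, which is the behaviour the question is asking for.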

How should I handle the missing values? (I don't want to impute). I tried replacing missing values with zero but it did not help. The BatchQC only corrected the rows having all the expression values.

Can you define "did not help"? There are other things to try:

  • impute as half of the lowest non-zero value in the dataset
  • impute as median on a protein-wise basis, if using univariate statistical tests
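
The two options above could look roughly like this in Python. This is only a sketch under stated assumptions: the matrix is a list of protein rows on the raw intensity scale, None marks a missing value, and the function names `half_min_impute` and `protein_median_impute` and the example values are hypothetical.

```python
from statistics import median

def half_min_impute(matrix):
    """Replace missing values with half of the lowest non-zero value
    observed anywhere in the dataset."""
    observed = [v for row in matrix for v in row if v is not None and v != 0]
    fill = min(observed) / 2  # raises ValueError if nothing is observed
    return [[fill if v is None else v for v in row] for row in matrix]

def protein_median_impute(matrix):
    """Replace missing values with the median of the observed values
    of the same protein (row)."""
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = median(observed) if observed else None  # all-missing rows stay missing
        out.append([fill if v is None else v for v in row])
    return out

# Two hypothetical proteins measured in three samples.
raw = [[200.0, None, 400.0],
       [50.0, 80.0, None]]

half_min = half_min_impute(raw)
by_median = protein_median_impute(raw)
```

Half-minimum imputation treats a missing value as "below the detection limit", whereas the protein-wise median assumes the value is missing for reasons unrelated to abundance, so the two can give quite different results downstream.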

Do I have to worry about taking into account for the biological variables such as age, cancer stage and marker values?

This is for you to decide after analysing the data. For example, check these variables via ANOVA or the Kruskal-Wallis test (a non-parametric alternative to ANOVA), or via PCA.
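
As an illustration of the Kruskal-Wallis check, here is a small pure-Python sketch that computes the H statistic for one protein across groups of a covariate such as cancer stage. The helper names and the data are invented for the example, and the statistic is computed without the tie correction; in practice a library routine such as scipy.stats.kruskal would normally be used instead.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical log2 abundances of one protein in three cancer-stage groups.
stage_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat = kruskal_wallis_h(stage_groups)
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom; a large H for many proteins suggests the covariate is worth including in the model.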

Kevin

