2 hours ago by atakanekiz

Hello,

I have a question about accessing the subtype information associated with TCGA projects using the TCGAbiolinks package. The example below uses COAD, but the question applies to other projects as well, for instance SKCM.

When I download the RNAseq experiment as a SummarizedExperiment object I can access the metadata associated with the samples by calling colData(coad). In this data frame, there is information regarding MSI (microsatellite instability) status of tumors. The information I get from there is the following:

# The coad object was prepared earlier with the GDCdownload and GDCprepare functions
library(SummarizedExperiment)  # provides colData()

meta <- as.data.frame(colData(coad))

dim(meta)
#>[1] 521 102

summary(meta$subtype_MSI_status)
#>                      MSI-H         MSI-L           MSS Not Evaluable          NA's 
#>            0            40            42           126             0           313

Alternatively, I can download the subtype information using the TCGAquery_subtype function. When I do that and look at the MSI data in the downloaded data frame, this is what I see:

subtype <- TCGAbiolinks::TCGAquery_subtype("COAD")

dim(subtype)
#>[1] 276  45

summary(subtype$MSI_status)
#>                      MSI-H         MSI-L           MSS Not Evaluable 
#>            0            38            44           193             1
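
One way to see where the two tables diverge is to line them up patient by patient. Note that colData has one row per sample (521 rows here) while the subtype table has 276 rows, so the counts above are not directly comparable. Below is a minimal sketch on toy stand-in data, not real TCGA values. It assumes both tables carry a patient barcode column named `patient`; that name, and the `meta_toy` and `subtype_toy` objects, are assumptions for illustration, so check `colnames()` on the real objects first.

```r
# Toy stand-ins for the two tables (hypothetical values, not TCGA data).
# The `patient` column name is an assumption; verify it on the real objects.
meta_toy <- data.frame(
  patient = c("P1", "P1", "P2", "P3", "P4"),
  subtype_MSI_status = c("MSI-H", "MSI-H", "MSS", NA, "MSI-L"),
  stringsAsFactors = FALSE
)
subtype_toy <- data.frame(
  patient = c("P1", "P2", "P3", "P5"),
  MSI_status = c("MSI-H", "MSS", "MSS", "MSI-L"),
  stringsAsFactors = FALSE
)

# colData has one row per sample, so collapse to one row per patient first.
meta_pat <- unique(meta_toy[, c("patient", "subtype_MSI_status")])

# Full outer join keeps patients that appear in only one of the tables.
both <- merge(meta_pat, subtype_toy, by = "patient", all = TRUE)

# Cross-tabulate the two labels, counting NA as its own category.
agreement <- table(
  colData = both$subtype_MSI_status,
  subtype = both$MSI_status,
  useNA = "ifany"
)
print(agreement)

# Patients whose labels differ or are missing on one side.
same <- !is.na(both$subtype_MSI_status) & !is.na(both$MSI_status) &
  both$subtype_MSI_status == both$MSI_status
mismatch <- both[!same, ]
print(mismatch)
```

On the real data, the same steps would use `meta` and `subtype` in place of the toy tables.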

A similar discrepancy appears when comparing survival times between the SummarizedExperiment and TCGAquery_subtype data frames. For some patients, one has a shorter follow-up time than the other (i.e., the patient is censored at an early date with an alive vital_status in one data frame, whereas he/she appears deceased at a later time point in the other).
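
The survival comparison can be made concrete in the same way. The sketch below uses toy data only; the table names `clin_a` and `clin_b` and the columns `time_days` and `vital_status` are hypothetical placeholders, since the real tables may store follow-up and death times in differently named columns.

```r
# Toy stand-ins for the survival columns of the two tables.
# All names and values here are hypothetical, not TCGA data.
clin_a <- data.frame(
  patient = c("P1", "P2", "P3"),
  vital_status = c("Alive", "Dead", "Alive"),
  time_days = c(300, 450, 800),
  stringsAsFactors = FALSE
)
clin_b <- data.frame(
  patient = c("P1", "P2", "P3"),
  vital_status = c("Dead", "Dead", "Alive"),
  time_days = c(720, 450, 800),
  stringsAsFactors = FALSE
)

# Inner join on patient; suffixes mark which table each column came from.
cmp <- merge(clin_a, clin_b, by = "patient", suffixes = c("_a", "_b"))

# Difference in recorded time and whether the vital status changed.
cmp$time_diff <- cmp$time_days_b - cmp$time_days_a
cmp$status_changed <- cmp$vital_status_a != cmp$vital_status_b

# Keep only patients whose records disagree between the two tables.
updated <- cmp[cmp$status_changed | cmp$time_diff != 0, ]
print(updated)
```

Listing the disagreeing patients this way shows whether one table consistently has the longer follow-up.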

What is the reason for the discrepancy between the two sets of subtype data? I remember having similar issues with SKCM (for both subtype and survival data). I would appreciate it if you could let me know which version is more accurate to use.

Best,
Atakan

modified 1 hour ago