Asked 2 hours ago by Peter

Hello everyone,

This is my first analysis of RNA-seq data. I am using the TCGAbiolinks package with the "TCGA-BRCA" project, restricted to healthy tissue and primary tumor samples.

I download the data as HTSeq-FPKM-UQ values and store them in the variable "my_data". After downloading, I assign the samples to their groups: the TP vector stores the IDs of the primary tumor samples, and the NT vector stores the IDs of the normal tissue samples.

My question is whether the following steps are adequate:

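library(TCGAbiolinks)

# Preprocessing: correlation-based check for outlier samples (cutoff 0.6)
dataPrep <- TCGAanalyze_Preprocessing(object = my_data, cor.cut = 0.6)

# Filtering: remove genes whose mean expression is below the 25% quantile
dataFilt <- TCGAanalyze_Filtering(tabDF = dataPrep,
                                  method = "quantile",
                                  qnt.cut = 0.25)

# Differential expression: dataSmNT and dataSmTP hold the sample IDs
# of the normal (NT) and primary tumor (TP) groups
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, dataSmNT],
                            mat2 = dataFilt[, dataSmTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")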

After these commands, I get an output containing the logFC, p-value, FDR, and other values. I am asking because I do not perform any normalization step myself: I am using the "HTSeq-FPKM-UQ" table, and I read that:

Fragments Per Kilobase of transcript per Million mapped reads upper quartile (FPKM-UQ) is a RNA-Seq-based expression normalization method. The FPKM-UQ is based on a modified version of the FPKM normalization method.
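
For reference, this is the formula as I understand it from the GDC documentation for the HTSeq-based pipeline; the symbol names follow that documentation, and I may be misreading the details:

$$
\text{FPKM-UQ} = \frac{RC_g \times 10^9}{RC_{g75} \times L}
$$

Here $RC_g$ is the number of reads mapped to the gene, $RC_{g75}$ is the 75th percentile read count value for genes in the sample, and $L$ is the length of the gene in base pairs. So the values I am passing to the steps above are already scaled by gene length and by a per-sample factor, and are not raw counts.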

In addition, I would like to confirm the direction of the fold change: with this setup, are the transcripts reported as upregulated (FC greater than 1) the ones that are increased in the control (Normal) group?
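
To make the direction question concrete, here is a minimal toy sketch in base R. The numbers are made up (not TCGA data), the variable names are hypothetical, and this is a plain ratio of means, not how glmLRT estimates logFC internally. It only shows that the sign of a log2 fold change depends on which group is the numerator:

```r
# Toy expression values for one gene (made-up numbers, not TCGA data)
normal <- c(10, 12, 11)   # hypothetical normal samples, mean = 11
tumor  <- c(40, 44, 48)   # hypothetical tumor samples, mean = 44

# The same gene, with the two possible orientations of the comparison
logfc_tumor_vs_normal <- log2(mean(tumor) / mean(normal))
logfc_normal_vs_tumor <- log2(mean(normal) / mean(tumor))

# The two values differ only in sign
print(c(tumor_vs_normal = logfc_tumor_vs_normal,
        normal_vs_tumor = logfc_normal_vs_tumor))
```

With my real objects, I assume the same sanity check could be done by picking a gene with a large positive logFC in dataDEGs and comparing its mean in dataFilt[, dataSmNT] against its mean in dataFilt[, dataSmTP].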

Thanks in advance!


