2 hours ago by jack.henry

I am trying to read in some expression data from the ICGC, but I am having trouble with duplicates.

Firstly I read in the data.

PACACASeq <- read.table("./CountMatrices/PACA_CA/exp_seq.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

This gives a long-format table with one row per gene/sample pair, holding the counts (raw_read_count), sample IDs (icgc_sample_id) and gene IDs (gene_id).

I then use reshape2 to try to convert this into a count matrix like so:

library(reshape2)
PACACASeqCounts <- dcast(PACACASeq, gene_id ~ icgc_sample_id, value.var = "raw_read_count")

But this generates the message:

Aggregation function missing: defaulting to length

This happens because some gene_id/icgc_sample_id combinations appear more than once, so dcast has to aggregate them and falls back to counting rows.
Instead of read counts, I end up with a matrix of 1's.
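
To make the problem concrete, here is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from the ICGC file). It lists the duplicated gene/sample pairs and shows how passing `fun.aggregate` to `dcast` changes the result. Using `sum` is only an assumption for illustration; whether summing, averaging, or dropping the duplicates is appropriate depends on why they are there.

```r
library(reshape2)

# Made-up long-format data; the (GENE1, S1) pair appears twice.
toy <- data.frame(
  gene_id        = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2"),
  icgc_sample_id = c("S1",    "S1",    "S2",    "S1",    "S2"),
  raw_read_count = c(10, 5, 7, 3, 8),
  stringsAsFactors = FALSE
)

# Rows whose gene/sample pair occurs more than once.
key  <- toy[, c("gene_id", "icgc_sample_id")]
dups <- toy[duplicated(key) | duplicated(key, fromLast = TRUE), ]

# Without fun.aggregate, dcast falls back to length(),
# so each cell holds the number of rows, not the counts.
by_length <- dcast(toy, gene_id ~ icgc_sample_id,
                   value.var = "raw_read_count")

# With an explicit aggregation function the duplicated rows are combined
# (summed here, as an assumption for illustration).
by_sum <- dcast(toy, gene_id ~ icgc_sample_id,
                value.var = "raw_read_count", fun.aggregate = sum)
```

In this toy case `dups` holds the two (GENE1, S1) rows, `by_length` contains row counts per cell, and `by_sum` contains the summed read counts. Inspecting the real duplicates first should show whether they are exact repeats or genuinely different measurements.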

I was wondering if anyone has run into the same problem and how they solved it.

Thanks in advance.
