I am trying to read in some expression data from the ICGC, but I am having trouble with duplicates.
First, I read in the data:
PACACASeq <- read.table("./CountMatrices/PACA_CA/exp_seq.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
This gives me a table with counts, sample IDs and gene IDs.
I then use reshape2 to try to convert this into a count matrix like so:
PACACASeqCounts <- dcast(PACACASeq, gene_id ~ icgc_sample_id, value.var = "raw_read_count")
But this generates the message:
Aggregation function missing: defaulting to length
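This happens because some sample ID / gene ID combinations appear more than once, so dcast has several values for a single cell and falls back to counting them.
I end up getting a matrix of 1's instead of the read counts.
I was wondering if anyone has run into the same problem and how they solved it.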
Thanks in advance.
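
For illustration, the duplicate issue described above can be reproduced on a small made-up table. The sketch below is not the ICGC data: the sample IDs, gene IDs and counts in `toy` are hypothetical, and only the column names are taken from the post. It first lists the sample/gene pairs that occur more than once, then passes an explicit `fun.aggregate` to `dcast`. Using `sum` is an assumption; whether duplicates should be summed, averaged or filtered depends on what the repeated rows actually represent.

```r
library(reshape2)

# Toy stand-in for the ICGC table (all values are made up for illustration)
toy <- data.frame(
  icgc_sample_id = c("SA1", "SA1", "SA1", "SA2", "SA2"),
  gene_id        = c("geneA", "geneA", "geneB", "geneA", "geneB"),
  raw_read_count = c(10, 5, 7, 3, 8),
  stringsAsFactors = FALSE
)

# Find every row whose gene/sample pair occurs more than once
key <- toy[, c("gene_id", "icgc_sample_id")]
dup_rows <- toy[duplicated(key) | duplicated(key, fromLast = TRUE), ]

# Supply an explicit aggregation so dcast does not default to length();
# sum is an assumption here, not necessarily the right choice for real data
toy_counts <- dcast(toy, gene_id ~ icgc_sample_id,
                    value.var = "raw_read_count", fun.aggregate = sum)
```

Inspecting `dup_rows` before aggregating may show whether the repeats are identical records or genuinely different measurements, which in turn suggests which aggregation function makes sense.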