Posted 2 hours ago by berry (Belgium)

Hi,

I have three single-cell RNA-seq datasets to integrate. They come from the same platform (10X), the same sample type, and the same condition, but from different labs. When I check the genes.tsv or features.tsv files, the vast majority of the IDs match, but I see some differences. For example, here "ENSG00000243485" corresponds to a different gene symbol in each dataset:

data1[data1$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 MIR1302-2HG   
data2[data2$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 RP11-34P13.3 
data3[data3$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 MIR1302-10

Or here, the gene symbol "AL627309.1" corresponds to a different Ensembl ID in data1 and data3, and is absent from data2:

data1[data1$GeneName == "AL627309.1", ]
>ENSG00000238009 AL627309.1 
data2[data2$GeneName == "AL627309.1", ]
>0 rows
data3[data3$GeneName == "AL627309.1", ]
>ENSG00000237683 AL627309.1
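
To get a sense of how widespread these mismatches are, I could stack the three feature tables and count distinct symbols per Ensembl ID and distinct IDs per symbol. Below is a minimal sketch in base R, assuming each table has the ENSEMBL and GeneName columns used above. The toy rows only reproduce the examples from this post, and the names all_features, ambiguous_ids and ambiguous_symbols are made up for illustration:

```r
# Toy stand-ins for the three feature tables (assumed columns: ENSEMBL, GeneName).
# In practice these would be read from genes.tsv / features.tsv.
data1 <- data.frame(ENSEMBL  = c("ENSG00000243485", "ENSG00000238009"),
                    GeneName = c("MIR1302-2HG", "AL627309.1"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(ENSEMBL  = "ENSG00000243485",
                    GeneName = "RP11-34P13.3",
                    stringsAsFactors = FALSE)
data3 <- data.frame(ENSEMBL  = c("ENSG00000243485", "ENSG00000237683"),
                    GeneName = c("MIR1302-10", "AL627309.1"),
                    stringsAsFactors = FALSE)

# Stack the tables, remembering which dataset each row came from.
tables <- list(data1 = data1, data2 = data2, data3 = data3)
all_features <- do.call(rbind, lapply(names(tables), function(nm) {
  data.frame(dataset = nm,
             tables[[nm]][, c("ENSEMBL", "GeneName")],
             stringsAsFactors = FALSE)
}))

# Ensembl IDs that carry more than one distinct symbol across datasets.
symbols_per_id <- tapply(all_features$GeneName, all_features$ENSEMBL,
                         function(x) length(unique(x)))
ambiguous_ids <- names(symbols_per_id)[symbols_per_id > 1]

# Symbols that point to more than one distinct Ensembl ID.
ids_per_symbol <- tapply(all_features$ENSEMBL, all_features$GeneName,
                         function(x) length(unique(x)))
ambiguous_symbols <- names(ids_per_symbol)[ids_per_symbol > 1]

# Show the conflicting rows side by side.
print(all_features[all_features$ENSEMBL %in% ambiguous_ids, ])
print(all_features[all_features$GeneName %in% ambiguous_symbols, ])
```

This only reports the conflicts; it does not decide which symbol or ID to keep.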

How would you process these matrices?

Many thanks!
