I have about a dozen RNA-seq samples from a human tissue (DRG) that were enriched, during RNA extraction, for one of the cell types found in that tissue (neurons). In other words, all samples should ideally come from neurons. However, I suspect that some of the samples are bad: either they are heavily contaminated with other cell types, or the library preparation was somehow compromised.
To find the samples that really do come from neurons (irrespective of the phenotype I am studying, which in this case is pain), I am thinking of clustering all the samples on their coding transcriptome, i.e. the TPM values of all coding genes in each sample. I hope to find one large cluster representing the samples that are actually neuronal, with the undesirable samples showing up as outliers.
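To make the idea concrete, a minimal sketch of such a check might look like the following. It is an illustration only, not an established tool. It uses a simpler stand-in for full clustering: each sample is scored by its median Pearson correlation (on log2(TPM + 1)) with every other sample, and samples scoring far below the rest are flagged. The function names, the `n_mads` cutoff, and the toy TPM values are all hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_outlier_samples(tpm, n_mads=3.0):
    """Score each sample by its median correlation with all other samples.

    tpm: dict mapping sample name -> list of TPM values, with genes in the
         same order for every sample.
    n_mads: hypothetical cutoff; a sample is flagged when its score lies more
            than n_mads median absolute deviations below the median score.
    Returns (median_corr, outliers).
    """
    # Log-transform so a few very highly expressed genes do not dominate.
    logged = {s: [math.log2(v + 1.0) for v in vals] for s, vals in tpm.items()}
    names = list(logged)

    median_corr = {}
    for s in names:
        others = [pearson(logged[s], logged[t]) for t in names if t != s]
        median_corr[s] = median(others)

    center = median(median_corr.values())
    mad = median(abs(c - center) for c in median_corr.values())
    cutoff = center - n_mads * mad
    outliers = [s for s in names if median_corr[s] < cutoff]
    return median_corr, outliers


# Toy, made-up TPM values for six genes (not real data): four samples share
# one expression profile and one sample has the opposite profile.
toy_tpm = {
    "n1": [100, 80, 60, 1, 2, 0],
    "n2": [110, 75, 65, 2, 1, 1],
    "n3": [95, 85, 55, 0, 3, 1],
    "n4": [105, 78, 62, 1, 1, 2],
    "bad": [5, 3, 2, 90, 120, 70],
}
median_corr, outliers = flag_outlier_samples(toy_tpm)
for name, score in median_corr.items():
    print(name, round(score, 3), "FLAGGED" if name in outliers else "")
```

With only about a dozen samples the cutoff is a judgment call, so the scores are probably best inspected by eye (or as a correlation heatmap or PCA plot) rather than trusted blindly.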
Is there a known tool/package that could help me do this?