National University of Ireland, Galway

Hi, quick question,

For the bustools output, is the <prefix>.genes.txt file supposed to contain the matching gene symbols? This Galaxy tutorial suggests that the cellranger gene output files include the corresponding symbols, and it seems logical that bustools would do the same (why not, it has all the information it needs to do so). However, for 2 separate analyses, my file has contained ENSG IDs only.

I have tried a few ways to fix this. Stripping the ID version from both the transcripts.txt file in the kallisto bus output and the transcripts_to_genes.txt file supplied to bustools count did not fix it.

In R, I also subset a table of ensembl_gene_id_version and external_gene_name (from a BioMart query) using the <prefix>.genes.txt file. This produced a shorter output file, and scanpy threw an error when I tried to load it in Python.
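For illustration, here is a minimal Python sketch of a lookup that keeps one output entry per input ID, in the original order, instead of subsetting. It assumes the scanpy error came from the gene list no longer having one entry per gene in the count matrix, which is a guess. The function names, the toy IDs (including their version suffixes), and the fallback to the original ID when no symbol is found are all hypothetical choices for this example, not anything bustools or scanpy does.

```python
def strip_version(gene_id):
    """Drop a trailing Ensembl version suffix: 'ENSG00000141510.17' -> 'ENSG00000141510'."""
    return gene_id.split(".", 1)[0]


def symbols_in_matrix_order(gene_ids, id_to_symbol):
    """Return one label per input ID, in the same order as gene_ids.

    IDs with no (or an empty) symbol keep their original ID, so the
    result always has the same length as the input list.
    """
    labels = []
    for gene_id in gene_ids:
        symbol = id_to_symbol.get(strip_version(gene_id), "")
        labels.append(symbol if symbol else gene_id)
    return labels


# Toy stand-in for the lines of <prefix>.genes.txt (versions are illustrative,
# and the last ID is a made-up placeholder with no symbol).
gene_ids = ["ENSG00000141510.17", "ENSG00000012048.23", "ENSG00000999999.1"]

# Toy stand-in for an ID-to-symbol table, keyed by unversioned gene ID.
id_to_symbol = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}

labels = symbols_in_matrix_order(gene_ids, id_to_symbol)
print(labels)
```

The point of the design is that unmapped genes are kept rather than dropped, so the label list can still be lined up against the matrix one-to-one.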

The end goal is to put these files into scanpy for analysis, but I'm not sure at what stage in the pipeline I am supposed to diagnose this.

