I am currently reading this paper (www.ncbi.nlm.nih.gov/pubmed/30214446) and am using the same protocol to build a bioinformatics pipeline to look at T cell clonality. I am unsure how they downsampled the UMI reads:
"To control this over-sequencing error in the UMI and CDR3 sequences, we randomly discard the reads until the remaining reads contain about 8 reads per UMI."
I have used umi-tools to extract the UMI information, but I am unsure how to approach this step. My understanding is that they performed the downsampling on the FASTQ files, not on the mapped reads.
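To make the question concrete, here is a minimal Python sketch of how I currently read that sentence: shuffle the reads, then discard them one at a time until the mean number of reads per remaining UMI is at most 8. This is only my interpretation, not necessarily what the authors did. It assumes the UMI is the last underscore-separated field of the read name (as in `@READID_UMI`, which is how I understand `umi_tools extract` writes it), and the function names and the `target` and `seed` parameters are my own.

```python
import random
from collections import Counter


def umi_from_header(header):
    # Assumption: header looks like '@READID_UMI optional description'
    return header.split()[0].rsplit("_", 1)[-1]


def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of FASTQ lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups the lines into records
    for record in zip(stripped, stripped, stripped, stripped):
        yield record


def downsample(records, target=8.0, seed=0):
    """Randomly discard reads until mean reads per remaining UMI <= target."""
    rng = random.Random(seed)
    records = list(records)
    rng.shuffle(records)
    counts = Counter(umi_from_header(rec[0]) for rec in records)
    while records and len(records) / len(counts) > target:
        umi = umi_from_header(records.pop()[0])  # discard one random read
        counts[umi] -= 1
        if counts[umi] == 0:
            del counts[umi]  # a UMI with no reads left no longer counts
    return records
```

With a real file this would be called as `downsample(read_fastq(open("extracted.fastq")))`, where `extracted.fastq` is a placeholder name. The whole file is held in memory, and for paired-end data both mates would have to be discarded together, which this sketch does not handle.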
Any help or suggestions are appreciated.