I would like to identify CNVs in a very large exome sequencing dataset (tens of thousands of samples), and the CLAMMS software looks very promising. Unfortunately, issues raised on its GitHub repository are not being responded to, so I was wondering whether anyone here has used this software and could help with a naïve question:

The CLAMMS documentation states: "CLAMMS collects seven QC metrics for each sample and performs a fast k-nearest neighbors search algorithm". However, as far as I can tell CLAMMS does not do this automatically as part of the model building or CNV calling steps. Does the user have to generate these QC metrics (using Picard) and identify nearest neighbours themselves based on the example protocol provided?
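
To make the question concrete, here is a minimal sketch of what I assume I would have to do by hand if CLAMMS does not do it for me. It is not CLAMMS code: the function names, the min-max scaling, and the toy two-metric data are all my own assumptions for illustration. In practice there would be seven metrics per sample, and a k-d tree or similar would replace the brute-force loop for tens of thousands of samples.

```python
import math

def scale_columns(rows):
    """Min-max scale each metric column to [0, 1] so no single metric dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # Guard against a constant column (span of zero) to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def k_nearest(sample_ids, metrics, k):
    """Return {sample_id: [ids of the k closest other samples]} by Euclidean distance.

    Hypothetical helper: brute force, O(n^2), fine only for small inputs.
    """
    scaled = scale_columns(metrics)
    neighbours = {}
    for i, sid in enumerate(sample_ids):
        dists = [
            (math.dist(scaled[i], scaled[j]), sample_ids[j])
            for j in range(len(sample_ids))
            if j != i
        ]
        dists.sort()
        neighbours[sid] = [other for _, other in dists[:k]]
    return neighbours

# Toy data: four samples, two made-up QC metrics each (placeholders, not real values).
ids = ["s1", "s2", "s3", "s4"]
qc = [[0.10, 200.0], [0.12, 205.0], [0.90, 400.0], [0.88, 390.0]]
refs = k_nearest(ids, qc, k=1)
```

Each sample's neighbour list would then serve as its reference panel when fitting the model, if I have understood the example protocol correctly.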

