I have to perform what amounts to basically a correlation calculation on dosages from every row of what is equivalent to a 300M varaint X 30K sample VCF.
One thing I am wondering is if this would be faster to write a C plugin and work with BCFs or to Use Python and read in chunks and convert to a numpy matrix before performing my calculation. I am fairly sure the Python approach is going to take a really long time, butI don't know if C would be any faster. Does anyone have any suggestions of how to approach this with performance in mind. I would greatly appreciate any tips. Thank you.