Some theoretical considerations will help to clarify this problem.
It is relatively straight forward to deduce that the problem complexity class is not linear but quadratic over the number of genomes n, or with some simplifications: Given g is proportional to genome size, and T_p(g) is the complexity of an algorithm comparing each pair of genomes,
and all n genomes are of equal size, then an algorithm must compare all genomes against all other genomes, or T_o(n) = O(n^2) * T_p(g).
Because a single pairwise comparison costs the same and genome sizes are fixed for our set, we can think of it as T_p(g) as a constant and forget about it for now: T(n) = O(n^2) Of course, this constant factor is what makes the main difference in runtimes and there is some large variability from implementation etc. We also abstract from the possibility that a heuristic could not need to look at all pairs completely.
Figure 4 M in the orthoPaper shows this quite well, note that both axes are of logarithmic scale. For example, if the problem is quadratic, for every doubling of n, the run-time quadruples or T(2n) ~ 4T(n). This is of course very approximate, whatever one algorithm has as an advantage might be gained by its heuristics, and in reality, not all genomes have equal size.
So, what does that mean for your example? 2000 ~ 2048 . For 2048 genomes, the total time could be about 64 times that of 256 genomes. With the running time of the paper that would give ~ 115 days. You might gain a little on the genome size dependent factor. Therefore it is important to measure run time by always doubling the input size (2,4,8,16) in the same way as in Figure 4 M. In fact you could simply do linear regression on the log-transformed values and extrapolate run-times using both log-scales.