It could have O(n) recursion depth for some inputs (e.g. if already sorted or reverse sorted), which could easily cause stack overflows.
Now, recursion is only used for the smaller of the two subarrays at each step, so the maximum recursion depth is bounded to log2(n).
The new m16.int64 file contains a new "negate8" macro (for 64-bit negation), as well as an improved version of "ph8". These have been added to the individual *.macros files, but if those are regenerated, m16.int64 should be used as an input to macgen ahead of m16.ORCA (which contains the original version of ph8).
This is a fairly straightforward adaptation of the strtol/strtoul code. The multiplication routine is modified to return the next 16 bits in X so that strtoull can detect overflows.