|  | Home | Libraries | People | FAQ | More | 
        A second example compares four generalized nth-root finding algorithms for
        various n-th roots (5, 7 and 13) of a single value 28.0, for four floating-point
        types, float, double, long
        double and a Boost.Multiprecision
        type cpp_bin_float_50. In
        each case the target accuracy was set using our "recomended" accuracy
        limits (or at least limits that make a good starting point - which is likely
        to give close to full accuracy without resorting to unnecessary iterations).
      
| Function | Precision Requested | 
|---|---|
| TOMS748 | numeric_limits<T>::digits - 2 | 
| Newton | floor(numeric_limits<T>::digits * 0.6) | 
| Halley | floor(numeric_limits<T>::digits * 0.4) | 
| Schröder | floor(numeric_limits<T>::digits * 0.4) | 
Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code root_n_finding_algorithms.cpp.
The timing uncertainty (especially using MSVC) is at least 5% of normalized time 'Norm'.
To pick out the 'best' and 'worst' algorithms are highlighted in blue and red. More than one result can be 'best' when normalized times are indistinguishable within the uncertainty.
Fraction of full accuracy 1
Table 10.3. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 7 | 457 | 2.00 | 0 | 11 | 860 | 3.54 | 1 | 11 | 806 | 3.02 | 1 | 12 | 226875 | 8.11 | 0 | ||||
| Newton | 3 | 228 | 1.00 | 0 | 4 | 243 | 1.00 | -1 | 4 | 298 | 1.12 | -1 | 6 | 27968 | 1.00 | 0 | ||||
| Halley | 2 | 250 | 1.10 | 0 | 3 | 268 | 1.10 | 0 | 3 | 267 | 1.00 | 0 | 4 | 52812 | 1.89 | 0 | ||||
| Schröder | 2 | 256 | 1.12 | 0 | 3 | 271 | 1.12 | -1 | 3 | 270 | 1.01 | -1 | 4 | 61406 | 2.20 | 0 | 
Table 10.4. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 12 | 825 | 3.06 | 1 | 15 | 1145 | 4.06 | 2 | 15 | 1159 | 4.17 | 2 | 14 | 295781 | 8.12 | 0 | ||||
| Newton | 5 | 270 | 1.00 | 0 | 6 | 282 | 1.00 | 0 | 6 | 278 | 1.00 | 0 | 8 | 36406 | 1.00 | 0 | ||||
| Halley | 4 | 303 | 1.12 | 0 | 5 | 329 | 1.17 | 0 | 5 | 335 | 1.21 | 0 | 6 | 78281 | 2.15 | 0 | ||||
| Schröder | 5 | 340 | 1.26 | 0 | 6 | 432 | 1.53 | 0 | 6 | 367 | 1.32 | 0 | 7 | 85156 | 2.34 | 0 | 
Table 10.5. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 12 | 714 | 3.16 | -2 | 14 | 909 | 4.19 | 2 | 14 | 793 | 3.69 | 2 | 17 | 211718 | 9.28 | 2 | ||||
| Newton | 6 | 226 | 1.00 | 0 | 7 | 217 | 1.00 | 0 | 7 | 215 | 1.00 | 0 | 9 | 22812 | 1.00 | 0 | ||||
| Halley | 4 | 262 | 1.16 | -1 | 5 | 260 | 1.20 | 0 | 5 | 260 | 1.21 | 0 | 6 | 40781 | 1.79 | 0 | ||||
| Schröder | 6 | 332 | 1.47 | 0 | 7 | 314 | 1.45 | 0 | 7 | 310 | 1.44 | 0 | 8 | 67187 | 2.95 | 0 | 
Fraction of full accuracy 1
Table 10.6. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 7 | 239 | 1.50 | 0 | 11 | 451 | 2.53 | 1 | 11 | 439 | 2.49 | 1 | 12 | 90312 | 7.51 | 0 | ||||
| Newton | 3 | 159 | 1.00 | 0 | 4 | 178 | 1.00 | -1 | 4 | 176 | 1.00 | -1 | 6 | 12031 | 1.00 | 0 | ||||
| Halley | 2 | 168 | 1.06 | 0 | 3 | 203 | 1.14 | 0 | 3 | 198 | 1.13 | 0 | 4 | 20937 | 1.74 | 0 | ||||
| Schröder | 2 | 173 | 1.09 | 0 | 3 | 206 | 1.16 | -1 | 3 | 203 | 1.15 | -1 | 4 | 26250 | 2.18 | 0 | 
Table 10.7. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 12 | 385 | 2.19 | 1 | 15 | 635 | 3.13 | 2 | 15 | 621 | 3.17 | 2 | 14 | 114843 | 6.81 | 0 | ||||
| Newton | 5 | 176 | 1.00 | 0 | 6 | 203 | 1.00 | 0 | 6 | 196 | 1.00 | 0 | 8 | 16875 | 1.00 | 0 | ||||
| Halley | 4 | 209 | 1.19 | 0 | 5 | 254 | 1.25 | 0 | 5 | 246 | 1.26 | 0 | 6 | 32343 | 1.92 | 0 | ||||
| Schröder | 5 | 223 | 1.27 | 0 | 6 | 273 | 1.34 | 0 | 6 | 275 | 1.40 | 0 | 7 | 45156 | 2.68 | 0 | 
Table 10.8. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 12 | 467 | 2.42 | -2 | 14 | 648 | 3.06 | 2 | 14 | 640 | 2.99 | 2 | 17 | 170000 | 8.85 | 2 | ||||
| Newton | 6 | 193 | 1.00 | 0 | 7 | 212 | 1.00 | 0 | 7 | 214 | 1.00 | 0 | 9 | 19218 | 1.00 | 0 | ||||
| Halley | 4 | 209 | 1.08 | -1 | 5 | 256 | 1.21 | 0 | 5 | 250 | 1.17 | 0 | 6 | 32656 | 1.70 | 0 | ||||
| Schröder | 6 | 248 | 1.28 | 0 | 7 | 306 | 1.44 | 0 | 7 | 298 | 1.39 | 0 | 8 | 53437 | 2.78 | 0 | 
Fraction of full accuracy 1
Table 10.9. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 7 | 206 | 2.24 | 0 | 11 | 460 | 4.04 | 1 | 9 | 554 | 4.40 | 0 | 12 | 57656 | 8.39 | 0 | ||||
| Newton | 3 | 92 | 1.00 | 0 | 4 | 114 | 1.00 | -1 | 5 | 126 | 1.00 | 0 | 6 | 6875 | 1.00 | 0 | ||||
| Halley | 2 | 106 | 1.15 | 0 | 3 | 134 | 1.18 | 0 | 3 | 178 | 1.41 | 0 | 4 | 12500 | 1.82 | 0 | ||||
| Schröder | 2 | 126 | 1.37 | 0 | 3 | 143 | 1.25 | -1 | 3 | 198 | 1.57 | 0 | 4 | 15312 | 2.23 | 0 | 
Table 10.10. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 12 | 345 | 2.09 | 1 | 15 | 615 | 3.14 | 2 | 13 | 875 | 3.98 | 0 | 14 | 70937 | 7.32 | 0 | ||||
| Newton | 5 | 165 | 1.00 | 0 | 6 | 196 | 1.00 | 0 | 7 | 220 | 1.00 | 0 | 8 | 9687 | 1.00 | 0 | ||||
| Halley | 4 | 193 | 1.17 | 0 | 5 | 239 | 1.22 | 0 | 5 | 298 | 1.35 | 0 | 6 | 19062 | 1.97 | 0 | ||||
| Schröder | 5 | 217 | 1.32 | 0 | 6 | 270 | 1.38 | 0 | 6 | 367 | 1.67 | 0 | 7 | 27343 | 2.82 | 0 | 
Table 10.11. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
| float | double | long d | cpp50 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Algo | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | Its | Times | Norm | Dis | ||||
| TOMS748 | 12 | 412 | 2.15 | -2 | 14 | 646 | 2.96 | 2 | 14 | 1054 | 4.22 | 1 | 17 | 107187 | 9.53 | 2 | ||||
| Newton | 6 | 192 | 1.00 | 0 | 7 | 218 | 1.00 | 0 | 7 | 250 | 1.00 | 0 | 9 | 11250 | 1.00 | 0 | ||||
| Halley | 4 | 200 | 1.04 | -1 | 5 | 243 | 1.11 | 0 | 5 | 345 | 1.38 | 0 | 6 | 19687 | 1.75 | 0 | ||||
| Schröder | 6 | 254 | 1.32 | 0 | 7 | 321 | 1.47 | 0 | 7 | 471 | 1.88 | 0 | 8 | 33281 | 2.96 | 0 | 
Some tentative conclusions can be drawn from this limited exercise.
double.
          double allows convergence to the final
            exact result in one or two iterations. So in this contrived example,
            crudely dividing the exponent by N for a 'guess', it would be far better
            to use a pow<double>
            or , if more precise pow<long double>,
            function to estimate a 'guess'. The limitation of this tactic is that
            the range of possible (exponent) values may be less than the multiprecision
            type.
          Clearly, your mileage will vary, but in summary, Newton-Raphson iteration seems the first choice of algorithm, and effort to find a good 'guess' the first speed-up target, especially for Boost.Multiprecision. And of course, compiler optimisation is crucial for speed.