Range reduction in glibc with reduce_sincos
This post is about the reduce_sincos range reduction function from the glibc math library. You can find the source here, starting on line 151.
Introduction
Any floating point number x can be written as \(q(\pi/2) + r\) where \(q\) is an integer and \(-\pi/4 < r < \pi/4\). In fact, \(q = [(2/\pi)x]\), where the square brackets denote rounding to the nearest integer. This process is called range reduction. The reason to do it is that sin or cos of x equals, up to a sign we can determine from q mod 4, either sin or cos of r, and evaluating a trigonometric function at r is easier since its magnitude is guaranteed to be small.
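As a quick illustration in plain Python doubles (a naive version of the idea; the loss of accuracy at the end is exactly why the careful routine below exists):

import math

x = 100.0
q = round(x * (2 / math.pi))   # q = 64, and 64 mod 4 == 0, so sin(x) = sin(r)
r = x - q * (math.pi / 2)      # r is about -0.531, inside (-pi/4, pi/4)
print(math.sin(x))             # -0.5063656411097588
print(math.sin(r))             # agrees to only ~13 digits: math.pi is just
                               # the nearest double to pi, and the error is
                               # amplified by the factor q = 64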
The reduce_sincos
function does range reduction for moderately sized floating point numbers x
. There’s a different routine for larger ones. Here’s what the comment in the source says:
/* Reduce range of x to within PI/2 with abs (x) < 105414350. The high part
is written to *a, the low part to *da. Range reduction is accurate to 136
bits so that when x is large and *a very close to zero, all 53 bits of *a
are correct. */
One of the inputs to reduce_sincos is a double x with absolute value less than 105,414,350 = 0x1.921fb38000000p+26. The function aims to set the doubles a and da so that a + da approximates the remainder \(r\), and to return \(n\), the value of \(q\) mod 4 (unless x is negative, in which case it is minus this). It doesn’t quite do this in all cases (see the end of this post) but the “error” doesn’t actually matter.
If you’re wondering where the figure 105,414,350 comes from, note that \((2/\pi) \times 105{,}414{,}350 \approx 2^{26}\) (the approximation is very close). Error estimates in the function rely on multiplication by \(\mathtt{xn} = [(2/\pi)x]\) (square brackets again denoting rounding to the nearest integer) being exact, which holds when the other factor is a double the final half of whose fraction bits are 0.
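This kind of exactness claim is easy to test. Here is a sketch using Python's fractions module (which reappears later in this post); the value of mp1 is the constant from the table further down, and the xn chosen here is just an arbitrary 26-bit integer:

from fractions import Fraction
fff = Fraction.from_float

mp1 = float.fromhex("0x1.921FB58p0")   # 26 significant bits, the rest zero
xn = 67108863.0                        # 2**26 - 1, a 26-bit integer
# The double product equals the exact rational product, i.e. no rounding:
assert fff(xn * mp1) == fff(xn) * fff(mp1)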
Outline of how reduce_sincos
works
The parameters are a double x to be reduced and the addresses of two variables a and da which will hold a multiple-precision approximation to the range reduction of x. These are kept distinct because da will be a lot smaller than a, so the floating point addition a + da would lose accuracy. The return value is the remainder mod 4 of the multiple of \(\pi/2\) which will be subtracted from x.
First, reduce_sincos
evaluates xn = [hpinv * x]
where hpinv
is the nearest double to \(2/\pi\) and the square brackets denote rounding to the nearest integer. Then it computes an approximation to the remainder \(x - (\pi/2)\mathtt{xn}\) by doing the subtraction in four steps, equivalent to
(((x - xn * mp1) - xn * mp2) - xn * pp3) - xn * pp4
where mp1
, mp2
, pp3
, and pp4
are doubles forming a multiple-precision approximation to \(\pi/2\). The result of this subtraction is stored in the variable a
. All of the multiplications except the last are exact, as are the first two subtractions, but the final two floating point subtractions may not be. The sum of the two subtraction errors is stored in da
. Finally the function returns the remainder of xn
mod 4.
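Here is the whole outline transcribed into Python as a sketch (my own code, not glibc's; the constants are read off the table in the next section, and the masking of xn's low bits matches the two's complement behaviour described below):

hpinv = float.fromhex("0x1.45F306DC9C883p-1")  # nearest double to 2/pi
toint = float.fromhex("0x1.8p52")              # 1.5 * 2**52
mp1 = float.fromhex("0x1.921FB58p0")
mp2 = -float.fromhex("0x0.0000003BBD2E78p0")
pp3 = -float.fromhex("0x0.00000000000003967673p0")
pp4 = -float.fromhex("0x0.000000000000000000003AE8FE47C65DAEp0")

def reduce_sincos_py(x):
    t = x * hpinv + toint
    xn = t - toint                   # [x * hpinv], rounded to nearest
    n = int(xn) & 3                  # the C code reads this out of t's mantissa
    y = (x - xn * mp1) - xn * mp2    # exact, as shown later
    t1 = xn * pp3
    t2 = y - t1
    da = (y - t2) - t1               # rounding error of the first inexact step
    t1 = xn * pp4
    a = t2 - t1
    da += (t2 - a) - t1              # plus the error of the final subtraction
    return n, a, da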
Details of the reduce_sincos
code
First, the code does a float trick (lines 156-157):
double t = x * hpinv + toint;
double xn = t - toint;
The two constants here are
static const double hpinv = 0x1.45F306DC9C883p-1;
static const double toint = 0x1.8000000000000p52; // 6 755 399 441 055 744
hpinv is the nearest double to \(2/\pi\); it is slightly larger than \(2/\pi\), by about 3.9e-17, and its value is approximately 0.63661977236758138. toint is a special double value with the property that (a + toint) - toint is the double equal to a rounded to the nearest integer, at least when |a| is well below \(2^{51}\) (always the case here). Therefore after the first two lines, \(\mathtt{xn} = [\mathtt{x} * \mathtt{hpinv}]\), where the square brackets denote rounding to the nearest integer (using round-to-even to break ties).
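A quick demonstration of the trick (Python floats are IEEE doubles, and this assumes the default round-to-nearest-even mode):

toint = float.fromhex("0x1.8p52")   # 1.5 * 2**52
for a in (2.3, 2.5, 3.5, -2.3):
    print(a, (a + toint) - toint)   # -> 2.0, 2.0, 4.0, -2.0 (halves go to even)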
Lines 158-160 are as follows. v here is a mynumber, a union of a double and an int[2] which lets us access the two halves of a double as two 32-bit ints v.i[0] and v.i[1]. int4 is a synonym for int.
v.x = t;
double y = (x - xn * mp1) - xn * mp2;
int4 n = v.i[LOW_HALF] & 3;
&-ing with 3 sets all bits to 0 except for the 2s and 1s bits, so the effect is to reduce mod 4 (so long as the number was positive). The last line therefore sets n to the value of \(\mathtt{xn} = [\mathtt{x} * \mathtt{hpinv}]\) mod 4, at least if x is positive. The return value of the function has now been found, but it still has to calculate the remainder \(r = \mathtt{x} - [\mathtt{x} * \mathtt{hpinv}](\pi/2)\) and store it in a and da. The double y will be a first approximation to this value.
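The union trick can be imitated in Python with the struct module; a sketch, assuming a little-endian machine (so that LOW_HALF is index 0):

import struct

toint = float.fromhex("0x1.8p52")
t = 39128390.0 + toint                 # t as it would be for some x with xn = 39128390
lo, hi = struct.unpack("<II", struct.pack("<d", t))
print(lo & 3, 39128390 % 4)            # both 2: the low mantissa bits hold xn mod 4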
The constants appearing in the middle line of the glibc excerpt above (the definition of y) are shown in the following table, along with some others used later. Most 0s are omitted, and the first two and last two lines are just column numbers (hex digit positions after the point).

      0          1         2         3
     0.1234567890123456789012345678901234567...
pi/2 1.921FB54442D18469898CC51701B839A252049...
hp0  1.921FB54442D18
hp1   .             469898CC51701C
mp1  1.921FB5800000
mp2 - .      3BBD2E78000000
pp3 - .             39676730000000
pp4 - .                    3AE8FE47C65DAE
     0.1234567890123456789012345678901234567...
      0          1         2         3
Note that mp1 + mp2 is extremely close to hp0, the nearest double to \(\pi/2\): the two differ by only \(2^{-53}\), half an ulp of hp0.
The reason that y is defined using two subtractions, rather than just as x - hp0 * xn, is that this latter floating point subtraction may not be exact (e.g. when x = 100000.0), while the two subtractions actually used can be shown to be exact. The first is exact by the Sterbenz lemma (a - b is exact so long as a/2 <= b <= 2a): x and mp1 * xn are pretty close because xn is approximately \((2/\pi)\mathtt{x}\) and mp1 is approximately \(\pi/2\). (The product mp1 * xn is itself exact, since mp1 has only 26 significant bits and \(|\mathtt{xn}| \leqslant 2^{26}\).) To show that the second is exact we need to show that \(x - \mathtt{mp1}\,\mathtt{xn} - \mathtt{mp2}\,\mathtt{xn}\) (mathematical subtraction, not floating point subtraction) is a floating point number (that is, one of the possible values of a 64 bit double).
Note that \(|\pi/2 - \mathtt{mp1} - \mathtt{mp2}| < 2^{-52}\), \(|\mathtt{xn}| < 2^{26}\), and that we can write \(\mathtt{mp1} = s_1 2^{-25}\) and \(\mathtt{mp2} = -s_2 2^{-53}\) for whole numbers \(s_1, s_2\). Now \(x - \mathtt{mp1}\,\mathtt{xn} - \mathtt{mp2}\,\mathtt{xn} = 2^{-53}(2^{53}x - \mathtt{xn}(2^{28}s_1 - s_2))\). (When \(\mathtt{xn} \neq 0\) we have \(|x| > 1/2\), so \(2^{53}x\) really is an integer; when \(\mathtt{xn} = 0\) both subtractions are trivially exact.) It’s enough (because denormal numbers are allowed) to show that the integer \(2^{53}x - \mathtt{xn}(2^{28}s_1 - s_2)\) is less than \(2^{53}\) in magnitude, which is equivalent to \(|x - \mathtt{mp1}\,\mathtt{xn} - \mathtt{mp2}\,\mathtt{xn}| < 1\). This is true because it’s at most \(|x - (\pi/2)\mathtt{xn}| + |\mathtt{xn}||\pi/2 - \mathtt{mp1} - \mathtt{mp2}|\). The second term is very small because \(|\mathtt{xn}| < 2^{26}\) while \(|\pi/2 - \mathtt{mp1} - \mathtt{mp2}|\) is less than \(2^{-52}\). For the first, \(|(2/\pi)x - \mathtt{xn}| \leqslant 1/2\), so \(|x - \mathtt{xn}(\pi/2)| < \pi/4 \approx 0.785\).
Incidentally, studying exactness of floating point arithmetic operations is much easier when you have a convenient way to check your work. Python’s fractions
module is really useful here, e.g.
from fractions import Fraction
fff = Fraction.from_float

# The constants from the glibc source (values as in the table above).
hpinv = float.fromhex("0x1.45F306DC9C883p-1")
toint = float.fromhex("0x1.8p52")
mp1 = float.fromhex("0x1.921FB58p0")
mp2 = -float.fromhex("0x0.0000003BBD2E78p0")

def sub_error(f1, f2):
    # Compute the error in the fp subtraction f1 - f2
    f1_frac = fff(f1)
    f2_frac = fff(f2)
    diff_frac = fff(f1 - f2)
    return diff_frac - (f1_frac - f2_frac)

def test_exactness(x):
    # Zero iff the subtraction (x - xn*mp1) - xn*mp2 is exact
    xn = (x * hpinv + toint) - toint
    return sub_error(x - xn * mp1, xn * mp2)
Back to reduce_sincos
, lines 163-165 are:
t1 = xn * pp3;
t2 = y - t1;
db = (y - t2) - t1;
To increase the accuracy of the approximation y
to the remainder \(r = x -
[(2/\pi)x](\pi/2)\) we subtract off pp3 * xn
where pp3
is as above. The
floating point subtraction is no longer exact - in fact, the rounding error is
exactly db
. This follows from the section of The Art of Computer
Programming on floating point arithmetic: 4.2.2 eq. (3) says \(u \ominus v = u
\oplus -v\). Using this in 4.2.2 Theorem C gives (for \(|u|>|v|\)) \(u-v - (u
\ominus v) = (u \ominus (u \ominus v))\ominus v\). For us, using the convention
that circled operations are floating point arithmetic and uncircled ones are real number arithmetic, db
is \((y \ominus (y \ominus t1)) \ominus t1 = y-t1 - (y \ominus t1)\) by Theorem C.
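The fractions module gives a direct way to spot-check this identity (a sketch with arbitrary values satisfying \(|y| > |\mathtt{t1}|\)):

from fractions import Fraction
fff = Fraction.from_float

y, t1 = 0.7853981633974483, 3.141592653589793e-09
t2 = y - t1
db = (y - t2) - t1
# db is exactly the rounding error: t2 + db == y - t1 as rational numbers.
assert fff(t2) + fff(db) == fff(y) - fff(t1)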
The next block of three lines sets b to (y - xn * pp3) - xn * pp4 and adds to db the rounding error of the new subtraction, t2 minus xn * pp4, so that in the end the value of b is (((x - xn * mp1) - xn * mp2) - xn * pp3) - xn * pp4 and db is the sum of the rounding errors of the last two floating point subtractions (assuming no underflow).
Taken together, the four constants mp1, mp2, pp3, and pp4 give us a 136-bit approximation \(p\) to \(\pi/2\). Ignoring the subtraction errors, which we know exactly, the difference between our computed remainder and the true remainder \(\mathtt{x}-(\pi/2)\mathtt{xn}\) is \(|p\,\mathtt{xn} - (\pi/2)\mathtt{xn}| \leqslant 2^{-136}|\mathtt{xn}| \leqslant 2^{-110}\), using the restriction on the size of x. To decide whether this is good enough for all 53 significant bits of the result to be correct, we need to know how small the remainder could theoretically be: if it were tiny, this error might make the estimate useless. In other words, we need to know how close a floating point number x in our range can be to an integer multiple of \(\pi/2\). If every floating point number in the range is at least \(2^{-110+53} = 2^{-57}\) from an integer multiple of \(\pi/2\), we’re OK. I haven’t checked this, but it is certainly plausible: in Argument reduction for huge arguments: good to the last bit by K. C. Ng it is claimed that the minimum distance to a multiple of \(\pi/2\) over all 64-bit doubles is larger than \(2^{-61}\), and we are only dealing with a very small subset of the possible double values.
It will be useful to have an idea of the size of the sum of the two subtraction errors stored in da. Floating point subtractions are required to be correctly rounded, so they must be within half an ulp of the true answer. The final subtraction gives us the reduced value b, so the second subtraction error is at most half an ulp of b. The correct result of the penultimate subtraction could in principle be \(2^{26}\) times bigger, so the order of magnitude of the sum of the subtraction errors is at most \(2^{-27}\) times the result. That is, da will be at most about \(2^{-27}\) times a.
[hpinv * x]
versus [2x/π]
Range reduction modulo \(\pi/2\) usually refers to the process of taking a possibly very large floating point number x
and finding an integer q
such that \(x = q(\pi/2) + r\) where \(r \in [-\pi/4, \pi/4]\). The reason to do this is that up to a sign, which you can work out from the residue class of q
mod 4, sin or cos of x
equals sin or cos of r
while r
is small enough that calculating its sine or cosine is relatively easy. The true value of q
in this equation is \((2/\pi)x\) rounded to the nearest integer (no need for any tiebreaking: this number is irrational, so it cannot be halfway between two integers).
The reduce_sincos
function doesn’t quite do this. The integer q
, named xn
in the code, equals [hpinv * x]
where hpinv
is the nearest double to \(2/\pi\). This doesn’t have enough precision to agree with [2x/π]
for all x
in our range. Experimenting with sympy.ntheory.continued_fraction
produces a concrete example of 61,462,730.5. The true value of 2x/π is 39128389.499999999026, so [2x/π] = 39,128,389, while hpinv * x is exactly 39128389.5, which rounds (the tie going to the even neighbour) to 39,128,390. This means the reduced value a computed by reduce_sincos will be strictly less than \(-\pi/4\): it is -0.7853981649269647, over \(10^{-9}\) outside the range, and this isn’t compensated for by the subtraction error da, which is of much smaller magnitude.
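This is straightforward to verify numerically; a sketch, with \(2/\pi\) hard-coded to 32 decimal places (far more precision than the comparison needs):

from fractions import Fraction

TWO_OVER_PI = Fraction("0.63661977236758134307553505349006")

x = 61462730.5
hpinv = float.fromhex("0x1.45F306DC9C883p-1")
toint = float.fromhex("0x1.8p52")

xn = (x * hpinv + toint) - toint           # 39128390.0, what reduce_sincos uses
true_q = round(Fraction(x) * TWO_OVER_PI)  # 39128389, the nearest integer
print(xn, true_q)                          # they disagree by one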
While this means reduce_sincos is not always calculating the canonical range reduction, I don’t believe it will ever affect the sin and cos calculations: sin and cos of x can still be calculated from the computed values of a, da, and n. If the a-value produced were very large then some of the assumptions used in calculating its sin or cosine by other functions might be broken, but this doesn’t occur.
-value produced was very large then some of the assumptions used in calculating its sin or cosine by other functions might be broken, but this doesn’t occur.