With this thread I start a new mathematical and computational series: the art of computing decimals of particular mathematical constants. The first one will be e, Euler's number, about 2.7182818284, which is the base of the exponential function e^x, the only function equal to its own derivative. Its inverse is the natural logarithm ln x, the logarithm in base e.
It is defined as:
$$e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n$$
There are a lot of interesting facts about e: proofs of its irrationality and transcendence, lots of beautiful formulas (such as e^(iπ) + 1 = 0, a formula involving 5 major math constants), but today I want to show how to compute it.
Computing an approximation of e directly from its definition gives poor convergence: we gain only about one decimal digit for each tenfold increase of n:
(1 + 1/10)^10 = 2.593742…
(1 + 1/100)^100 = 2.704813…
(1 + 1/1000)^1000 = 2.716923…
(1 + 1/10000)^10000 = 2.718145…
(1 + 1/100000)^100000 = 2.718268…
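These values are easy to reproduce with doubles; a quick sketch (the method name is mine, not from the code linked below):

public static void definitionConvergence() {
    // one more correct digit for each tenfold increase of n
    for (long n = 10; n <= 100000; n *= 10) {
        System.out.println(n + " -> " + Math.pow(1 + 1.0 / n, n));
    }
}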
Fortunately we can use Newton's binomial expansion to rewrite it in a form more suitable for computation:
$$\begin{eqnarray} (a + b)^n & = & \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \nonumber \\ &=& \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} a^{n-k} b^k \end{eqnarray}$$
In our case a=1 and b=1/n:
$$\begin{eqnarray} \left(1 + \frac{1}{n}\right)^n & = & \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} 1^{n-k} \left(\frac{1}{n}\right)^k \nonumber \\ &=& \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} \left(\frac{1}{n}\right)^k \end{eqnarray}$$
As n → ∞, n!/(n−k)! ∼ n^k, which cancels with the (1/n)^k factor, so:
$$\begin{eqnarray} \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n & = & \lim_{n\to\infty} \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} \left(\frac{1}{n}\right)^k \nonumber \\ &=& \sum_{k=0}^{\infty} \frac{1}{k!} \end{eqnarray}$$
This formula is very convenient for computing e. Let f(k) be the partial sum of the series up to the 1/k! term:
f(2) = 1/0! + 1/1! + 1/2! = 2.5
f(4) = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! = 2.708333…
f(5) = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + 1/5! = 2.716666…
f(7) = 2.718253968…
f(10) = 2.7182818011…
The larger k is, the faster it converges, since the remaining tail of the series is small compared with the kth term.
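The standard tail bound (not spelled out in the original argument) makes this precise, comparing the tail with a geometric series:
$$e - f(k) = \sum_{j=k+1}^{\infty} \frac{1}{j!} < \frac{1}{(k+1)!} \sum_{m=0}^{\infty} \frac{1}{(k+2)^m} = \frac{1}{(k+1)!} \cdot \frac{k+2}{k+1} < \frac{1}{k!\,k}$$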
Time complexity
Let's do a rough comparison of the complexity of these two methods of computation (definition vs. binomial expansion).
The naive implementation of the definition requires n multiplications of a number with decimals. We can use clever choices of n to simplify this task a lot. For example, if we choose n = 2^k then 1 + 1/n has a finite number of decimals (k, to be accurate) and we can get the exact value of (1 + 1/n)^n with just k multiplications, squaring the result at each round (see the sketch after the next paragraph). With this approach a great part of the computed digits is useless; only the few most significant ones will be correct for our approximation (we should prove it, but from the examples above the number of correct digits looks proportional to log10 n, that is, proportional to log n).
Maybe we can keep only the most significant digits at each round (let's say k of them), so we'll have to do k multiplications of numbers of k digits. So for log10 n digits we perform (log n)^3 operations (or (log n)^2 (log log n) with better multiplication algorithms).
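Here is a minimal double-precision sketch of the repeated-squaring idea (the method name is mine; with doubles only about 15 digits survive, but it shows that k squarings suffice):

public static double definitionBySquaring(int k) {
    // n = 2^k, so 1 + 1/n has exactly k binary decimals (assumes k <= 52
    // so that it is exactly representable in a double)
    double x = 1 + 1.0 / (1L << k);
    for (int j = 0; j < k; j++) {
        x *= x;   // after k squarings: (1 + 1/n)^(2^k) = (1 + 1/n)^n
    }
    return x;
}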
Now consider the binomial expansion. If we need to compute n digits we can start with sum = 10^n and fract = 10^n, then repeatedly divide fract by k and add it to sum as long as it is greater than 1, for k = 1, 2, … (so we don't need to compute and divide by the various k!; we just divide the running term each time by a new small number).
Division of a big number by a small one requires, in a computer implementation, time linear in the length of the number, and so does the addition. fract decreases factorially, so to find how many steps are required to reduce fract to zero we set steps! = 10^n, or, by Stirling's approximation, steps · log steps ∼ n log 10. This is awkward to solve exactly, but steps is asymptotically less than n. So to compute n digits we perform something less than n^2 operations, better than the definition approach.
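Making the step-count estimate explicit with Stirling's approximation $\ln(m!) \approx m \ln m - m$:
$$\text{steps}! \approx 10^n \;\Rightarrow\; \text{steps} \cdot \ln(\text{steps}) \approx n \ln 10 \;\Rightarrow\; \text{steps} \sim \frac{n \ln 10}{\ln n},$$
so the total work is about $\text{steps} \cdot n \approx n^2 \ln 10 / \ln n$, indeed asymptotically less than $n^2$.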
Code
If we want to implement it we cannot rely on double precision, since it guarantees only about 15 exact digits; nevertheless it is useful as a warm-up. This time let's use Java.
public static double binomialDouble() {
    double e = 1;       // partial sum, starts from 1/0!
    double fract = 1;   // current term, 1/i!
    for (int i = 1; i < 20; i++) {   // 1/20! is already far below double precision
        fract /= i;
        e += fract;
    }
    return e;
}
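A quick check:

System.out.println(binomialDouble());  // should print 2.718281828459045… (the last digit may vary by one ulp)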
Within milliseconds it returns a result with all the digits a double allows. Let's try again with BigInteger.
// requires: import java.math.BigInteger;
public static BigInteger binomialBig(int digits) {
    // build a string of digits+10 zeros (10 guard digits)
    String base = "000000000";
    while (base.length() < digits + 10) base = base + base;
    if (base.length() > digits + 10) base = base.substring(0, digits + 10);
    BigInteger e = new BigInteger("1" + base);      // 1, scaled by 10^(digits+10)
    BigInteger fract = new BigInteger("1" + base);  // current term 1/i!, same scaling
    int i = 1;
    while (fract.compareTo(BigInteger.ONE) > 0) {   // stop when the term vanishes at this scale
        fract = fract.divide(BigInteger.valueOf(i));
        e = e.add(fract);
        i++;
    }
    return e;
}
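To read the digits we just have to place the decimal point after the first character; a possible helper (mine, hypothetical, not from the linked code):

// the returned value is e scaled by 10^(digits+10), so its string starts "2718..."
public static String formatE(BigInteger e, int digits) {
    String s = e.toString();
    return s.charAt(0) + "." + s.substring(1, digits + 1);
}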
The code is single-threaded; here are some measurements of the execution time:
digits | Computation time | ratio with prev. |
10^4 | 99 msec | |
10^5 | 2976 msec | |
2*10^5 | 11228 msec | ×3.77 |
4*10^5 | 42000 msec | ×3.74 |
8*10^5 | 153828 msec | ×3.66 |
10^6 | 242360 msec | ×81.43 (with respect to 10^5) |
Now, since we are here to write real code, let's try to manage arbitrary-precision integers ourselves. We can do it using arrays and the basic addition and division algorithms taught in primary school. If the array cells contained digits less than 10 we would be exactly in the schoolbook case, but we would waste a lot of space and operations. We don't want to convert the result to another radix, so we'll use a power of 10 as the base of our representation (like 100, 1000, …).
Since longs have maximum value about 9*10^18, 10^9 is the largest power of 10 we can use. The code is similar to the previous one, but we use arrays of longs as numbers, implementing the elementary algorithms for addition and division (each cell holds 9 decimal digits; the stored value is e/10, starting from e[0] = 10^8, so the digits read "2718…" with the decimal point after the first one).
public static long[] binomialCust(int digits) {
    long BASE = 1000000000;                          // 10^9: nine decimal digits per cell
    int fract_length = digits / 9;
    while (fract_length % 4 != 0) fract_length++;    // round up to a multiple of 4
    long[] e = new long[fract_length];
    long[] fract = new long[fract_length];
    e[0] = fract[0] = BASE / 10;                     // both start at 1, stored as value/10
    int fract_start = 0;                             // first non-zero cell of fract
    int i = 1;
    while (fract_start < fract.length - 1 || fract[fract.length - 1] > 100) {
        long rem = 0;
        // fract /= i
        for (int j = fract_start; j < fract.length; j++) {
            long n = rem * BASE + fract[j];
            fract[j] = n / i;
            rem = n % i;
        }
        // e += fract
        for (int j = fract_start; j < fract.length; j++) {
            e[j] += fract[j];
            int k = j;
            while (k > 0 && e[k] >= BASE) {          // propagate the carry towards cell 0
                e[k - 1] += e[k] / BASE;
                e[k] %= BASE;
                k--;
            }
        }
        if (fract[fract_start] == 0) fract_start++;
        i++;
    }
    return e;
}
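To print the result with this representation we must also zero-pad each cell to 9 digits; a possible helper (again mine, hypothetical):

// cells hold e/10 in base 10^9, so the concatenated digits read "2718..."
public static String formatE(long[] e, int digits) {
    StringBuilder sb = new StringBuilder();
    for (long cell : e) sb.append(String.format("%09d", cell));  // pad each cell to 9 digits
    return sb.charAt(0) + "." + sb.substring(1, Math.min(digits + 1, sb.length()));
}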
Some timings:
digits | Computation time |
10^4 | 61 msec |
10^5 | 4170 msec |
2*10^5 | 15834 msec |
4*10^5 | 61803 msec |
These timings are slightly slower than BigInteger. It's optimization time!
Code Optimization
The above code was in Java. Let's switch to C: we just need to change the arrays into pointers.
#include <stdint.h>
#include <stdlib.h>

int arrayLength;

int64_t *binomialCust(int digits) {
    int64_t BASE = 1000000000;                       // 10^9: nine decimal digits per cell
    int fract_length = digits / 9;
    while (fract_length % 4 != 0) fract_length++;    // round up to a multiple of 4
    arrayLength = fract_length;
    int64_t *e = (int64_t *)malloc(sizeof(int64_t) * fract_length);
    int64_t *fract = (int64_t *)malloc(sizeof(int64_t) * fract_length);
    e[0] = fract[0] = BASE / 10;                     // both start at 1, stored as value/10
    int fract_start = 0;                             // first non-zero cell of fract
    int i = 1;
    while (fract_start < fract_length - 1 || fract[fract_length - 1] > 100) {
        int64_t rem = 0;
        // fract /= i
        for (int j = fract_start; j < fract_length; j++) {
            int64_t n = rem * BASE + fract[j];
            fract[j] = n / i;
            rem = n % i;
        }
        // e += fract
        for (int j = fract_start; j < fract_length; j++) {
            e[j] += fract[j];
            int k = j;
            while (k > 0 && e[k] >= BASE) {          // propagate the carry towards cell 0
                e[k - 1] += e[k] / BASE;
                e[k] %= BASE;
                k--;
            }
        }
        if (fract[fract_start] == 0) fract_start++;
        i++;
    }
    free(fract);
    return e;
}
Now we'll just try the code as is, with -O2 and -O3 (times in seconds):
size | java bigint | java cust | no opt | -O2 | -O3 |
10^4 | 0.099 | 0.67 | 0.08 | 0.039 | 0.038 |
10^5 | 2.97 | 4.17 | 6.29 | 2.90 | 2.90 |
2*10^5 | 11.22 | 15.83 | 23.79 | 10.97 | 10.81 |
This time -O3 obtains the same timing as -O2; it is about 50% faster than the same code in Java, and about as fast as the Java BigInteger version. Now we'll try to merge the two for loops, to see whether -O3 will unroll and vectorize them automatically once joined; if not, we'll do it ourselves.
The fusion of the two for loops and an unrolling of 4 steps give only a <10% improvement with -O2/-O3 (10^5 requires 2.65 sec and 2*10^5 10.07 sec):
// fract /= i and e += fract, fused and unrolled by 4 (this assumes rem is
// declared before the loop and fract_start is kept aligned to a multiple
// of 4, so that j+3 stays in bounds)
for (int j = fract_start; j < fract_length; j += 4) {
    int64_t n = rem * BASE + fract[j];
    fract[j] = n / i;
    rem = n % i;
    e[j] += fract[j];
    n = rem * BASE + fract[j + 1];
    fract[j + 1] = n / i;
    rem = n % i;
    e[j + 1] += fract[j + 1];
    n = rem * BASE + fract[j + 2];
    fract[j + 2] = n / i;
    rem = n % i;
    e[j + 2] += fract[j + 2];
    n = rem * BASE + fract[j + 3];
    fract[j + 3] = n / i;
    rem = n % i;
    e[j + 3] += fract[j + 3];
    // adjust the carries inside the block of 4...
    if (e[j + 3] >= BASE) {
        e[j + 2] += e[j + 3] / BASE;
        e[j + 3] %= BASE;
    }
    if (e[j + 2] >= BASE) {
        e[j + 1] += e[j + 2] / BASE;
        e[j + 2] %= BASE;
    }
    if (e[j + 1] >= BASE) {
        e[j] += e[j + 1] / BASE;
        e[j + 1] %= BASE;
    }
    // ...then propagate the remaining carry towards cell 0
    int k = j;
    while (k > 0 && e[k] >= BASE) {
        e[k - 1] += e[k] / BASE;
        e[k] %= BASE;
        k--;
    }
}
We can note that the carries of e need to be adjusted only once, at the end of the whole summation: there is no reason to keep them updated at every cycle. This gives some advantage, resulting in 2.15 sec for 10^5 and 8.14 sec for 2*10^5 (the optimized versions are now 25% faster than the original version; with the same modifications the non-optimized version goes from 6.29 to 3.38 sec and from 23.79 to 12.90 sec for 10^5 and 2*10^5 digits, so about 50% faster). With this code we can compute 1 million digits in less than 3 minutes.
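A minimal Java sketch of the deferred-carry idea, applied to the array representation of binomialCust above (safe here because each addition adds less than BASE = 10^9 to a cell, so a 64-bit cell would overflow only after billions of iterations, far more than this loop performs):

// in the main loop, just accumulate without carrying:
//     for (int j = fract_start; j < fract.length; j++) e[j] += fract[j];
// then, after the while loop ends, normalize once:
public static void normalize(long[] e, long BASE) {
    for (int k = e.length - 1; k > 0; k--) {
        e[k - 1] += e[k] / BASE;   // push the carry to the more significant cell
        e[k] %= BASE;
    }
}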
We should also try to divide the computation into chunks to exploit cache locality, for example dividing a chunk of 8 longs by several consecutive values of i before moving on, but for now I stop here, leaving you below a link to 1 million digits of e and the code to compute more digits if you want (I think it would take something like 4 hours to compute a billion digits with the C code, and it should use something like 1 GB of RAM).
I won't leave a program of mine running to compute 1 billion digits if it takes more than 1 hour, so this is an open point for me. Related to this, I'll publish at least the computation of Pi as a sum of arctangents (beyond computation, Pi has an interesting history!), and maybe some other constants if I find my old stuff again.
https://github.com/zagonico86/my-snippets/tree/main/math-constants/e