With this thread I start a new mathematical and computational series: the art of computing decimals of particular mathematical constants. The first one will be e, Euler's number, about 2.7182818284, which is the base of the exponential function e^x, the only function equal to its own derivative. Its inverse is the natural logarithm ln x, the logarithm in base e.
It is defined as:
$$e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n$$
There are a lot of interesting facts about e: proofs of its irrationality and transcendence, lots of beautiful formulas (such as e^(iπ) + 1 = 0, a formula involving 5 major math constants), but today I want to show how to compute it.
Computing an approximation of e directly from its definition gives poor convergence: we gain only about one decimal digit for each tenfold increase of n:
(1 + 1/10)^10 = 2.593742…
(1 + 1/100)^100 = 2.704813…
(1 + 1/1000)^1000 = 2.716923…
(1 + 1/10000)^10000 = 2.718145…
(1 + 1/100000)^100000 = 2.718268…
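These values are easy to reproduce with doubles; a quick sketch (the method name is mine, not from the code linked below):

public static void definitionConvergence() {
    // one more correct digit for each tenfold increase of n
    for (long n = 10; n <= 100000; n *= 10) {
        System.out.println(n + " -> " + Math.pow(1 + 1.0 / n, n));
    }
}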
Fortunately we can use Newton's binomial expansion to rewrite it in a form more suitable for computation:
$$\begin{eqnarray} (a + b)^n & = & \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \nonumber \\ &=& \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} a^{n-k} b^k \end{eqnarray}$$
In our case a=1 and b=1/n:
$$\begin{eqnarray} \left(1 + \frac{1}{n}\right)^n & = & \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} 1^{n-k} \left(\frac{1}{n}\right)^k \nonumber \\ &=& \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} \left(\frac{1}{n}\right)^k \end{eqnarray}$$
As n → ∞, n!/(n−k)! ∼ n^k, which cancels with the (1/n)^k factor, so:
$$\begin{eqnarray} \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n & = & \lim_{n\to\infty} \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} \left(\frac{1}{n}\right)^k \nonumber \\ &=& \sum_{k=0}^{\infty} \frac{1}{k!} \end{eqnarray}$$
This formula is very convenient for computing e. Let f(k) be the partial sum of the series up to the 1/k! term:
f(2) = 1/0! + 1/1! + 1/2! = 2.5
f(4) = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! = 2.708333…
f(5) = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + 1/5! = 2.716666…
f(7) = 2.718253968…
f(10) = 2.7182818011…
The larger k is, the faster it converges, since the remaining tail of the series is small compared with the kth term.
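The standard tail bound (not spelled out in the original argument) makes this precise, comparing the tail with a geometric series:
$$e - f(k) = \sum_{j=k+1}^{\infty} \frac{1}{j!} < \frac{1}{(k+1)!} \sum_{m=0}^{\infty} \frac{1}{(k+2)^m} = \frac{1}{(k+1)!} \cdot \frac{k+2}{k+1} < \frac{1}{k!\,k}$$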
Time complexity
Let's do a rough comparison of the complexity of these two methods of computation (definition vs. binomial expansion).
The naive implementation of the definition requires n multiplications of a number with decimals. We can use clever choices of n to simplify this task a lot. For example, if we choose n = 2^k then 1 + 1/n has a finite number of decimals (k, to be accurate) and we can get the exact value of (1 + 1/n)^n with just k multiplications, squaring the result at each round (see the sketch after the next paragraph). With this approach a great part of the computed digits is useless; only the few most significant ones will be correct for our approximation (we should prove it, but from the examples above the number of correct digits looks proportional to log10 n, that is, proportional to log n).
Maybe we can keep only the most significant digits at each round (let's say k of them), so we'll have to do k multiplications of numbers of k digits. So for log10 n digits we perform (log n)^3 operations (or (log n)^2 (log log n) with better multiplication algorithms).
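Here is a minimal double-precision sketch of the repeated-squaring idea (the method name is mine; with doubles only about 15 digits survive, but it shows that k squarings suffice):

public static double definitionBySquaring(int k) {
    // n = 2^k, so 1 + 1/n has exactly k binary decimals (assumes k <= 52
    // so that it is exactly representable in a double)
    double x = 1 + 1.0 / (1L << k);
    for (int j = 0; j < k; j++) {
        x *= x;   // after k squarings: (1 + 1/n)^(2^k) = (1 + 1/n)^n
    }
    return x;
}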
Now consider the binomial expansion. If we need to compute n digits we can start with sum = 10^n and fract = 10^n, then repeatedly divide fract by k and add it to sum as long as it is greater than 1, for k = 1, 2, … (so we don't need to compute and divide by the various k!; we just divide the running term each time by a new small number).
Division of a big number by a small one requires, in a computer implementation, time linear in the length of the number, and so does the addition. fract decreases factorially, so to find how many steps are required to reduce fract to zero we set steps! = 10^n, or, by Stirling's approximation, steps · log steps ∼ n log 10. This is awkward to solve exactly, but steps is asymptotically less than n. So to compute n digits we perform something less than n^2 operations, better than the definition approach.
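Making the step-count estimate explicit with Stirling's approximation $\ln(m!) \approx m \ln m - m$:
$$\text{steps}! \approx 10^n \;\Rightarrow\; \text{steps} \cdot \ln(\text{steps}) \approx n \ln 10 \;\Rightarrow\; \text{steps} \sim \frac{n \ln 10}{\ln n},$$
so the total work is about $\text{steps} \cdot n \approx n^2 \ln 10 / \ln n$, indeed asymptotically less than $n^2$.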
Code
If we want to implement it we cannot rely on double precision, since it guarantees only about 15 exact digits; nevertheless it is useful as a warm-up. This time let's use Java.
public static double binomialDouble() {
    double e = 1;       // partial sum, starts from 1/0!
    double fract = 1;   // current term, 1/i!
    for (int i = 1; i < 20; i++) {   // 1/20! is already far below double precision
        fract /= i;
        e += fract;
    }
    return e;
}
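A quick check:

System.out.println(binomialDouble());  // should print 2.718281828459045… (the last digit may vary by one ulp)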
Within milliseconds it returns a result with all the digits a double allows. Let's try again with BigInteger.
// requires: import java.math.BigInteger;
public static BigInteger binomialBig(int digits) {
    // build a string of digits+10 zeros (10 guard digits)
    String base = "000000000";
    while (base.length() < digits + 10) base = base + base;
    if (base.length() > digits + 10) base = base.substring(0, digits + 10);
    BigInteger e = new BigInteger("1" + base);      // 1, scaled by 10^(digits+10)
    BigInteger fract = new BigInteger("1" + base);  // current term 1/i!, same scaling
    int i = 1;
    while (fract.compareTo(BigInteger.ONE) > 0) {   // stop when the term vanishes at this scale
        fract = fract.divide(BigInteger.valueOf(i));
        e = e.add(fract);
        i++;
    }
    return e;
}
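To read the digits we just have to place the decimal point after the first character; a possible helper (mine, hypothetical, not from the linked code):

// the returned value is e scaled by 10^(digits+10), so its string starts "2718..."
public static String formatE(BigInteger e, int digits) {
    String s = e.toString();
    return s.charAt(0) + "." + s.substring(1, digits + 1);
}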
The code is single-threaded; here are some measurements of the execution time:
digits | Computation time | ratio with prev. |
10^4 | 99 msec | |
10^5 | 2976 msec | |
2*10^5 | 11228 msec | ×3.77 |
4*10^5 | 42000 msec | ×3.74 |
8*10^5 | 153828 msec | ×3.66 |
10^6 | 242360 msec | ×81.43 (with respect to 10^5) |
Now, since we are here to write real code, let's try to manage arbitrary-precision integers ourselves. We can do it using arrays and the basic addition and division algorithms taught in primary school. If the array cells contained digits less than 10 we would be exactly in the schoolbook case, but we would waste a lot of space and operations. We don't want to convert the result to another radix, so we'll use a power of 10 as the base of our representation (like 100, 1000, …).
Since longs have maximum value about 9*10^18, 10^9 is the largest power of 10 we can use. The code is similar to the previous one, but we use arrays of longs as numbers, implementing the elementary algorithms for addition and division (each cell holds 9 decimal digits; the stored value is e/10, starting from e[0] = 10^8, so the digits read "2718…" with the decimal point after the first one).
public static long[] binomialCust(int digits) {
    long BASE = 1000000000;                          // 10^9: nine decimal digits per cell
    int fract_length = digits / 9;
    while (fract_length % 4 != 0) fract_length++;    // round up to a multiple of 4
    long[] e = new long[fract_length];
    long[] fract = new long[fract_length];
    e[0] = fract[0] = BASE / 10;                     // both start at 1, stored as value/10
    int fract_start = 0;                             // first non-zero cell of fract
    int i = 1;
    while (fract_start < fract.length - 1 || fract[fract.length - 1] > 100) {
        long rem = 0;
        // fract /= i
        for (int j = fract_start; j < fract.length; j++) {
            long n = rem * BASE + fract[j];
            fract[j] = n / i;
            rem = n % i;
        }
        // e += fract
        for (int j = fract_start; j < fract.length; j++) {
            e[j] += fract[j];
            int k = j;
            while (k > 0 && e[k] >= BASE) {          // propagate the carry towards cell 0
                e[k - 1] += e[k] / BASE;
                e[k] %= BASE;
                k--;
            }
        }
        if (fract[fract_start] == 0) fract_start++;
        i++;
    }
    return e;
}
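To print the result with this representation we must also zero-pad each cell to 9 digits; a possible helper (again mine, hypothetical):

// cells hold e/10 in base 10^9, so the concatenated digits read "2718..."
public static String formatE(long[] e, int digits) {
    StringBuilder sb = new StringBuilder();
    for (long cell : e) sb.append(String.format("%09d", cell));  // pad each cell to 9 digits
    return sb.charAt(0) + "." + sb.substring(1, Math.min(digits + 1, sb.length()));
}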
Some timings:
digits | Computation time |
10^4 | 61 msec |
10^5 | 4170 msec |
2*10^5 | 15834 msec |
4*10^5 | 61803 msec |
These timings are slightly slower than BigInteger. It's optimization time!
Code Optimization
The above code was in Java. Let's switch to C: we just need to change the arrays into pointers.
#include <stdint.h>
#include <stdlib.h>

int arrayLength;

int64_t *binomialCust(int digits) {
    int64_t BASE = 1000000000;                       // 10^9: nine decimal digits per cell
    int fract_length = digits / 9;
    while (fract_length % 4 != 0) fract_length++;    // round up to a multiple of 4
    arrayLength = fract_length;
    int64_t *e = (int64_t *)malloc(sizeof(int64_t) * fract_length);
    int64_t *fract = (int64_t *)malloc(sizeof(int64_t) * fract_length);
    e[0] = fract[0] = BASE / 10;                     // both start at 1, stored as value/10
    int fract_start = 0;                             // first non-zero cell of fract
    int i = 1;
    while (fract_start < fract_length - 1 || fract[fract_length - 1] > 100) {
        int64_t rem = 0;
        // fract /= i
        for (int j = fract_start; j < fract_length; j++) {
            int64_t n = rem * BASE + fract[j];
            fract[j] = n / i;
            rem = n % i;
        }
        // e += fract
        for (int j = fract_start; j < fract_length; j++) {
            e[j] += fract[j];
            int k = j;
            while (k > 0 && e[k] >= BASE) {          // propagate the carry towards cell 0
                e[k - 1] += e[k] / BASE;
                e[k] %= BASE;
                k--;
            }
        }
        if (fract[fract_start] == 0) fract_start++;
        i++;
    }
    free(fract);
    return e;
}
Now we'll just try the code as is, with -O2 and -O3 (times in seconds):
size | java bigint | java cust | no opt | -O2 | -O3 |
10^4 | 0.099 | 0.67 | 0.08 | 0.039 | 0.038 |
10^5 | 2.97 | 4.17 | 6.29 | 2.90 | 2.90 |
2*10^5 | 11.22 | 15.83 | 23.79 | 10.97 | 10.81 |
This time -O3 obtains the same timing as -O2; it is about 50% faster than the same code in Java, and about as fast as the Java BigInteger version. Now we'll try to merge the two for loops, to see whether -O3 will unroll and vectorize them automatically once joined; if not, we'll do it ourselves.
The fusion of the two for loops and an unrolling of 4 steps give only a <10% improvement with -O2/-O3 (10^5 requires 2.65 sec and 2*10^5 10.07 sec):
// fract /= i and e += fract, fused and unrolled by 4 (this assumes rem is
// declared before the loop and fract_start is kept aligned to a multiple
// of 4, so that j+3 stays in bounds)
for (int j = fract_start; j < fract_length; j += 4) {
    int64_t n = rem * BASE + fract[j];
    fract[j] = n / i;
    rem = n % i;
    e[j] += fract[j];
    n = rem * BASE + fract[j + 1];
    fract[j + 1] = n / i;
    rem = n % i;
    e[j + 1] += fract[j + 1];
    n = rem * BASE + fract[j + 2];
    fract[j + 2] = n / i;
    rem = n % i;
    e[j + 2] += fract[j + 2];
    n = rem * BASE + fract[j + 3];
    fract[j + 3] = n / i;
    rem = n % i;
    e[j + 3] += fract[j + 3];
    // adjust the carries inside the block of 4...
    if (e[j + 3] >= BASE) {
        e[j + 2] += e[j + 3] / BASE;
        e[j + 3] %= BASE;
    }
    if (e[j + 2] >= BASE) {
        e[j + 1] += e[j + 2] / BASE;
        e[j + 2] %= BASE;
    }
    if (e[j + 1] >= BASE) {
        e[j] += e[j + 1] / BASE;
        e[j + 1] %= BASE;
    }
    // ...then propagate the remaining carry towards cell 0
    int k = j;
    while (k > 0 && e[k] >= BASE) {
        e[k - 1] += e[k] / BASE;
        e[k] %= BASE;
        k--;
    }
}
We can note that the carries of e need to be adjusted only once, at the end of the whole summation: there is no reason to keep them updated at every cycle. This gives some advantage, resulting in 2.15 sec for 10^5 and 8.14 sec for 2*10^5 (the optimized versions are now 25% faster than the original version; with the same modifications the non-optimized version goes from 6.29 to 3.38 sec and from 23.79 to 12.90 sec for 10^5 and 2*10^5 digits, so about 50% faster). With this code we can compute 1 million digits in less than 3 minutes.
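A minimal Java sketch of the deferred-carry idea, applied to the array representation of binomialCust above (safe here because each addition adds less than BASE = 10^9 to a cell, so a 64-bit cell would overflow only after billions of iterations, far more than this loop performs):

// in the main loop, just accumulate without carrying:
//     for (int j = fract_start; j < fract.length; j++) e[j] += fract[j];
// then, after the while loop ends, normalize once:
public static void normalize(long[] e, long BASE) {
    for (int k = e.length - 1; k > 0; k--) {
        e[k - 1] += e[k] / BASE;   // push the carry to the more significant cell
        e[k] %= BASE;
    }
}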
We should also try to divide the computation into chunks to exploit cache locality, for example dividing a chunk of 8 longs by several consecutive values of i before moving on, but for now I stop here, leaving you below a link to 1 million digits of e and the code to compute more digits if you want (I think it would take something like 4 hours to compute a billion digits with the C code, and it should use something like 1 GB of RAM).
I won't leave a program of mine running to compute 1 billion digits if it takes more than 1 hour, so this is an open point for me. Related to this, I'll publish at least the computation of Pi as a sum of arctangents (beyond computation, Pi has an interesting history!), and maybe some other constants if I find my old stuff again.
https://github.com/zagonico86/my-snippets/tree/main/math-constants/e