The prefix-sum of multiplicative function: the black algorithm

baihacker

2020.04.12

Zhouge sieve [1]

Based on

\[sf(n)=\sum_{\begin{array}{c}x \le n \\ \text{max prime factor of x} \le n^{\frac{1}{2}}\end{array}} f(x) \left (1 + \sum_{n^{\frac{1}{2}} < \text{prime p} \le \frac{n}{x}} g(p)\right)\]

an algorithm of space complexity $O(n^{\frac{1}{2}})$ time complexity $\tilde{\cal O}(n^{\frac{3}{4}})$ ($\log{n}$ is ignored) is developed.

Min_25 sieve [3]

Similar to [1] 6.5.4, mentioned by [2] 2.2, [3], this method defines

\[g_{p_k}(n) = \sum_{\begin{array}{c}2 \le x \le n \\ \text{ every prime factor of x} \ge p_k\end{array}} f(x)\]

After simplification [3], we have the formula D,

\[g_{p_k}(n) = h(n)-h(p_{k-1})+\sum_{\begin{array}{c}i \ge k \\ p_i^2 \le n\end{array}}\sum_{\begin{array}{c}c \ge 1 \\ p_i^{c+1} \le n\end{array}}f(p_i^c)g_{p_{i+1}}(\frac{n}{p_i^c})+f(p_i^{c+1})\]

$p_k$ is the $k_{th}$ prime and $h$ is the prefix-sum of $f$ defined on prime numbers.

According to [4], an improved version reduces the time complexity to $\tilde{\cal O}(n^{\frac{2}{3}})$.

The memorized implementation

If an implementation remembers all used $g_{p_k}(n)$, similar to [1] 6.5.4, we have formula M,

\[g_{p_k}(n) = g_{p_{k+1}}(n) + h(n)-h(p_{k-1})+ \sum_{\begin{array}{c}c \ge 1 \\ p_k^{c+1} \le n\end{array}}f(p_k^c)g_{p_{k+1}}(\frac{n}{p_k^c})+f(p_k^{c+1})\]

We can also call it dp-like implementation. The space and time complexity is the same as that of Zhouge sieve’s.

Proof

The number of states is given by $\sum_{i\text{ is prime}}(\frac{n}{i})^{\frac{1}{2}}$. Based on integration, $O\left(\int_{1}^{n^{\frac{1}{2}}}(\frac{n}{x})^{\frac{1}{2}}dx\right) = O(n^{\frac{3}{4}})$.

The number of state transitions is given by $\sum_{i\text{ is prime}}(\frac{n}{i^j})^{\frac{1}{2}}$ and bounded by$\sum_{i\text{ is prime}}\log_i(\frac{n}{i})(\frac{n}{i})^{\frac{1}{2}}$. Based on integration, $O\left(\int_{1}^{n^{\frac{1}{2}}}\log(\frac{n}{x})(\frac{n}{x})^{\frac{1}{2}}dx\right) = \tilde{\cal O}(n^{\frac{3}{4}})$.

The space complexity is $O(n^{\frac{1}{2}})$, we can either use two-buffer trick or update inplace while taking care of the updating order.

For all the values of $h(\frac{n}{i})$, the complexity is up to $h$.

The black algorithm

Just implement the formula D directly, we usualy use a dfs to achieve it.

An interpretation that helps us uderstand this algorithm: we can divide the integers no more than $n$ into classes as \(\text{class}_{t} = \{ x = t * p \ \vert \ p \text{ is prime and } p \ge \text{max prime factor}(t) \}\). Then just iterate all possible $t$ and compute the contribution of each class. (Mentioned by [2]).

There is an article TEES in SPOJ, and it also described this algorithm. But the content is cleared due to unknown reason. And my following test code is based on this version.

Complexity

[2] 2.3 and [6] said that this black algorithm has an amazing performance. [6] said, the number of $t$ is $O(n^{1-\epsilon})$. Here is the code to compute the number of $t$:

#include <pe.hpp>
using namespace pe;

// Unused code. Just demostrate how to implement this algorithm.
int64 dfs(int limit, int64 n, int64 val, int imp, int64 vmp, int emp) {
  int64 ret = 1;
  for (int i = 0; i < limit; ++i) {
    const int64 p = plist[i];
    const int nextimp = imp == -1 ? i : imp;
    const int64 nextvmp = imp == -1 ? p : vmp;
    const int64 valLimit = n / p / nextvmp;
    if (val > valLimit) break;
    int e = 1;
    for (int64 nextval = val * p;; ++e) {
      ret += dfs(i, n, nextval, nextimp, nextvmp, imp == -1 ? e : emp);
      if (nextval > valLimit) break;
      nextval *= p;
    }
  }
  return ret;
}

using RT = int64;
struct Solver : public MValueBaseEx<Solver, int64, 8> {
  RT Batch(int64 n, int64 val, int imp, int64 vmp, int emp, RT now, RT now1) {
    return 1;
  }
};

int main() {
  PE_INIT(maxp = 100000000);
  for (int i = 5; i <= 16; ++i) {
    TimeRecorder tr;
    int64 cnt = Solver().Cal(Power(10LL, i));
    printf("1e%d\t%.2e\t%16I64d\t%s\n", i, 1. * cnt, cnt,
           tr.Elapsed().Format().c_str());
  }
  return 0;
}

The outputs are

1e5     1.89e+03                    1894        0:00:00:00.000
1e6     9.11e+03                    9108        0:00:00:00.000
1e7     4.49e+04                   44948        0:00:00:00.000
1e8     2.28e+05                  228102        0:00:00:00.000
1e9     1.19e+06                 1185818        0:00:00:00.001
1e10    6.30e+06                 6298637        0:00:00:00.006
1e11    3.41e+07                34113193        0:00:00:00.036
1e12    1.88e+08               188014195        0:00:00:00.222
1e13    1.05e+09              1052806860        0:00:00:01.194
1e14    5.98e+09              5981038282        0:00:00:06.885
1e15    3.44e+10             34430179518        0:00:00:39.958
1e16    2.01e+11            200620098564        0:00:03:52.337
time usage: 0:00:04:41.111

It means, if the complexity of $h$ is ignored, you can use this method to brute force an input of $10^{16}$ by multi-threads in several minutes.

Build a code template and optimizate it

In pe_algo

Optimize $h$ part

Let $h(n)$ be the number of primes no more than $n$, in pe_algo

Conclusion

Why do I call it the black algorithm?

A variant

The existing algorithm is able to compute $\sum_{i=1}^n f(i)$ assuming $\sum_{p \le \frac{n}{j}} f(p) [p \text{ is prime}]$ is known. A modified version is able to compute $\sum_{i \le \frac{n}{j}} f(i)$. The idea is, for each \(\text{class}_{t} = \{ x = t * p \ \vert \ p \text{ is prime and } p \ge \text{max prime factor}(t) \}\), we divide the $p$ into different segments according to the values of $\frac{n}{t p}$.

#include <pe.hpp>
using namespace pe;

struct Solver : public MValueBaseEx<Solver, int64, 8> {
  int64 Batch(int64 n, int64 val, int /*imp*/, int64 vmp, int /*emp*/,
              int64 /*now*/, int64 /*now1*/) {
    int64 t = 1;
    const int64 m = n / val;
    for (int64 i = vmp + 1; i <= m;) {
      int64 v = m / i;
      int64 maxi = m / v;
      ++t;
      // handle val * p where i <= p <= maxi
      i = maxi + 1;
    }
    // handle val * vmp * vmp if vmp > 1
    return t;
  }
  int64 F(int64 /*p*/, int /*e*/) { return 1; }
};

int main() {
  PE_INIT(maxp=100000000);
  for (int i = 5; i <= 12; ++i) {
    TimeRecorder tr;
    int64 cnt = Solver().Cal(Power(10LL, i));
    printf("1e%d\t%.2e\t%16I64d\t%s\n", i, 1. * cnt, cnt,
           tr.Elapsed().Format().c_str());
  }
  return 0;
}

the outputs are

1e5     2.15e+04                   21454        0:00:00:00.000
1e6     1.25e+05                  125220        0:00:00:00.000
1e7     7.28e+05                  727870        0:00:00:00.001
1e8     4.25e+06                 4246101        0:00:00:00.010
1e9     2.49e+07                24926095        0:00:00:00.059
1e10    1.47e+08               147470529        0:00:00:00.354
1e11    8.80e+08               879838265        0:00:00:02.136
1e12    5.29e+09              5294311815        0:00:00:12.784
time usage: 0:00:00:15.813

References

  1. Zhizhou Ren, 2016, Some methods to compute the prefix-sum of multiplicative function (chinese content), page 1 to page 16
  2. Zhengting Zhu, 2018, Some special arithmetic function summation problems (chinese content), page 92 to page 112
  3. Min_25 sieve (chinese content)
  4. Min_25, 2018.11.11 Sum of Multiplicative Function
  5. dengtesla, 2019, Detail explanning of the new Min_25 sieve $O(n^{\frac{2}{3}})$ (chinese content)
  6. Bohang Zhang, 2018, A proof for a general algorithm of summing multiplicative function (chinese content)