Product of univariate polynomials

Shoup [18] showed that the product of univariate polynomials with large degrees and large coefficients can be computed efficiently by FFT over small finite fields and Chinese remainder theorem. It can be easily parallelized:


Input :$f_1, f_2 \in {\bf Z}[x]$ such that $deg(f_1), deg(f_2) < 2^M$

Output : $f = f_1f_2$
$P \leftarrow$ $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime,
$2^{M+1}\vert m_i-1$ and $m=\prod m_i $ is sufficiently large.
Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.
for $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$
Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$
and $F_j \equiv 0 \bmod m/M_j$ in parallel.
(The product is computed by FFT.)
return $\phi_m(\sum F_j)$
(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)

Figure 3 shows the speedup factor under the above distributed computation on Risa/Asir. For each $n$, two polynomials of degree $n$ with 3000bit coefficients are generated and the product is computed. The machine is FUJITSU AP3000, a cluster of Sun workstations connected with a high speed network and MPI over the network is used to implement OpenXM.

Figure 3: Speedup factor
\begin{figure}
\epsfxsize =8.5cm
\epsffile{speedup.ps}\end{figure}

If the number of servers is $L$ and the inputs are fixed, then the cost to compute $F_j$ in parallel is $O(1/L)$, whereas the cost to send and receive polynomials is $O(L)$ if ox_push_cmo() and ox_pop_cmo() are repeatedly applied on the client. Therefore the speedup is limited and the upper bound of the speedup factor depends on the ratio of the computational cost and the communication cost for each unit operation. Figure 3 shows that the speedup is satisfactory if the degree is large and $L$ is not large, say, up to 10 under the above environment. If OpenXM provides collective operations for broadcast and reduction such as MPI_Bcast and MPI_Reduce respectively, the cost of sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(\log_2L)$ and we can expect better results in such a case. In order to implement such operations we need new specifications for inter-sever communication and the session management, which will be proposed as OpenXM-RFC 102. We note that preliminary experiments show the collective operations work well on OpenXM.

Nobuki Takayama 2017-03-30