The Multiple Polynomial Quadratic Sieve

前言

笔者对《The Multiple Polynomial Quadratic Sieve》一文进行了翻译，后文大部分内容主要以原文内容为主，对于部分作者讲述有跳跃性的内容，笔者对顺序进行了修改以及补充。每章节的内容首先会给出原文，紧接着会在此基础上进行翻译。关于文章内容，如有不当之处欢迎批评指正。

摘要

Abstract. A modification, due to Peter Montgomery, of Pomerance's Quadratic Sieve for factoring large integers is discussed along with its implementation. Using it, allows factorization with over an order of magnitude less sieving than the basic algorithm. It enables one to factor numbers in the 60-digit range in about a day, using a large minicomputer. The algorithm has features which make it well adapted to parallel implementation.

本文讨论了 Peter Montgomery 对 Pomerance 的二次筛选法所做的改进，这是一种用于分解大整数的算法，以及其实现方法。使用这种改进的算法，可以在比基本算法少一个数量级的筛选次数下完成因数分解。利用这种方法，可以在大约一天的时间内，使用一台大型小型计算机分解60位数字的整数。该算法具有一些特性，使其非常适合并行实现。

介绍

Introduction. The basic quadratic sieve algorithm has origins which date back to M. Kraitchik, but was first explicitly stated and analyzed by C. Pomerance. The only two reported implementations of this algorithm were done by J. Gerver at Rutgers and J. Davis and D. Holdridge at Sandia National Laboratories. The latter used a Cray-1 to factor some numbers in the 60-70 digit range from the 'Cunningham Project'. They also used a Cray XMP to factor the then 'MOST WANTED' number in 9.5 hours. While some of the arguments that follow have been detailed by Pomerance elsewhere, we present ours in a manner which is more oriented towards implementation of the algorithm. The basic algorithm depends on constructing a solution to the following equation, where is the number you wish to factor If and , then and are proper factors of . This version of the quadratic sieve generates a set of quadratic residues of using the following single polynomial: It follows immediately that if a prime , then for all . Thus, the values of the polynomial may be factored with a sieve, once one solves . This may be solved by any one of a number of available algorithms. The potential divisors of are exactly those primes for which the Legendre symbol and the unit -1 is needed to hold the sign.

基本的二次筛选算法起源于 M. Kraitchik，但首次由 C. Pomerance 明确提出并进行分析。这个算法仅有的两次报告实现是由 J. Gerver 在 Rutgers大学和 J. Davis 与 D. Holdridge 在Sandia National Laboratories 完成的。后者使用 Cray-1超级计算机在 Cunningham Project 中分解了一些 60-70 位数字范围的数。他们还使用 Cray XMP超级计算机分解了当时被称为 'MOST WANTED' 的数，耗时9.5小时。尽管 Pomerance 在其他地方已经详细描述了以下一些论证，我们在这里以更倾向于算法实现的方式呈现。基本算法依赖于构建以下方程的解，其中是你希望分解的数。

如果且，那么和是的真因子。

一般情况下我们有，所以。下面我们期望找到一个是完全平方数。这样我们就可以得到即但通常想要得到是完全平方是困难的，故我们试图找到若干光滑的，使得对于每个我们记对于给定筛区，一种方法是我们遍历筛区中所有的点，通过试除法判断每个是不是光滑的，但往往这种方法的时间复杂度是较高的；

另一种方式是对于，我们直接在筛区内找出整除这些素数的位置，最终那些被完全整除的就是需要的数。可以得出，如果一个素数整除，那么对于所有的，也整除。

因此，一旦解出了，我们就可以找到筛区中所有整除的。

的可能的除数正是那些使得勒让德符号成立的素数（后续会对此做出解释）

在区间上，我们注意到的值为负数，故我们需要单位来保持符号。这里不能简单的将这个负值转化为一个模下同余的正值，因为这里是接近的，也就是说其几乎不可能是一个光滑的数。

The algorithm now proceeds as follows:

Select a factor base $¡$ ，for some appropriate value of , and for the sign.

Solve the quadratic equation , for all . There will be two roots rx and r2 for each p

Initialize a sieve array to zero over the interval for some appropriate .

For all add the value of to the sieve array at locations and

The value of will be approximately over , so compare each sieve location with .Fully factored residues will have their corresponding sieve value close to this value. For these, construct the exact factorization via division. Factorizations are found so infrequently that the time to do this division is negligible. One need not check all the primes in the factor base in doing this division. If is the location in the sieve array, one need only compute . Only if equals one of the two roots do we go ahead and do the multi-precise division； Let be the corresponding vector of exponents with .

Collect factorizations. One then finds a set of residues whose product is a square via Gaussian elimination over on the matrix formed by reducing all of the . This creates a linear dependency on the exponents, and the product of the vectors in that dependency forms a square. It is then trivial to construct an instance of congruence.

算法现在按以下步骤进行：

选择一个因子基 $是素数$ ，对于某个适当的值，并且作为符号。
解决二次方程，对于所有。对于每个，将有两个根和。
初始化一个 sieve array，将其设置为零，覆盖区间，对于某个适当的。
对于所有，在 sieve array 的位置和上增加的值。
的值在区间上大约是，因此将 sieve array 中每个位置的值与进行比较。能完全分解的数对应的sieve array 中的位置上的值与相近。我们将筛选出的这些树称为候选数。
对于这些数，通过除法构建确切的分解。由于相对光滑的数是极少的，以至于进行这种除法的时间可以忽略不计。对于因子基中的所有素数，我们不一定采取上述除法（指）。一般而言只需要计算。只有当等于或者时，我们才继续进行高精度除法。由于的素因子数相对因子基元素的个数小，所以我们几乎都只需计算，相比于优化前的方法，我们看到被除数的规模从缩小到了
我们通过上述过程得到个分解。然后在矩阵上进行高斯消元法在上找到一个线性组合，使得指数向量每个分量上都是偶数(也就找到了对应的的乘积为平方) 。这个矩阵是由将所有的得到的。

~~初学者可以先大致了解算法流程，关于算法的细节会在后文中进行描述。~~

The chief difficulty with this approach is that one must obtain approximately as many fully factored residues as there are primes in the factor base. In order to obtain enough factorizations, M must be very large, and the residues grow linearly in size with .

A way around this problem was formally suggested by Peter Montgomery: Simply use many polynomials to generate the residues and sieve each polynomial over a much smaller interval. Utilizing multiple polynomials enables one to keep the sieve interval small, and hence makes the residues easier to factor. We have implemented this approach and found that it is quite effective. It allows one to find enough factored residues using less than one-tenth the total sieve length in the single polynomial version. We also show that the cost of changing polynomials is small.

In fact, the implementation of Sandia was done in two stages. Their first version used a single polynomial. A later version used multiple polynomials, but in a disguised rather than explicit form. This latter version created implicit polynomials from subsequences of the main sieve which were divisible by a large prime q lying outside the factor base. They called their method "special q". Montgomery's suggested polynomials are somewhat better, however, because they yield residues which are smaller on average and are less clumsy to implement.

In Section 2 we shall discuss how to select the coefficients of the polynomials. Section 3 will show how to compute those coefficients efficiently. Section 4 will discuss the basic steps of the algorithm. Section 5 will discuss selection of the algorithm's input parameters. Section 6 will present some numerical results. In Section 7 we discuss its parallel implementation and finally, in Section 8, we compare our algorithm with other methods.

这种方法的主要困难在于必须获得与因子基中的素数数量大致相同的光滑数。为了获得足够的分解，必须非常大，并且操作数大小与呈线性增长。

Peter Montgomery 正式提出了一个解决这个问题的方法：简单地使用多个多项式来生成，并在更小的区间内对每个多项式进行筛选。使用多个多项式可以保持筛选区间较小，从而使更容易分解。我们已经实现了这种方法，并发现它非常有效。它允许我们使用不到单多项式版本总筛选长度的十分之一来找到足够的分解。我们还展示了更换多项式的成本很小。

读者需要关注的是，MPQS 相比 SPQS，总筛区长度仅为；并且其更换多项式作为额外开销，需要的时间也是可以忽略不计的。

实际上，Sandia 的实现分为两个阶段。他们的第一版使用了单个多项式。后来的版本使用了多个多项式，但是以一种隐式而非显式的形式。这个后来的版本从主筛选的子序列中创建了隐式多项式，这些子序列可以被因子基之外的大素数整除。他们称他们的方法为 special q。然而，Montgomery 建议的多项式更好一些，因为它们产生的平均来说更小，而且在实现上更简洁。

系数选择

Selection of Coefficients. Select . In order to make generate quadratic residues, it is required that . Since this latter expression is congruent to , it means that if one must premultiply it by a small constant k, so . This is also a good thing to do, in general, because it usually allows one to find a factor base which is much richer in small primes. We will later discuss a function that may be used to evaluate a multiplier. A suggestion by Pomerance allows one to do away with requiring : Simply take as the middle coefficient rather than . However, we have not yet implemented this suggestion. It would be advantageous to keep the value of small, in some appropriate sense, over . There are several obvious ways of doing this, e.g.

Minimize over

Minimize

Minimize

subject to and .

It is easily seen that and are essentially equivalent. The length of the base of the parabola is and its area will be directly proportional to its height. Relaxing the integer constraints and solving each of the above Lagrange multiplier problems, one finds that they all yield essentially the same result. The exact answer for is where The only difference among the results of the different minimization problems is that the constants and change very slightly. ranges from to and ranges from to with . The maximum value of over is , a factor of improvement over (2) and the 'special q' polynomials at Sandia. follows immediately from symmetry considerations, but and give similar results for sind because the constraint is very binding on the shape of . The simplest way to see this is to realize that at the roots its slope will be ± fkÑ. We would really like to flatten the parabola, but the constraint on the discriminant means that the curve must have a certain 'steepness'. Thus, we can do little more than translate the parabola up and down.

下面介绍的系数选择。为了使生成二次剩余，一个可行的充分条件是满足。

由于这个表达式左侧模余，我们仅考虑为奇数的情形。这意味着如果，就必须使得。一般来说，这样做也是有好处的，因为它通常允许找到一个富含小素数的因子基。我们稍后将讨论一个可以用来评估乘数的函数。Pomerance的一个建议允许我们不必要求：简单地取作为中间系数而不是。然而，我们还没有实施这个建议。在某种适当的意义上，保持在区间上的值较小是有利的。受条件和的限制，有几种明显的方法可以做到这一点，例如：

在区间上最小化
最小化
最小化

在整数上讨论上面问题是复杂的，我们不妨放宽条件，仅限制。我们用拉格朗日乘数解决上面问题，可以发现它们都产生本质上相同的结果。最终只有上存在差异，其中的范围从到，的范围从到，且。在的最大值为

对于拉格朗日乘数的方法，可以简单的理解为用解析的方法求限制条件下函数的极值。具体而言，我们记的系数为一个三元组 这里笔者并没有进行验证计算，而是直接使用了原文作者的结论。

由于对称性考虑，可以立即得出。而情形1 和情形3 产生了类似的结果，是因为因为约束条件对的形状非常严格。最简单的一个特征是在零点处的斜率都是。我们希望压平抛物线，但对判别式的约束意味着曲线必须具有一定的陡峭度。因此，我们除了上下平移抛物线之外，几乎无法做更多的事情。

计算系数

3. Computation of Coefficients. A simple method for selecting , , and comes from a method for finding modular square roots quickly. To satisfy one must have Let and . It is desirable that be prime because if a prime in the factor base divides , then has only one root and the probability that over drops from to . It is sufficient for practical purposes that be only a probable prime. Alternatively, one may select to be the product of primes not in the factor base, but its factorization must be known to solve . Our implementation took to be a probable prime. To find the coefficients, compute

Since ，Then
Let One now has
and
Since B must be odd, if it is even subtract it from A.

The value of is easily obtained since has already been computed. One also has
It is not necessary, in practice, to actually compute since the value of is not really needed. It may be used, however, as a check on the other computations. Compute and save the value of for later. This will enable us to quickly compute when we find a factorization. As a result of the way was chosen, one also has
Some care must be taken: If x lies between the real roots of , then is negative, and one must subtract its value from .

The cost of finding the coefficients is dominated by the probable prime and residue tests on , the computation of and and . Nevertheless, the total arithmetic to be done is small.

Finally, the roots of , are
since is invariant.

Most of the cost of changing polynomials occurs in computing . The cost of computing is dominated by the computation of , which must be done for all primes in the factor base. Even with an efficient algorithm for doing this, such as the extended Euclidean algorithm, one must typically do this thousands of times when changing polynomials. On a SUN-3/75, for a 60-digit number with a factor base of primes, it takes seconds to compute the coefficients and seconds to compute all the roots.

在这部分内容中，原文作者是从算法实现的角度，直接给出了部分结果的值，故不容易理解。笔者这里按照正常的数学推理过程进行阐述，当然，最终结论保持不变。

选择和的一个简单方法来自于快速寻找模平方根的方法。为了满足，必须有设以及。

这里笔者根据自己理解对上述选取参数规则进行说明：

对于选取为一个完全平方数：重新考虑，我们对其进行配方，我们的最终目的是把上述式子左侧凑成平方数。如果是一个完全平方数，此时我们就只需要关注即可。
保证了后续过程中，关于同余方程的解是存在的
，意味着我们可以很快通过 Cipolla 算法解出同余方程的解
是我们根据上面 系数选择 中的内容对的取值进行一个估计

这里最好令是素数，因为如果因子基中的一个素数整除，那么只有一个根（的二次项系数整除，导致在模下有，退化为一次同余方程），那么在上的概率从下降到。具体而言，当方程有两个解时，在区间中任意与两根在模下同余的位置都是满足条件的，故概率是。

这里不需要是一个真素数。或者，可以选择作为不在因子基中的素数的乘积，但其分解必须已知以解决。我们的实现取为素数的情况（能通过一般的素数检测就行）。

后续部分主要是关于如何求解二次同余方程的问题。这里原文主要从算法的逻辑进行分析，这里我们从需要解决的问题入手，结合初始已知条件逐步分析。

对于，结合已知条件我们写成下面形式其中是一个常数，为素数。

一般来说求解上面同余方程我们先求出下的解，然后用 Hensel引理 提升到，最后到

mod D

下面我们先计算这里作者选取的特殊的，我们可以直接得到具体而言对于素数，当时，根据费马小定理那么此时即我们得到的结果，事实上上述过程即 Cipolla 算法。

Hensel 引理

设为整系数多项式，为不少于的整数，为质数。若整数是下面同余式的根：对于则有

若，则存在唯一的整数使得成立。这里
若且，则对任意整数t成立。
若但，则无整数解。

证明

这里不妨设由泰勒公式我们有因此可见，由第三项开始，都必能被整除。因此只需满足即易证得结论

mod A

这里作者虽然开始给出的同余方程是在模下，但在求出模下的解后就不再提升，这里笔者会在后续过程中给出原因。

首先根据 3.1 我们求出这里和原文一样，我们记

令，，根据 Hensel 引理 即我们可以求出模下的值。这里注意由于是一个奇数，如果求出来的为偶数，我们取

为了方便，我们记此时的值为，由于，则也就是说此时恰好也是模下的一组解，故至此我们不再提升，而是直接算出。这里这里原文还提及一个变量，事实上是对计算的一个去冗余的优化过程。我们看到直接按照的表达式求我们需要计算一次快速幂，一次模逆元，时间复杂度可以近似估计为；但如果先计算，计算只需要一次模乘法，由于，计算也只需要做一次模乘法。那么总的时间复杂度为。

下面有两种方法计算：

第一种是原文作者提及的方法。根据条件我们可以得到也就是说我们可以不去计算。或者计算它可以用作其他计算的检查。这里我们计算并保存的值以备后用。这将使我们能够快速计算当我们找到一个分解。

第二种是笔者的方法，我们仍然可以直接使用，进行计算。当然，这里需要使用秦九昭算法进行计算（），这样与方法一相比，方法1需要一个内存空间、三次模乘法、一次加法；笔者的方法需要进行两次模乘法、两次加法。

~~必须小心：如果位于的实根之间，则是负数，必须从中减去它的值。~~这里对于 python 所有操作都直接模即可，因为 python 模运算结果的符号取决于除数。当然如果是基于C/C++的快速实现，由于其符号取决于被除数，就可能会出现正负值，我们就需要注意符号的调整。

最后，我们还需要预处理所有模数下的根。即由于因子基是不变的，且是定值，我们

可以预处理计算。故更换多项式时大部分成本主要由计算所主导，这是因为我们需要对因子基中的所有素数进行求逆元过程。即使使用高效的算法，如扩展欧几里得算法，更换多项式时通常需要这样做数千次。在 SUN-3/75 上，对于一个位数字，因子基有个素数，计算系数需要秒，计算所有根需要秒。