JP4868408B2 - Arithmetic unit, reciprocal approximate calculation program, and approximate calculation method

Info

Publication number: JP4868408B2
Application number: JP2007132640A
Authority: JP
Inventors: 祐一郎蓬来; 一彦福井
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2007-05-18
Filing date: 2007-05-18
Publication date: 2012-02-01
Anticipated expiration: 2027-05-18
Also published as: JP2008287555A

Description

The present invention relates to an arithmetic unit that approximates a reciprocal or a reciprocal square root by the Newton-Raphson method. In particular, it relates to an arithmetic unit that can avoid the stalls caused by the long bit shifts a processor performs when, during the approximation, it subtracts floating-point numbers of similar magnitude.

Arithmetic units that can calculate reciprocals and reciprocal square roots (the reciprocal of a square root; the same applies hereinafter) with high accuracy are useful in many fields. In molecular biology, for example, reciprocals and reciprocal square roots are used to calculate the Coulomb potential (energy) and the Coulomb force. Because a huge amount of information must be processed, high calculation speed is also required.

The Newton-Raphson method has conventionally been widely used to calculate reciprocals and reciprocal square roots (see, for example, Patent Document 1). The principle of the Newton-Raphson method is described below.

The Newton-Raphson method finds x satisfying f(x) = 0 by the approximate calculation of Equation (1) below.
x_{i+1} = x_i - f(x_i) / f'(x_i)   ... Equation (1)
For x_i sufficiently close to x, one step of Equation (1) doubles the accuracy, that is, the number of correct bits. Equation (1) is therefore repeated until the target accuracy is reached.
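
The iteration of Equation (1) can be sketched in a few lines of Python. This is only an illustration: the function name `newton_raphson` and the test function f(x) = x^2 - 2 are assumptions chosen here, not taken from the patent. The list `errors` shows the error roughly squaring at each step, which is what "doubling the accuracy" means.

```python
def newton_raphson(f, f_prime, x0, steps):
    """Apply Equation (1) `steps` times and return every iterate."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / f_prime(x))
    return xs

# Illustrative choice (not from the patent): f(x) = x^2 - 2, root sqrt(2).
iterates = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5, 4)
errors = [abs(x - 2.0 ** 0.5) for x in iterates]

for step, err in enumerate(errors):
    print(step, err)
```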

Next, the calculation of the reciprocal and the reciprocal square root by the Newton-Raphson method is described.
"Reciprocal"
To obtain the reciprocal 1/a of a value a, let f(x) = a - 1/x. Then:
f'(x) = 1 / x^2
x_{i+1} = x_i - (a - 1/x_i) / (1 / x_i^2)
x_{i+1} = x_i × (2 - a × x_i)   ... Equation (2)
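
As a minimal sketch of Equation (2), the following Python applies the step repeatedly. The input value 3.0 and the starting guess 0.3 are hypothetical, chosen so that the residual 1 - a × x starts at 0.1 and squares at every step.

```python
def reciprocal_step(a, x):
    """One Newton-Raphson step for 1/a, Equation (2): x * (2 - a * x)."""
    return x * (2.0 - a * x)

a = 3.0
x = 0.3  # hypothetical rough starting guess for 1/3
history = [x]
for _ in range(5):
    x = reciprocal_step(a, x)
    history.append(x)

for step, value in enumerate(history):
    print(step, value, 1.0 - a * value)  # iterate and its residual
```

Note that the step uses only multiplication and subtraction, so no division instruction is needed.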

Repeating Equation (2) yields a reciprocal with the target accuracy. One evaluation of Equation (2) corresponds to one step of the Newton-Raphson method, and is therefore called a "one-stage approximation calculation".

The Newton-Raphson method can be expanded as follows. Letting ε_i = 1 - a × x_i, two steps of the Newton-Raphson method can be written in a form without subtraction, as shown in Equation (3).
x_{i+2} = x_i × (1 + ε_i × (1 + ε_i × (1 + ε_i)))   ... Equation (3)
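
The step from Equation (2) to Equation (3) is not spelled out in the text. A short derivation, using only the definitions already given, is sketched below; the key observation is that the residual squares at each step.

```latex
\begin{align*}
x_{i+1} &= x_i\,(2 - a x_i) = x_i\,(1 + \varepsilon_i),
  \qquad \varepsilon_i = 1 - a x_i \\
\varepsilon_{i+1} &= 1 - a x_{i+1}
  = 1 - (1 - \varepsilon_i)(1 + \varepsilon_i) = \varepsilon_i^{2} \\
x_{i+2} &= x_{i+1}\,(1 + \varepsilon_{i+1})
  = x_i\,(1 + \varepsilon_i)(1 + \varepsilon_i^{2}) \\
 &= x_i\,(1 + \varepsilon_i + \varepsilon_i^{2} + \varepsilon_i^{3})
  = x_i\,\bigl(1 + \varepsilon_i\,(1 + \varepsilon_i\,(1 + \varepsilon_i))\bigr)
\end{align*}
```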

Because Equation (3) corresponds to two steps of the Newton-Raphson method, it is hereinafter called a "multi-stage approximation calculation". Extending Equation (3) by inserting further (1 + ε_i) terms in the same manner has the same effect as increasing the number of Newton-Raphson steps. Such calculations are also included in the term "multi-stage approximation calculation".

"Reciprocal square root"
To obtain the reciprocal square root a^{-1/2} of a value a, let f(x) = a - 1/x^2. Then:
f'(x) = 2 / x^3
x_{i+1} = (3 × x_i - a × x_i^3) / 2   ... Equation (4)

Equation (4) is one step of the Newton-Raphson method, that is, a one-stage approximation calculation. Next, letting ε_i = 1 - a × x_i^2, two steps of the Newton-Raphson method are expressed by Equation (5). Equation (5) corresponds to the multi-stage approximation calculation.
x_{i+2} = x_i × (1 + (1/2) ε_i + (3/8) ε_i^2 + (5/16) ε_i^3 + (1/16) ε_i^4)   ... Equation (5)
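
The claim that Equation (5) equals two applications of Equation (4) can be checked numerically. The sketch below is an illustration only; the function names, the input 2.0 and the starting guess 0.7 are hypothetical.

```python
def rsqrt_step(a, x):
    """One-stage approximation calculation, Equation (4)."""
    return (3.0 * x - a * x ** 3) / 2.0

def rsqrt_two_steps_expanded(a, x):
    """Multi-stage approximation calculation, Equation (5), in Horner form."""
    e = 1.0 - a * x * x
    return x * (1.0 + e * (0.5 + e * (0.375 + e * (0.3125 + e * 0.0625))))

a, x0 = 2.0, 0.7  # hypothetical input and rough guess for 1/sqrt(2)
two_steps = rsqrt_step(a, rsqrt_step(a, x0))
expanded = rsqrt_two_steps_expanded(a, x0)

print(two_steps, expanded, 2.0 ** -0.5)
```

The two results agree up to floating-point rounding, and both are much closer to 1/sqrt(2) than the starting guess.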
Patent Document 1: JP-A-10-55266

Conventional arithmetic units calculate reciprocals and reciprocal square roots according to the Newton-Raphson principle described above. In theory, the Newton-Raphson method is well suited to calculating them with high accuracy. However, when it is applied naively on an actual processor, the calculation time can become long, as explained below.

As is well known, processors represent numerical values as floating-point numbers. Examples of such processors include the IBM (registered trademark; the same applies hereinafter) POWER4 and POWER5. Floating-point formats are defined in, for example, IEEE 754. FIG. 11 shows an IEEE 754 double-precision floating-point number, which consists of a sign, an exponent part, and a mantissa part. The mantissa part is a 52-bit value that holds only the fractional part of the mantissa "1.x_1 x_2 x_3 ... x_52".
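
To make the layout concrete, the following sketch splits a Python float (an IEEE 754 double) into its three fields using the standard `struct` module. The function name `decode_double` is an assumption for illustration; the field widths (1 sign bit, 11 exponent bits, 52 fraction bits) are those of the IEEE 754 double-precision format.

```python
import struct

def decode_double(value):
    """Split an IEEE 754 double into sign, biased exponent and 52-bit fraction."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits after the implicit "1."
    return sign, exponent, fraction

sign, exponent, fraction = decode_double(1.5)
print(sign, exponent - 1023, format(fraction, "052b"))
```

For 1.5, which is 1.1 in binary, only the first fraction bit is set; the leading "1." is implicit and is not stored.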

Calculating a reciprocal accurately by the Newton-Raphson method requires computing the difference between two numerical values of similar magnitude. When a processor that handles floating-point numbers computes such a difference, however, a long bit shift takes place inside the processor.

As an example, suppose that two numerical values of similar magnitude, with the mantissa parts shown in FIG. 12, are subtracted. Because the mantissa parts are close, their difference begins with a run of 0 bits. In the IEEE 754 representation of a normalized number, the mantissa is defined as "1.x_1 x_2 x_3 ... x_52", as shown in FIG. 11; that is, the leading digit is fixed at "1." and is followed by the 52 fraction bits. Therefore, when the difference of the mantissa parts looks like FIG. 12, a bit shift must be performed until the first "1" appears, that is, until the most significant 1 bit of the difference reaches the leading position of the mantissa. The closer the two values are, the more leading 0 bits the difference has, and the larger the bit shift becomes.
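
The size of this normalization shift can be estimated in software by comparing the binary exponent of the operands with that of their difference. The sketch below is a simplified model under that assumption; the function name `normalization_shift` is hypothetical, and the shift length at which a given processor actually stalls is hardware-specific and not modeled here.

```python
import math

def normalization_shift(p, q):
    """Estimate how many leading mantissa bits cancel when computing p - q."""
    diff = p - q
    if diff == 0.0:
        return None  # complete cancellation: no leading 1 bit remains
    # math.frexp returns (m, e) with value == m * 2**e and 0.5 <= |m| < 1.
    _, e_operand = math.frexp(max(abs(p), abs(q)))
    _, e_diff = math.frexp(diff)
    return e_operand - e_diff

print(normalization_shift(1.0, 1.0 - 2.0 ** -8))   # residual of an 8-bit estimate
print(normalization_shift(1.0, 1.0 - 2.0 ** -40))  # residual of a 40-bit estimate
```

In the Newton-Raphson residual 1 - a × x, the better the current approximation x, the longer this shift becomes.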

Thus, calculating a reciprocal with high accuracy by the Newton-Raphson method requires taking the difference of values of similar magnitude, which results in a long bit shift inside the processor. A long bit shift stalls the execution of subsequent floating-point instructions and thereby lowers the calculation speed.

The conventional problem has been described here for the reciprocal, but an improvement in calculation speed is likewise desired for the reciprocal square root.

The present invention has been made against this background. Its object is to provide an arithmetic unit that, when approximating a reciprocal or a reciprocal square root by the Newton-Raphson method, avoids the stalls caused by the bit shifts that occur in the processor when floating-point numbers of similar magnitude are subtracted, and thereby improves the calculation speed.

The arithmetic unit of the present invention calculates a target reciprocal having a preset target accuracy by applying Newton-Raphson approximation to an initial approximate reciprocal, which is a reciprocal that corresponds to the input value whose reciprocal is to be obtained and has a predetermined initial approximation accuracy. The arithmetic unit has a processor with arithmetic functions for processing floating-point numbers and a storage unit that stores a reciprocal approximate calculation program; executing the stored approximate calculation program on the processor provides the arithmetic unit with an intermediate value calculation unit and a target value calculation unit. The intermediate value calculation unit performs intermediate value calculation processing, in which it applies a one-stage approximation calculation by the Newton-Raphson method to the initial approximate reciprocal at least once to calculate an intermediate approximate reciprocal having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy. The target value calculation unit performs target value calculation processing, in which it applies a multi-stage approximation calculation by the Newton-Raphson method to the intermediate approximate reciprocal, raising the accuracy from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal. The intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The intermediate value calculation unit performs an error-expanded one-stage approximation calculation, in which the error is enlarged so that the non-stall error remains in the result, and thereby calculates the intermediate approximate reciprocal with its accuracy limited so as not to exceed the intermediate approximation accuracy.

As described above, according to the present invention, an intermediate approximate reciprocal with the intermediate approximation accuracy is calculated by the error-expanded one-stage approximation calculation, and a target reciprocal with the target accuracy is then calculated from it by the multi-stage approximation calculation. The accuracy of the intermediate approximate reciprocal is limited to the intermediate approximation accuracy, so a non-stall error remains, corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The approximate calculation can therefore be carried out without subtracting two values whose difference is too small. This avoids stalls and improves the calculation speed.

With the input value denoted a and the approximate reciprocals before and after the approximate calculation denoted x_i and x_{i+1}, the intermediate value calculation unit may compute x_{i+1} = x_i × ((2 + α) - a × x_i) as the error-expanded one-stage approximation calculation, where the error component α is set according to the non-stall error.
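
The effect of the error component α can be seen by comparing the residual 1 - a × x after a plain step and after an error-expanded step. Expanding the formula gives a new residual of ε_i^2 - α × (1 - ε_i), so the result stays about α away from the exact reciprocal. The sketch below is an illustration: α = 2^-14 is the value given in the description, while the input 3.0 and the starting residual of 2^-9 are hypothetical.

```python
ALPHA = 2.0 ** -14  # error component alpha; value taken from the description

def plain_step(a, x):
    """Ordinary one-stage step: x * (2 - a * x)."""
    return x * (2.0 - a * x)

def error_expanded_step(a, x, alpha=ALPHA):
    """Error-expanded step: x * ((2 + alpha) - a * x)."""
    return x * ((2.0 + alpha) - a * x)

a = 3.0
x0 = (1.0 / a) * (1.0 - 2.0 ** -9)  # hypothetical start, residual about 2^-9
res_plain = 1.0 - a * plain_step(a, x0)
res_expanded = 1.0 - a * error_expanded_step(a, x0)

print(res_plain, res_expanded, -ALPHA)
```

The plain step drives the residual down to about 2^-18, whereas the error-expanded step holds its magnitude near 2^-14, so the subtraction in the following stage cancels only about 14 leading bits.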

The processor may process floating-point numbers defined by IEEE 754, the initial approximation accuracy may be 8 bits, the target accuracy may be 53 bits, and the intermediate approximation accuracy may be 14 bits. The error component α may be 2^{-14}.

With the intermediate approximate reciprocal denoted x_i and ε_i = 1 - a × x_i, the target value calculation unit may compute x_i × (1 + ε_i × (1 + ε_i × (1 + ε_i × (1 + ε_i)))) as the multi-stage approximation calculation.
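
Putting the pieces together, the reciprocal procedure described above can be sketched as follows. This is a software illustration under stated assumptions, not the patented implementation: `initial_estimate` is a hypothetical stand-in for whatever supplies the 8-bit initial approximation (here it simply truncates 1/a, which a real implementation would not compute by division), the input is assumed positive, and Python floats stand in for IEEE 754 doubles.

```python
import math

ALPHA = 2.0 ** -14  # error component alpha from the description

def initial_estimate(a, bits=8):
    """Hypothetical stand-in: 1/a truncated so its relative error is below 2^-bits."""
    m, e = math.frexp(1.0 / a)  # 0.5 <= m < 1 for positive a
    scale = 2.0 ** (bits + 1)
    return math.ldexp(math.floor(m * scale) / scale, e)

def reciprocal(a):
    x = initial_estimate(a)              # initial approximate reciprocal
    x = x * ((2.0 + ALPHA) - a * x)      # error-expanded one-stage step
    e = 1.0 - a * x                      # residual, magnitude about 2^-14
    # Multi-stage approximation calculation: only one subtraction, done above.
    return x * (1.0 + e * (1.0 + e * (1.0 + e * (1.0 + e))))

for value in (3.0, 7.0, 0.1, 123.456):
    print(value, reciprocal(value), 1.0 / value)
```

With a residual of about 2^-14, the first omitted term of the series is of order 2^-70, below double-precision resolution, so the remaining error in this sketch comes from floating-point rounding.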

An arithmetic unit according to another aspect of the present invention calculates a reciprocal square root. This arithmetic unit has a processor with arithmetic functions for processing floating-point numbers and a storage unit that stores a reciprocal square root approximate calculation program. By executing the stored approximate calculation program on the processor, it calculates a target reciprocal square root having a preset target accuracy by applying Newton-Raphson approximation to an initial approximate reciprocal square root, which is a reciprocal square root that corresponds to the input value whose reciprocal square root is to be obtained and has a predetermined initial approximation accuracy. The arithmetic unit comprises an intermediate value calculation unit, which applies a one-stage approximation calculation by the Newton-Raphson method to the initial approximate reciprocal square root at least once to calculate an intermediate approximate reciprocal square root having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy, and a target value calculation unit, which applies a multi-stage approximation calculation by the Newton-Raphson method to the intermediate approximate reciprocal square root, raising the accuracy from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal square root. The intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The intermediate value calculation unit performs an error-expanded one-stage approximation calculation, in which the error is enlarged so that the non-stall error remains in the result, and thereby calculates the intermediate approximate reciprocal square root with its accuracy limited so as not to exceed the intermediate approximation accuracy.

As described above, according to the present invention, an intermediate approximate reciprocal square root with the intermediate approximation accuracy is calculated by the error-expanded one-stage approximation calculation, and a target reciprocal square root with the target accuracy is then calculated from it by the multi-stage approximation calculation. The accuracy of the intermediate approximate reciprocal square root is limited to the intermediate approximation accuracy, so a non-stall error remains, corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The approximate calculation can therefore be carried out without subtracting two values whose difference is too small. This avoids stalls and improves the calculation speed.

The intermediate value calculation unit may be configured to perform one one-stage approximation calculation without error expansion and one error-expanded one-stage approximation calculation. With the input value denoted a and the approximate reciprocal square roots before and after the approximate calculation denoted x_i and x_{i+1}, it computes x_{i+1} = x_i + x_i × (0.5 - 0.5 × a × x_i^2) as the one-stage approximation calculation without error expansion, and x_{i+1} = x_i + x_i × ((0.5 + β) - 0.5 × a × x_i^2) as the error-expanded one-stage approximation calculation, where the error component β is set according to the non-stall error.
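
The two intermediate steps for the reciprocal square root can be sketched as follows. The sketch assumes the squared form a × x^2 in the residual, consistent with Equation (4), and β = 2^-15 as given in the description; the input 2.0 and the roughly 5-bit starting guess are hypothetical. To first order the error-expanded step leaves a residual 1 - a × x^2 of about -2β = -2^-14.

```python
BETA = 2.0 ** -15  # error component beta; value taken from the description

def rsqrt_plain_step(a, x):
    """x + x * (0.5 - 0.5 * a * x^2), i.e. Equation (4) rearranged."""
    return x + x * (0.5 - 0.5 * a * x * x)

def rsqrt_error_expanded_step(a, x, beta=BETA):
    """Same step with the constant 0.5 replaced by 0.5 + beta."""
    return x + x * ((0.5 + beta) - 0.5 * a * x * x)

a = 2.0
x0 = a ** -0.5 * (1.0 - 2.0 ** -6)  # hypothetical start, residual about 2^-5
x1 = rsqrt_plain_step(a, x0)
x2 = rsqrt_error_expanded_step(a, x1)
residual = 1.0 - a * x2 * x2

print(1.0 - a * x1 * x1, residual, -2.0 * BETA)
```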

The processor may process floating-point numbers defined by IEEE 754, the initial approximation accuracy may be 5 bits, the target accuracy may be 53 bits, and the intermediate approximation accuracy may be 14 bits. The error component β may be 2^{-15}.

With the intermediate approximate reciprocal square root denoted x_i and ε_i = 1 - a × x_i^2, the target value calculation unit may compute x_i × (1 + (1/2) × ε_i + (3/8) × ε_i^2 + (5/16) × ε_i^3) as the multi-stage approximation calculation.
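
The complete reciprocal square root procedure can be sketched in the same way. Again this is an illustration under assumptions, not the patented implementation: `initial_rsqrt_estimate` is a hypothetical stand-in for the source of the 5-bit initial approximation (it truncates a value computed with `math.sqrt`, which a real implementation would avoid), the residual uses the squared form a × x^2, and the input is assumed positive.

```python
import math

BETA = 2.0 ** -15  # error component beta from the description

def initial_rsqrt_estimate(a, bits=5):
    """Hypothetical stand-in: 1/sqrt(a) truncated to a relative error below 2^-bits."""
    m, e = math.frexp(1.0 / math.sqrt(a))  # 0.5 <= m < 1 for positive a
    scale = 2.0 ** (bits + 1)
    return math.ldexp(math.floor(m * scale) / scale, e)

def rsqrt(a):
    x = initial_rsqrt_estimate(a)
    x = x + x * (0.5 - 0.5 * a * x * x)            # one-stage step, no error expansion
    x = x + x * ((0.5 + BETA) - 0.5 * a * x * x)   # error-expanded one-stage step
    e = 1.0 - a * x * x                            # residual, magnitude about 2^-14
    # Multi-stage approximation calculation in Horner form.
    return x * (1.0 + e * (0.5 + e * (0.375 + e * 0.3125)))

for value in (2.0, 3.0, 0.1, 1234.5):
    print(value, rsqrt(value), 1.0 / math.sqrt(value))
```

The series is cut after the cubic term; with a residual of about 2^-14 the first omitted term, (35/128) × ε^4, is of order 2^-58 and does not affect a 53-bit result.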

The present invention is not limited to the arithmetic unit aspects above. Other aspects of the invention include, as set out below, an approximate calculation program and an approximate calculation method for the reciprocal, and an approximate calculation program and an approximate calculation method for the reciprocal square root. The various additional features of the arithmetic unit aspects may also be applied to these program and method aspects in a form suited to each.

In the present invention, the reciprocal approximate calculation program aspect is a program that is stored in an arithmetic unit having a processor with arithmetic functions for processing floating-point numbers and is executed by the processor. The program causes the processor to calculate a target reciprocal having a preset target accuracy by applying Newton-Raphson approximation to an initial approximate reciprocal, which is a reciprocal that corresponds to the input value whose reciprocal is to be obtained and has a predetermined initial approximation accuracy. The program causes the processor to execute intermediate value calculation processing, which applies a one-stage approximation calculation by the Newton-Raphson method to the initial approximate reciprocal at least once to calculate an intermediate approximate reciprocal having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy, and target value calculation processing, which applies a multi-stage approximation calculation by the Newton-Raphson method to the intermediate approximate reciprocal, raising the accuracy from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal. The intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The intermediate value calculation processing performs an error-expanded one-stage approximation calculation, in which the error is enlarged so that the non-stall error remains in the result, and thereby calculates the intermediate approximate reciprocal with its accuracy limited so as not to exceed the intermediate approximation accuracy.

The reciprocal approximate calculation method aspect of the present invention is a method executed by an arithmetic unit having a processor with arithmetic functions for processing floating-point numbers. The method causes the arithmetic unit to calculate a target reciprocal having a preset target accuracy by applying Newton-Raphson approximation to an initial approximate reciprocal, which is a reciprocal that corresponds to the input value whose reciprocal is to be obtained and has a predetermined initial approximation accuracy. The method causes the processor to execute intermediate value calculation processing, which applies a one-stage approximation calculation by the Newton-Raphson method to the initial approximate reciprocal at least once to calculate an intermediate approximate reciprocal having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy, and target value calculation processing, which applies a multi-stage approximation calculation by the Newton-Raphson method to the intermediate approximate reciprocal, raising the accuracy from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal. The intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The intermediate value calculation processing performs an error-expanded one-stage approximation calculation, in which the error is enlarged so that the non-stall error remains in the result, and thereby calculates the intermediate approximate reciprocal with its accuracy limited so as not to exceed the intermediate approximation accuracy.

The reciprocal square root approximate calculation program aspect of the present invention is a program that is stored in an arithmetic unit having a processor with arithmetic functions for processing floating-point numbers and is executed by the processor. The program causes the processor to calculate a target reciprocal square root having a preset target accuracy by applying Newton-Raphson approximation to an initial approximate reciprocal square root, which is a reciprocal square root that corresponds to the input value whose reciprocal square root is to be obtained and has a predetermined initial approximation accuracy. The program causes the processor to execute intermediate value calculation processing, which applies a one-stage approximation calculation by the Newton-Raphson method to the initial approximate reciprocal square root at least once to calculate an intermediate approximate reciprocal square root having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy, and target value calculation processing, which applies a multi-stage approximation calculation by the Newton-Raphson method to the intermediate approximate reciprocal square root, raising the accuracy from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal square root. The intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The intermediate value calculation processing performs an error-expanded one-stage approximation calculation, in which the error is enlarged so that the non-stall error remains in the result, and thereby calculates the intermediate approximate reciprocal square root with its accuracy limited so as not to exceed the intermediate approximation accuracy.

The reciprocal square root approximate calculation method aspect of the present invention is a method executed by an arithmetic unit having a processor with arithmetic functions for processing floating-point numbers. The method causes the arithmetic unit to calculate a target reciprocal square root having a preset target accuracy by applying Newton-Raphson approximation to an initial approximate reciprocal square root, which is a reciprocal square root that corresponds to the input value whose reciprocal square root is to be obtained and has a predetermined initial approximation accuracy. The method causes the processor to execute intermediate value calculation processing, which applies a one-stage approximation calculation by the Newton-Raphson method to the initial approximate reciprocal square root at least once to calculate an intermediate approximate reciprocal square root having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy, and target value calculation processing, which applies a multi-stage approximation calculation by the Newton-Raphson method to the intermediate approximate reciprocal square root, raising the accuracy from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal square root. The intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying floating-point subtraction in the processor does not cause a stall. The intermediate value calculation processing performs an error-expanded one-stage approximation calculation, in which the error is enlarged so that the non-stall error remains in the result, and thereby calculates the intermediate approximate reciprocal square root with its accuracy limited so as not to exceed the intermediate approximation accuracy.

上記のように、本発明は、逆数および平方根逆数の近似計算におけるストールを回避でき、計算を高速に行うことができる演算装置、近似計算プログラムおよび方法を提供できる。 As described above, the present invention can provide an arithmetic device, an approximate calculation program, and a method that can avoid a stall in the approximate calculation of the reciprocal and reciprocal square root, and can perform the calculation at high speed.

以下、本発明の実施の形態に係る演算装置について、図面を用いて説明する。 Hereinafter, an arithmetic device according to an embodiment of the present invention will be described with reference to the drawings.

“First Embodiment”
FIG. 1 shows an arithmetic device according to the first embodiment. In the first embodiment, a reciprocal is calculated.

In FIG. 1, the arithmetic device 1 is a computer and includes a processor 3 and a storage unit 5. The processor 3 provides the four basic arithmetic operations (addition, subtraction, multiplication, and division) on floating-point numbers. For example, the processor 3 is an IBM POWER5, and floating-point numbers are represented in IEEE 754 double precision (see FIG. 11). The storage unit 5 is a memory that stores the reciprocal approximation program according to the present invention. The processor 3 executes the approximation program stored in the storage unit 5. As a result, an input value a is input to the processor 3, the reciprocal 1/a of the input value a is calculated, and the reciprocal 1/a is output from the processor 3.

図２は、演算装置１の構成を示す機能ブロック図である。図２の各構成は、図１のプロセッサ３および記憶部５により、すなわち、記憶部５の近似計算プログラムをプロセッサ３にて実行することにより実現される。 FIG. 2 is a functional block diagram illustrating the configuration of the arithmetic device 1. Each configuration of FIG. 2 is realized by the processor 3 and the storage unit 5 of FIG. 1, that is, the processor 3 executes the approximate calculation program of the storage unit 5.

As shown in FIG. 2, the arithmetic device 1 comprises an input unit 11, an initial value acquisition unit 13, an intermediate value calculation unit 15, a target value calculation unit 17, and an output unit 19. The input unit 11 receives the input value a whose reciprocal is to be obtained.

The initial value acquisition unit 13 acquires an initial approximate reciprocal corresponding to the input value a. The initial approximate reciprocal is a reciprocal having a predetermined initial approximation accuracy, which may be relatively low. A typical configuration of the initial value acquisition unit 13 is as follows. A reciprocal table (a table that associates numerical values with their reciprocals) is stored in the storage unit 5 in advance, and the reciprocal corresponding to the input value a is read from the table. Because low accuracy suffices, the reciprocal table can be small. The upper-digit portion of the input value that corresponds to the accuracy of the reciprocal table may be extracted, and the reciprocal corresponding to that portion read from the table. As part of the function of the initial value acquisition unit 13, an approximate calculation may also be applied to the low-accuracy reciprocal read from the table to raise its accuracy to some extent (this approximate calculation may itself be the Newton-Raphson method).
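The table-based configuration can be illustrated with a minimal Python sketch. It assumes a positive, finite input and a 256-entry table indexed by the top 8 fraction bits of the mantissa; the table size and the names `TABLE_BITS`, `RECIP_TABLE`, and `initial_reciprocal` are hypothetical and are not taken from the patent.

```python
import math

TABLE_BITS = 8  # assumed table size: 2**8 entries (not specified in the source)

# Entry i holds the reciprocal of the midpoint of the i-th mantissa interval in [1, 2).
RECIP_TABLE = [1.0 / (1.0 + (i + 0.5) / 2**TABLE_BITS) for i in range(2**TABLE_BITS)]

def initial_reciprocal(a: float) -> float:
    """Low-accuracy 1/a for positive finite a, read from a small table."""
    m, e = math.frexp(a)                      # a = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1                     # rescale so that 1 <= m < 2
    index = int((m - 1.0) * 2**TABLE_BITS)    # upper bits of the mantissa
    return math.ldexp(RECIP_TABLE[index], -e)

for a in (3.0, 0.7, 1234.5):
    x0 = initial_reciprocal(a)
    print(a, x0, abs(1.0 - a * x0))           # input, estimate, relative residual
```

Only the upper bits of the mantissa select the table entry, so the table stays small while the relative error of the estimate stays below about 2^−8 in this sketch.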

The initial value acquisition unit 13 is preferably realized by the relatively low-accuracy reciprocal acquisition function with which the processor 3 is generally equipped. General-purpose processors define a command for such a basic reciprocal acquisition function. This command may be incorporated into the reciprocal calculation program, and the reciprocal obtained in response to it. In other words, the initial value acquisition unit 13 is preferably realized by the reciprocal-call command in the program and the corresponding reciprocal-call function of the processor. For example, on the IBM POWER5 taken as the example in this embodiment, the command fre(a) returns a reciprocal with 8-bit accuracy. Given that the floating-point mantissa has 53 bits, the accuracy of this initial reciprocal is low.

In this way, the arithmetic device 1 of the present embodiment acquires the initial approximate reciprocal corresponding to the input value a. As noted above, the initial approximation accuracy of this reciprocal is relatively low. Starting from the initial approximate reciprocal, the intermediate value calculation unit 15 and the target value calculation unit 17 described below perform Newton-Raphson approximation and raise the approximation accuracy to a predetermined target accuracy.

The intermediate value calculation unit 15 performs a one-stage Newton-Raphson approximation at least once on the initial approximate reciprocal acquired by the initial value acquisition unit 13, and calculates an intermediate approximate reciprocal having a predetermined intermediate approximation accuracy. The intermediate approximation accuracy is set between the initial approximation accuracy and the target accuracy and, as explained below, is set to a suitable value at which a stall in the processor can be avoided.

As explained with reference to FIGS. 11 and 12, when the processor subtracts two values of similar magnitude, a bit shift of the mantissa occurs. The closer the two operands are, the larger the bit shift amount. In the arithmetic device 1 of the present embodiment, therefore, if the intermediate approximation accuracy is too high, the bit shift amount in the subtraction performed in the subsequent approximation by the target value calculation unit 17 becomes too large and stalls the execution of the following floating-point instructions. When a stall occurs, the processor issues no instructions for several cycles.
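The link between operand closeness and shift amount can be made concrete with a small Python sketch. The function `cancelled_bits` is a hypothetical, illustrative measure (the drop in binary exponent between the larger operand and the difference); it is not the shift count that the POWER5 uses internally, which the text does not specify.

```python
import math

def cancelled_bits(p: float, q: float) -> int:
    """Exponent drop when subtracting q from p (p != q, both nonzero)."""
    _, exp_operand = math.frexp(max(abs(p), abs(q)))
    _, exp_result = math.frexp(p - q)
    return exp_operand - exp_result

a = 3.0
for bits in (8, 14, 16, 28):
    x = (1.0 / a) * (1.0 - 2.0**-bits)   # reciprocal estimate with about `bits` of accuracy
    print(bits, cancelled_bits(1.0, a * x))
```

The more accurate the estimate x, the closer a × x is to 1 and the more leading bits cancel in 1 − a × x, which is the subtraction the following paragraphs are concerned with.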

The intermediate approximation accuracy is therefore set to an accuracy at which an error corresponding to the upper-limit bit shift amount (hereinafter, the non-stall error) remains. The upper-limit bit shift amount is a predetermined bit shift amount set in advance within the range in which the bit shift described above does not cause a stall. In the present embodiment, as shown later with a specific example, the bit shift amount that is the threshold for stall occurrence is identified; a stall occurs when the bit shift amount exceeds this threshold. To prevent stalls reliably, the upper-limit bit shift amount is set 1 bit below the threshold. The non-stall error is an error relative to the true reciprocal 1/a of the input value a, that is, a relative error. To leave the non-stall error in the approximation result, the intermediate value calculation unit 15 performs not an ordinary one-stage approximation but the error-expanded one-stage approximation that characterizes the present embodiment. The error-expanded one-stage approximation is a single Newton-Raphson step in which the error is deliberately enlarged so that the non-stall error remains in the result. The intermediate value calculation unit 15 can thereby calculate an intermediate approximate reciprocal whose accuracy is limited so as not to exceed the intermediate approximation accuracy.

The target value calculation unit 17 performs a multistage Newton-Raphson approximation on the intermediate approximate reciprocal obtained by the intermediate value calculation unit 15 so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, and thereby calculates the target reciprocal. The target reciprocal is a reciprocal having the target accuracy, and it is output from the output unit 19 as the calculated reciprocal 1/a.

次に、図３を参照し、上記の中間値演算部１５および目標値演算部１７による処理について、具体例を用いてさらに説明する。この具体例では、プロセッサ３が前出のＰＯＷＥＲ５であり、浮動小数点はＩＥＥＥ７５４に従うとする。この場合、プロセッサ３の標準装備の逆数呼出機能により、８ｂｉｔ精度の逆数が得られる。すなわち、初期近似精度は８ｂｉｔである。また、浮動小数点の精度は５３ｂｉｔであり、そこで、目標精度は、浮動小数点で表現できる最大限の精度である５３ｂｉｔとする。 Next, referring to FIG. 3, the processing by the intermediate value calculation unit 15 and the target value calculation unit 17 will be further described using a specific example. In this specific example, it is assumed that the processor 3 is the above POWER5 and the floating point conforms to IEEE754. In this case, the reciprocal of 8 bits precision can be obtained by the reciprocal call function provided as a standard feature of the processor 3. That is, the initial approximation accuracy is 8 bits. The precision of the floating point is 53 bits. Therefore, the target precision is 53 bits, which is the maximum precision that can be expressed in the floating point.

また、上述した本実施の形態で例示するプロセッサ３では、ストールが生じるビットシフト量の閾値が、１５ビットである。それより大きいビットシフト量が減算処理で生じると、ストールが発生する。より詳細には、数サイクルの期間は新しい命令が発行されなくなるといった種類のストールが発生し、これにより性能が低下する。そこで、本実施の形態では、ストールを確実に防ぐために、上限ビットシフト量が、閾値（１５ビット）より１ビット少ない１４ビットに設定される。そして、上限ビットシフト量に対応する非ストール誤差が残るように、上限ビットシフト量の１４ビットに対応して、中間近似精度が１４ビットに設定される。 In the processor 3 exemplified in the present embodiment described above, the threshold value of the bit shift amount causing the stall is 15 bits. If a bit shift amount larger than that occurs in the subtraction process, a stall occurs. More specifically, a type of stall occurs in which a new instruction is not issued for a period of several cycles, which reduces performance. Therefore, in this embodiment, the upper limit bit shift amount is set to 14 bits, which is 1 bit less than the threshold value (15 bits), in order to prevent stalling with certainty. Then, the intermediate approximation accuracy is set to 14 bits corresponding to the upper limit bit shift amount of 14 bits so that a non-stall error corresponding to the upper limit bit shift amount remains.

To achieve this intermediate approximation accuracy, the intermediate value calculation unit 15 performs the error-expanded one-stage approximation of equation (6) below.
x_{i+1} = x_i × ((2 + α) − a × x_i) ... (6)
Compared with the ordinary one-stage Newton-Raphson approximation (equation (2)), equation (6) has an error component α inserted, where α = 2^−14 = 0.00006103515625. Inserting this error component deliberately enlarges the error, so that an approximation result with 14-bit accuracy is obtained.
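A minimal Python sketch can contrast the ordinary step with the error-expanded step of equation (6). The starting value x0 is an assumed stand-in for an 8-bit-class estimate (relative error 2^−9), and the function names are hypothetical; the ordinary step is written in the standard Newton-Raphson form x × (2 − a × x).

```python
ALPHA = 2.0**-14  # error component α from equation (6)

def newton_step(a, x):
    """Ordinary one-stage approximation: x * (2 - a*x)."""
    return x * (2.0 - a * x)

def error_expanded_step(a, x, alpha=ALPHA):
    """Equation (6): x * ((2 + alpha) - a*x)."""
    return x * ((2.0 + alpha) - a * x)

a = 3.0
x0 = (1.0 / a) * (1.0 + 2.0**-9)   # assumed stand-in for an 8-bit-class estimate
plain = abs(1.0 - a * newton_step(a, x0))
expanded = abs(1.0 - a * error_expanded_step(a, x0))
print(plain, expanded, ALPHA)       # residuals |1 - a*x| after each kind of step
```

In this sketch the ordinary step leaves a residual well below 2^−15, whereas the error-expanded step leaves a residual close to α, so the next subtraction 1 − a × x cancels fewer leading bits.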

Next, the target value calculation unit 17 performs the multistage approximation of equation (7) below.
x_{i+2} = x_i × (1 + ε_i × (1 + ε_i × (1 + ε_i × (1 + ε_i))))
ε_i = 1 − a × x_i ... (7)
With this multistage approximation formula, a target reciprocal with 53-bit accuracy is obtained from the intermediate approximate reciprocal with 14-bit accuracy.
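The truncation error of the nested form in equation (7) can be checked exactly with rational arithmetic. The following Python sketch is illustrative only: `multistage` and its `degree` parameter are hypothetical names, and exact fractions are used here to isolate the truncation error from floating-point rounding, which the processor would add on top.

```python
from fractions import Fraction

def multistage(a, x, degree):
    """x * (1 + e*(1 + e*(...))) with `degree` nested factors, e = 1 - a*x."""
    eps = 1 - a * x
    acc = 1 + eps
    for _ in range(degree - 1):
        acc = 1 + eps * acc
    return x * acc

a = Fraction(3)
x = Fraction(1, 3) * (1 - Fraction(1, 2**14))   # relative error of exactly 2**-14
residual_eq7 = 1 - a * multistage(a, x, 4)      # equation (7): four nested factors
residual_3 = 1 - a * multistage(a, x, 3)        # one nested factor fewer
print(residual_eq7 == Fraction(1, 2**70), residual_3 == Fraction(1, 2**56))
```

With a residual ε, the nested form with n factors leaves exactly ε^(n+1), so four factors reduce a 2^−14 residual to 2^−70 and three factors to 2^−56.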

上記の中間値演算部１５および目標値演算部１７の処理についてさらに説明する。通常のニュートンラフソン法の１段近似計算（式（２））が行われると、近似精度は倍になる。この例では、初期近似精度が８ｂｉｔなので、通常の１段近似計算では精度が１６ｂｉｔになる。これでは、精度が高くなりすぎる。すなわち、次の多段近似計算（式（７））にて、ε_ｉ（＝１−ａ×ｘ_ｉ）を計算するときに、大きなビットシフトが生じ、ストールが発生する。 The processing of the intermediate value calculation unit 15 and the target value calculation unit 17 will be further described. When the one-step approximate calculation (formula (2)) of the normal Newton-Raphson method is performed, the approximation accuracy is doubled. In this example, since the initial approximation accuracy is 8 bits, the accuracy is 16 bits in the normal one-stage approximation calculation. This is too accurate. That is, when ε _i (= 1−a × x _i ) is calculated in the next multistage approximate calculation (equation (7)), a large bit shift occurs and a stall occurs.

一方、本発明では、式（６）の誤差拡大１段近似計算にて、誤差成分α＝２^−１４が挿入されている。したがって、中間近似逆数の精度が、１６ｂｉｔまで高くならず、１４ｂｉｔで留まる。これにより、次のステップにてε_ｉ（＝１−ａ×ｘ_ｉ）を計算するときに、多大なビットシフトが発生せず、ストールが回避される。 On the other hand, in the present invention, the error component α = 2 ⁻¹⁴ is inserted in the error expansion one-stage approximation calculation of Expression (6). Therefore, the accuracy of the intermediate approximate reciprocal does not increase to 16 bits, but remains at 14 bits. As a result, when ε _i (= 1−a × x _i ) is calculated in the next step, a large bit shift does not occur and a stall is avoided.
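The effect of the error component α can be checked with a short derivation. The following worked step is a sketch, writing ε_i = 1 − a x_i for the residual as in equation (7) and taking the ordinary step to be the standard Newton-Raphson form x_{i+1} = x_i (2 − a x_i), which is presumably what equation (2) denotes; the numerical bounds assume |ε_i| ≤ 2^−8 for the initial estimate.

```latex
\begin{aligned}
\text{ordinary step:}\quad 1 - a\,x_{i+1} &= 1 - a\,x_i\,(2 - a\,x_i) = (1 - a\,x_i)^2 = \varepsilon_i^{2},\\
\text{equation (6):}\quad 1 - a\,x_{i+1} &= 1 - (1 - \varepsilon_i)(1 + \alpha + \varepsilon_i) = \varepsilon_i^{2} - \alpha\,(1 - \varepsilon_i).
\end{aligned}
```

With |ε_i| ≤ 2^−8 and α = 2^−14, ε_i² is at most α/4, so the residual after equation (6) has a magnitude between roughly 0.74α and 1.01α. It stays near 2^−14 instead of shrinking to 2^−16.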

また、式（７）の多段近似計算を、前出の式（３）と比較すると、多少異なっていることが分かる。式（３）は、ニュートンラフソン法の２ステップ分の近似計算の基本的な式である。式（３）と式（７）を比較すると、式（７）では、（１＋ε_ｉ）の項が１つ多く挿入されている。したがって、式（７）は、２ステップよりも大きい近似計算、すなわち、２ステップ分より精度が上がる近似計算を行っていることを意味する。式（７）をこのように設定している理由は下記の通りである。 Further, when the multistage approximate calculation of Expression (7) is compared with the above Expression (3), it can be seen that the calculation is slightly different. Expression (3) is a basic expression for approximate calculation for two steps of the Newton-Raphson method. Comparing Equation (3) and Equation (7), one more term of (1 + ε _i ) is inserted in Equation (7). Therefore, equation (7) means that approximate calculation larger than two steps, that is, approximate calculation with higher accuracy than two steps is performed. The reason why the equation (7) is set in this way is as follows.

As already explained, the intermediate approximation accuracy is 14 bits. One Newton-Raphson step doubles the accuracy, and two steps quadruple it. A two-step approximation (14 × 4 = 56 bits) should therefore be sufficient to reach the target accuracy of 53 bits. Nevertheless, in the present embodiment the multistage approximation is adjusted as described above, with an additional (1 + ε_i) term, so that higher accuracy is obtained. This is to ensure that the target accuracy is reliably reached. When the processor 3 performs such calculations, the accuracy can vary because of the processor's internal mechanisms, and with the ordinary two-step approximation a verification sometimes fails to yield 53 bits. Taking this into account, the present embodiment adjusts the multistage approximation to raise the accuracy. Although the accuracy of the multistage approximation is thus raised beyond that of two steps, the left-hand side of equation (7) is written as x_{i+2} so that the notation matches equation (3) above.

以上に、演算装置１の構成について説明した。次に、図４を参照し、演算装置１の動作を説明する。図４のフローチャートは、本実施の形態による逆数の近似計算方法を示すものであり、また、同近似計算方法をプロセッサ３に実行させる逆数の近似計算プログラムを示している。 The configuration of the arithmetic device 1 has been described above. Next, the operation of the arithmetic device 1 will be described with reference to FIG. The flowchart of FIG. 4 shows the approximate calculation method of the reciprocal number according to this embodiment, and also shows the approximate calculation program of the reciprocal number that causes the processor 3 to execute the approximate calculation method.

図４に示すように、演算装置１では、入力部１１により、逆数を求めるべき入力値ａが入力され（Ｓ１）、初期値取得部１３により入力値ａに対応する初期近似逆数が取得される（Ｓ３）。初期近似逆数は、比較的低い初期近似精度（８ｂｉｔ精度）を有している。 As shown in FIG. 4, in the arithmetic device 1, the input unit 11 receives an input value a from which an inverse number is to be obtained (S 1), and the initial value acquisition unit 13 acquires an initial approximate inverse number corresponding to the input value a. (S3). The initial approximate reciprocal has a relatively low initial approximate accuracy (8-bit accuracy).

そして、中間値演算部１５にて、初期近似逆数に対して、誤差拡大１段近似計算（式（６））が施されて、中間近似逆数が算出される（Ｓ５）。中間近似逆数は、中間近似精度（１４ｂｉｔ精度）を有している。さらに、目標値演算部１７にて、中間近似逆数に対して、多段近似計算（式（７））が施されて、目標逆数が計算される（Ｓ７）。目標逆数は目標精度（５３ｂｉｔ精度）を有している。目標逆数が、最終的な逆数１／ａとして出力部１９から出力される（Ｓ９）。 Then, the intermediate value calculation unit 15 performs error expansion one-stage approximation calculation (formula (6)) on the initial approximate inverse, and calculates the intermediate approximate inverse (S5). The intermediate approximate inverse has intermediate approximate accuracy (14-bit accuracy). Further, the target value calculation unit 17 performs multistage approximation calculation (formula (7)) on the intermediate approximate inverse, and calculates the target inverse (S7). The target reciprocal has a target accuracy (53-bit accuracy). The target reciprocal is output from the output unit 19 as the final reciprocal 1 / a (S9).
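Steps S3 to S7 can be strung together in a short Python sketch. This is not the program of FIG. 5: `fre_estimate` is a hypothetical software stand-in for the processor's 8-bit reciprocal estimate (it truncates the mantissa of 1/a and assumes a positive, finite input), and the function names are illustrative.

```python
import math

ALPHA = 2.0**-14  # error component α from equation (6)

def fre_estimate(a):
    """Hypothetical stand-in for the processor's 8-bit reciprocal estimate."""
    m, e = math.frexp(1.0 / a)                              # 1/a = m * 2**e, 0.5 <= m < 1
    return math.ldexp(math.floor(m * 512.0) / 512.0, e)     # truncate the mantissa

def reciprocal(a):
    x = fre_estimate(a)                        # S3: initial approximate reciprocal
    x = x * ((2.0 + ALPHA) - a * x)            # S5: equation (6)
    eps = 1.0 - a * x                          # S7: equation (7), residual
    x = x * (1.0 + eps * (1.0 + eps * (1.0 + eps * (1.0 + eps))))
    return x, eps

for a in (3.0, 0.1, 7.25, 1.0e6):
    x, eps = reciprocal(a)
    print(a, x, abs(eps) / ALPHA)              # residual at S7 relative to α
```

The ratio printed in the last column shows how close the residual entering the multistage step is to α, that is, how well the intermediate accuracy is held near 14 bits.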

FIG. 5 shows a preferred example of a concrete program corresponding to the flowchart of FIG. 4. Step S11 (x = fre(a)) in FIG. 5 corresponds to step S3 in FIG. 4 and acquires the initial approximate reciprocal. fre(a) is a command that calls the reciprocal of a; in response, a reciprocal with 8-bit accuracy is returned. This command is defined for the IBM POWER5 processor. The corresponding command on the POWER4 is fres(a). The command naturally depends on the processor to which the method is applied.

Next, step S13 in FIG. 5 corresponds to step S5 in FIG. 4 and to the error-expanded one-stage approximation of equation (6). As shown in the figure, the error component 2^−14 (= 0.00006103515625) is added to 2, and a solution (reciprocal) with 14-bit accuracy is obtained.

Step S15 in FIG. 5 corresponds to step S7 in FIG. 4 and to the multistage approximation of equation (7). The first line of step S15 calculates y = 1.0 − x × a, where x is the intermediate approximate reciprocal and y corresponds to ε_i in equation (7). As the figure shows, this calculation amounts to computing the error. If x were too accurate, x × a would be too close to 1, causing a large bit shift and a resulting stall. In the present embodiment, the accuracy of x is held to 14 bits, so the stall is avoided. The eight calculation instructions in FIG. 5 therefore execute without causing a pipeline stall.

FIGS. 6 and 7 show other programs as comparative examples. These programs correspond to straightforward implementations of the ordinary Newton-Raphson method. In the example of FIG. 6, the ordinary one-stage approximation is performed three times, so the calculation requires only seven instructions. However, the processor limits the accuracy of the product x = x × y, and the final accuracy is inferior to that of the method of the present embodiment. In the example of FIG. 7, the accuracy is high, but values of similar magnitude are subtracted and a pipeline stall occurs, so performance is poor: the calculation speed is about half that of the method of the present embodiment. In comparison, the method of the present invention shown in FIG. 5 calculates the reciprocal both accurately and quickly.

以上に本発明の第１の実施の形態について説明した。上記のように、本発明によれば、中間近似精度を持つ中間近似逆数が誤差拡大１段近似計算により算出され、それから、目標精度を持つ目標逆数が中間近似逆数から多段近似計算により算出される。中間近似逆数では近似精度が中間近似精度に制限されており、プロセッサの浮動小数点数の減算処理に伴う仮数部のビットシフトによるストールが生じない上限ビットシフト量に対応する非ストール誤差が残っている。したがって、差が小さすぎる２つの値の減算処理を行うことなく、近似計算を行える。これにより、ストールを回避でき、計算速度を向上できる。 The first embodiment of the present invention has been described above. As described above, according to the present invention, an intermediate approximate inverse having intermediate approximate accuracy is calculated by error-enlarged one-stage approximate calculation, and then a target inverse having target accuracy is calculated from the intermediate approximate inverse by multistage approximate calculation. . In the intermediate approximate reciprocal, the approximation accuracy is limited to the intermediate approximation accuracy, and there remains a non-stall error corresponding to the upper limit bit shift amount that does not cause stall due to bit shift of the mantissa part due to the subtraction processing of the floating point number of the processor . Therefore, approximate calculation can be performed without subtracting two values whose difference is too small. Thereby, a stall can be avoided and calculation speed can be improved.

Further, according to the present invention, where the input value is a and the approximate reciprocals before and after the approximation are x_i and x_{i+1}, the error-expanded one-stage approximation computes x_{i+1} = x_i × ((2 + α) − a × x_i), as shown in equation (6), with the error component α set according to the non-stall error. This suitably realizes an error-expanded one-stage approximation in which the non-stall error remains in the result.

また、本発明によれば、プロセッサにより、ＩＥＥＥ７５４により規定される浮動小数点数が処理される。初期近似精度が８ｂｉｔ精度であり、目標近似精度が５３ｂｉｔ精度であり、中間近似精度が１４ｂｉｔ精度である。これにより、ＩＥＥＥ７５４により規定される浮動小数点数を処理するプロセッサにおいて、ストールを回避できる範囲で中間近似精度を高くすることができ、その後の多段近似計算をできるだけ短くでき、計算速度を好適に向上できる。 Further, according to the present invention, a floating point number defined by IEEE754 is processed by the processor. The initial approximation accuracy is 8-bit accuracy, the target approximation accuracy is 53-bit accuracy, and the intermediate approximation accuracy is 14-bit accuracy. As a result, in a processor that processes floating-point numbers defined by IEEE 754, intermediate approximation accuracy can be increased within a range where stall can be avoided, and subsequent multistage approximation calculation can be shortened as much as possible, and the calculation speed can be suitably improved. .

ここで、本発明を適用せず、１段近似計算は行わないとする。そして、最初から、初期近似逆数を出発点として、目標精度まで達するように、多段近似計算を行ったとする。このような処理も理論的には可能かもしれない。また、比較的低い初期近似精度の逆数を用いた減算処理が行われればよいので、本発明が問題とするような減算での大きなビットシフトも生じない。しかしながら、この場合、多段近似計算で出発点の精度が低いことは相当に不利である。十分な精度を得るためには、相当な数の（１＋ε_ｉ）の項を、多段近似計算の式に挿入しなければならず、そのために計算時間が長くなってしまう。このような構成と比較しても、本発明は逆数を高速に計算でき、有利である。 Here, it is assumed that the present invention is not applied and one-stage approximate calculation is not performed. Then, it is assumed that the multistage approximation calculation is performed from the beginning so as to reach the target accuracy using the initial approximate inverse as a starting point. Such processing may also be theoretically possible. In addition, since a subtraction process using a reciprocal with a relatively low initial approximation accuracy may be performed, a large bit shift in subtraction, which is a problem of the present invention, does not occur. However, in this case, it is considerably disadvantageous that the accuracy of the starting point is low in the multistage approximate calculation. In order to obtain sufficient accuracy, a considerable number of (1 + ε _i ) terms must be inserted into the multistage approximate calculation formula, which increases the calculation time. Compared to such a configuration, the present invention is advantageous in that the reciprocal can be calculated at high speed.
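The cost of starting the multistage approximation from the low-accuracy initial value can be estimated with a small Python sketch. It rests on an idealized assumption: truncating x × (1 + ε + ... + ε^n) leaves a relative error of ε^(n+1), and rounding effects are ignored. The function name is hypothetical.

```python
def highest_power_needed(start_bits: int, target_bits: int) -> int:
    """Smallest n with (n + 1) * start_bits >= target_bits."""
    return -(-target_bits // start_bits) - 1   # ceiling division, minus one

print(highest_power_needed(8, 53))    # starting from the 8-bit initial estimate
print(highest_power_needed(14, 53))   # starting from the 14-bit intermediate value
```

Under this idealization an 8-bit start needs terms up to ε^6, whereas a 14-bit start needs terms only up to ε^3; equation (7) goes one term further, to ε^4, as a safety margin.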

Further, according to the present invention, the error component α is 2^−14. This makes it possible to suitably obtain an intermediate approximate reciprocal in which the non-stall error remains.

Further, according to the present invention, where the intermediate approximate reciprocal is x_i and ε_i = 1 − a × x_i, as shown in equation (7), the multistage approximation computes x_i × (1 + ε_i × (1 + ε_i × (1 + ε_i × (1 + ε_i)))). This suitably realizes a multistage approximation that can reach the target accuracy.

“Second Embodiment”
Next, a second embodiment according to the present invention will be described. The arithmetic device according to the second embodiment includes a processor and a storage unit, as in the first embodiment. Whereas the first embodiment calculates a reciprocal, the second embodiment calculates the reciprocal of a square root (hereinafter, the reciprocal square root). That is, a reciprocal square root approximation program is stored in the storage unit and executed by the processor, and the reciprocal square root 1/a^(1/2) is calculated from the input value a. In the following, matters common to the first embodiment are omitted as appropriate.

図８は、第２の実施の形態における演算装置２１を示している。第１の実施の形態と同様、本実施の形態でも、演算装置２１は、入力部３１、初期値取得部３３、中間値演算部３５、目標値演算部３７および出力部３９で構成されている。ただし、初期値取得部３３、中間値演算部３５および目標値演算部３７の機能は、対応する第１の実施の形態の構成とは異なっている。 FIG. 8 shows the arithmetic unit 21 in the second embodiment. Similar to the first embodiment, in this embodiment, the arithmetic device 21 is composed of an input unit 31, an initial value acquisition unit 33, an intermediate value calculation unit 35, a target value calculation unit 37, and an output unit 39. . However, the functions of the initial value acquisition unit 33, the intermediate value calculation unit 35, and the target value calculation unit 37 are different from the corresponding configurations of the first embodiment.

初期値取得部３３は、入力値ａに対応する初期近似平方根逆数を取得する。初期値取得部３３は、第１の実施の形態と同様に、平方根逆数のテーブルを備えてよい。さらに、初期値取得部３３は、第１の実施の形態と同様に、プロセッサ３に予め備えられた平方根逆数の取得機能によって好適に実現される。本実施の形態で取り上げるＩＢＭのＰＯＷＥＲ５の場合、平方根逆数を読み出すコマンドは、ｆｒｓｑｒｔｅ（ａ）であり、このコマンドを平方根逆数の計算プログラムに組み込むことにより初期値取得部３３が実現される。そして、同プロセッサでは、５ｂｉｔ精度の平方根逆数が得られ、すなわち、初期近似精度が５ｂｉｔ精度になる。 The initial value acquisition unit 33 acquires an initial approximate square root inverse corresponding to the input value a. The initial value acquisition unit 33 may include a table of reciprocal square roots, as in the first embodiment. Furthermore, the initial value acquisition unit 33 is preferably realized by an inverse square root acquisition function provided in advance in the processor 3 as in the first embodiment. In the case of IBM POWER5 taken up in the present embodiment, the command for reading the reciprocal square root is frsqrte (a), and the initial value acquisition unit 33 is realized by incorporating this command into the reciprocal square root calculation program. In the processor, a reciprocal square root with a 5-bit precision is obtained, that is, the initial approximation precision becomes a 5-bit precision.

次に、中間値演算部３５について説明する。中間値演算部３５は、初期近似平方根逆数から、中間近似精度を持つ中間近似平方根逆数を計算する。中間値演算部３５では、第１の実施の形態と同様に、ニュートンラフソン法の１段近似計算が行われる。ただし、第１の実施の形態では、１回の１段近似計算が行われ、この１回の１段近似計算が、誤差が挿入された誤差拡大１段近似計算であった。これに対して、第２の実施の形態では、２回の１段近似計算が行われる。そして、１回目は、通常の１段近似計算が行われ、２回目は誤差拡大近似計算が行われる。 Next, the intermediate value calculator 35 will be described. The intermediate value calculator 35 calculates an intermediate approximate square root inverse having intermediate approximation accuracy from the initial approximate square root inverse. In the intermediate value calculation unit 35, one-stage approximate calculation of the Newton-Raphson method is performed as in the first embodiment. However, in the first embodiment, one one-stage approximation calculation is performed once, and this one-stage approximation calculation is an error expansion one-stage approximation calculation in which an error is inserted. On the other hand, in the second embodiment, two one-stage approximation calculations are performed. Then, the normal one-stage approximation calculation is performed at the first time, and the error expansion approximation calculation is performed at the second time.

The processing of the intermediate value calculation unit 35 will be described using a specific example, as in the first embodiment. The first, ordinary one-stage approximate calculation is expressed by equation (8) below.
$$x_{i+1} = x_i + x_i \times (0.5 - 0.5 \times a \times x_i) \tag{8}$$
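A minimal Python sketch of one ordinary step is given here for illustration. It assumes that the residual is formed with $a x_i^2$, as in the FIG. 10 program description (y = x × x followed by y = 0.5 − a × y) and in $\varepsilon_i$ of equation (10); the function name `newton_step` and the sample values are illustrative and do not come from the patent.

```python
import math

def newton_step(a: float, x: float) -> float:
    """One ordinary Newton-Raphson step toward 1/sqrt(a).

    Assumption: the residual term is 0.5 - 0.5*a*x*x (x squared).
    """
    return x + x * (0.5 - 0.5 * a * x * x)

a = 2.0
x0 = 0.6875                              # rough estimate of 1/sqrt(2) = 0.7071...
x1 = newton_step(a, x0)

err0 = abs(x0 * math.sqrt(a) - 1.0)      # relative error before the step
err1 = abs(x1 * math.sqrt(a) - 1.0)      # relative error after the step
```

The relative error after the step is roughly 1.5 times the square of the error before it, which is why one step approximately doubles the number of correct bits.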

Equation (8) is substantially the same as equation (4) above; that is, it corresponds to the ordinary one-stage approximate calculation for obtaining a reciprocal square root by the Newton-Raphson method described earlier. In the second calculation, by contrast, the error-expanded one-stage approximate calculation expressed by equation (9) below is performed.
$$x_{i+1} = x_i + x_i \times ((0.5 + \beta) - 0.5 \times a \times x_i) \tag{9}$$
The error component β is set so that the intermediate approximation accuracy is obtained by this second, error-expanded one-stage approximate calculation. Specifically, the initial approximation accuracy is 5 bits, as described above. As in the first embodiment, the intermediate approximation accuracy is set to 14 bits in order to avoid stalls. In this case, $\beta = 2^{-15} = 0.0000305176$ is suitable. With this setting, the first, ordinary one-stage approximate calculation (equation (8)) raises the approximation accuracy from 5 bits to 10 bits, and the second, error-expanded one-stage approximate calculation raises it from 10 bits to 14 bits.
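The effect of β can be observed numerically. The sketch below assumes, as in the FIG. 10 program description, that the residual uses $a x^2$; `error_expanded_step` and `residual` are illustrative names. It applies the error-expanded step once to an input that is already exact to double precision and once to an input with a relative error of $2^{-10}$, and in both cases the residual $1 - a x^2$ of the result is left near $-2\beta = -2^{-14}$.

```python
import math

BETA = 2.0 ** -15                        # error component given in the text

def error_expanded_step(a: float, x: float, beta: float = BETA) -> float:
    """One Newton-Raphson step with the constant 0.5 replaced by 0.5 + beta.

    Assumption: the residual term uses x squared (0.5*a*x*x).
    """
    return x + x * ((0.5 + beta) - 0.5 * a * x * x)

def residual(a: float, x: float) -> float:
    """eps = 1 - a*x*x, the quantity later formed by a subtraction."""
    return 1.0 - a * x * x

a = 3.0
exact = 1.0 / math.sqrt(a)

eps_from_exact = residual(a, error_expanded_step(a, exact))
eps_from_10bit = residual(a, error_expanded_step(a, exact * (1.0 - 2.0 ** -10)))
```

Because the inserted error dominates the quadratic Newton-Raphson term once the input is good to about 10 bits, the accuracy of the result is pinned near 14 bits rather than rising toward 20 bits.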

Next, the target value calculation unit 37 will be described. The target value calculation unit 37 performs a multistage approximate calculation by the Newton-Raphson method on the intermediate approximate reciprocal square root so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, and thereby calculates the target reciprocal square root. The calculation of equation (10) below is suitable as this multistage approximate calculation. The target reciprocal square root calculated by equation (10) is output from the output unit 39 as the result, the reciprocal square root $1/a^{1/2}$.
$$x_{i+2} = x_i \left(1 + \tfrac{1}{2}\varepsilon_i + \tfrac{3}{8}\varepsilon_i^{2} + \tfrac{5}{16}\varepsilon_i^{3}\right), \qquad \varepsilon_i = 1 - a \times x_i^{2} \tag{10}$$
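Equation (10) can be sketched in Python as follows. The function name `multistage_refine`, the Horner-style grouping of the polynomial, and the sample input (an estimate with a relative error of $2^{-15}$) are choices made for this illustration and are not taken from the FIG. 10 program.

```python
import math

def multistage_refine(a: float, x: float) -> float:
    """Evaluate equation (10): x * (1 + eps/2 + 3/8 eps^2 + 5/16 eps^3)."""
    eps = 1.0 - a * x * x
    # Horner form of the cubic polynomial in eps
    return x * (1.0 + eps * (0.5 + eps * (0.375 + eps * 0.3125)))

a = 5.0
exact = 1.0 / math.sqrt(a)
x_mid = exact * (1.0 + 2.0 ** -15)       # about 14-bit intermediate value
x_final = multistage_refine(a, x_mid)

rel_err = abs(x_final / exact - 1.0)     # relative error of the refined value
```

Starting from an input of this quality, the refined value agrees with the double-precision reference to within a few units in the last place.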

Equation (10) is now compared with equation (5) above. Equation (5) was an approximate calculation covering two steps in accordance with the principle of the Newton-Raphson method. Compared with equation (5), equation (10) carries the calculation only up to the third-order term (coefficient 5/16); the fourth-order term is omitted, because the target accuracy (53-bit accuracy) is achieved even without it.
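The omission can be checked with a short estimate. With $\varepsilon_i = 1 - a x_i^2$ as in equation (10), the exact value satisfies $1/\sqrt{a} = x_i (1-\varepsilon_i)^{-1/2}$, and the bracket in equation (10) is the binomial series of $(1-\varepsilon_i)^{-1/2}$ cut off after the cubic term. The figure $|\varepsilon_i| \approx 2^{-14}$ used below is an assumption taken from the stated 14-bit intermediate accuracy.

```latex
\begin{align*}
(1-\varepsilon_i)^{-1/2}
  &= 1 + \tfrac{1}{2}\varepsilon_i + \tfrac{3}{8}\varepsilon_i^{2}
       + \tfrac{5}{16}\varepsilon_i^{3} + \tfrac{35}{128}\varepsilon_i^{4} + \cdots \\
\left|\tfrac{35}{128}\,\varepsilon_i^{4}\right|
  &\approx \tfrac{35}{128}\left(2^{-14}\right)^{4}
   = \tfrac{35}{128}\cdot 2^{-56}
   \approx 2^{-57.9} < 2^{-53}
\end{align*}
```

The first neglected term is therefore smaller than the last bit of a 53-bit significand, which is consistent with the statement that the fourth-order term need not be calculated.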

In equation (10), $x_i$ is the intermediate approximate reciprocal square root. If the accuracy of $x_i$ were too high, a large bit shift would occur when $\varepsilon_i$ ($= 1 - a \times x_i^{2}$) is calculated, and a stall would result. In the present embodiment, by contrast, the accuracy of $x_i$ is held down by the error-expanded one-stage approximate calculation of equation (9), and the stall is thereby avoided.
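The size of the bit shift can be illustrated in Python. The helper `normalization_shift` below is a hypothetical name; it reports how many bit positions the leading bit of a difference lies below the leading bit of the larger operand, which is the amount by which the result must be shifted to renormalize it. The sketch says nothing about the shift amount at which a particular processor begins to stall.

```python
import math

def normalization_shift(p: float, q: float) -> int:
    """Bit positions between the leading bit of max(|p|, |q|) and that of p - q."""
    diff = p - q
    if diff == 0.0:
        raise ValueError("operands are equal; the difference has no leading bit")
    _, exp_big = math.frexp(max(abs(p), abs(q)))
    _, exp_diff = math.frexp(diff)
    return exp_big - exp_diff

# a*x*x for a 14-bit-accurate x versus a 30-bit-accurate x, subtracted from 1
coarse = normalization_shift(1.0, 1.0 - 2.0 ** -14)
fine = normalization_shift(1.0, 1.0 - 2.0 ** -30)
```

Here `coarse` is 14 and `fine` is 30: the more accurate the intermediate value, the longer the shift needed after the subtraction.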

The configuration of the arithmetic unit 21 has been described above. Next, the operation of the arithmetic unit 21 will be described with reference to FIG. 9. The flowchart of FIG. 9 shows the method for approximate calculation of a reciprocal square root according to the present embodiment, and also represents the reciprocal square root approximate calculation program that causes the processor 3 to execute that method.

As shown in FIG. 9, in the arithmetic unit 21, the input value a whose reciprocal square root is to be obtained is input through the input unit 31 (S21), and the initial value acquisition unit 33 acquires the initial approximate reciprocal square root corresponding to the input value a (S23). The initial approximate reciprocal square root has a relatively low initial approximation accuracy (5-bit accuracy).

The intermediate value calculation unit 35 then applies one ordinary one-stage approximate calculation (equation (8)) and one error-expanded one-stage approximate calculation (equation (9)) in sequence to the initial approximate reciprocal square root, yielding the intermediate approximate reciprocal square root (S25). The intermediate approximate reciprocal square root has the intermediate approximation accuracy (14-bit accuracy). The target value calculation unit 37 further applies the multistage approximate calculation (equation (10)) to the intermediate approximate reciprocal square root to calculate the target reciprocal square root having the target accuracy (53-bit accuracy) (S27). The target reciprocal square root is output from the output unit 39 as the final reciprocal square root $1/a^{1/2}$ (S29).

FIG. 10 shows a preferred example of a specific program corresponding to the flowchart of FIG. 9. Step S31 of FIG. 10 (x = frsqrte(a)) corresponds to step S23 of FIG. 9 and acquires the initial approximate reciprocal square root. "frsqrte(a)" is the command that calls up the reciprocal square root of a; in response to it, a reciprocal square root with 5-bit precision is returned. This command is provided on the IBM processor POWER5. As in the first embodiment, the command appropriate to the target processor is of course to be used.

Steps S33 and S35 of FIG. 10 correspond to step S25 of FIG. 9: S33 corresponds to the ordinary one-stage approximate calculation of equation (8), and S35 to the error-expanded one-stage approximate calculation of equation (9). In step S35, as illustrated, the error component $2^{-15}$ (= 0.0000305176) is added to 0.5.

Step S37 of FIG. 10 corresponds to step S27 of FIG. 9 and to the multistage approximate calculation of equation (10). The first line of step S37 computes y = x × x, and the second line computes y = 0.5 − a × y. Because a *= 0.5 is written at the beginning of step S33, a already holds 0.5 × a at this point. The second line therefore computes $y = 0.5 - 0.5 \times a \times x^{2} = 0.5\,(1 - a \times x^{2})$, which is where $\varepsilon_i$ ($= 1 - a \times x_i^{2}$) is calculated. Because the accuracy of x has been held down in advance, a pipeline stall in the processor 3 is avoided during this calculation.

In the program example of FIG. 10, the line x = (float)x is also added at the end of step S37; it clears the low-order part of x (the intermediate approximate reciprocal square root) to zero, so that the value is rounded. This step is provided to raise the accuracy with which $\varepsilon_i$ is calculated in the next step (the multistage approximate calculation). The (float) operation sets the values at and below the 24th bit to 0. As a result, the subsequent calculation of $x^{2}$ becomes more accurate, and so does the calculation of $\varepsilon_i$ that follows. Raising the accuracy of $\varepsilon_i$ raises the accuracy of the final result. For this reason the (float) operation is added to step S37.
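The steps described for FIG. 10 can be put together in a Python sketch. This is a reconstruction from the prose only, since the figure itself is not reproduced here, and several parts are assumptions: `rough_rsqrt` is a hypothetical software stand-in for frsqrte(a), `to_float32` models x = (float)x by rounding to single precision and is placed just before y = x × x, and the final polynomial line follows equation (10) rather than the actual program text. The input is assumed to be positive and of moderate magnitude so that x fits in single precision.

```python
import math
import struct

BETA = 2.0 ** -15                              # error component from the text

def to_float32(x: float) -> float:
    """Round a double to the nearest single-precision value (models x = (float)x)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def rough_rsqrt(a: float) -> float:
    """Hypothetical stand-in for frsqrte(a): 1/sqrt(a) truncated to about 5 bits."""
    m, e = math.frexp(1.0 / math.sqrt(a))
    return math.ldexp(math.floor(m * 64.0), e - 6)

def rsqrt(a: float) -> float:
    x = rough_rsqrt(a)                         # S31: initial estimate, about 5 bits
    a *= 0.5                                   # start of S33: a now holds 0.5 * a
    x = x + x * (0.5 - a * x * x)              # S33: ordinary step, about 10 bits
    x = x + x * ((0.5 + BETA) - a * x * x)     # S35: error-expanded step, about 14 bits
    x = to_float32(x)                          # keep a 24-bit significand
    y = x * x                                  # S37, line 1: exact in double precision
    y = 0.5 - a * y                            # S37, line 2: y = 0.5 * eps
    eps = 2.0 * y
    # Assumed final step: equation (10) in Horner form
    return x * (1.0 + eps * (0.5 + eps * (0.375 + eps * 0.3125)))

result = rsqrt(2.0)                            # approximately 0.7071067811865476
```

One reason the rounding helps: a value with a 24-bit significand has a square that needs at most 48 bits, so x × x is represented exactly in a 53-bit double and the subtraction that forms ε starts from an exact product.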

The second embodiment of the present invention has been described above. As described, according to the present invention, an intermediate approximate reciprocal square root having the intermediate approximation accuracy is calculated by the error-expanded one-stage approximate calculation, and the target reciprocal square root having the target accuracy is then calculated from it by the multistage approximate calculation. The approximation accuracy of the intermediate approximate reciprocal square root is limited to the intermediate approximation accuracy, so that a non-stall error remains, corresponding to the upper-limit bit shift amount at which the mantissa bit shift accompanying the processor's floating-point subtraction does not cause a stall. The approximate calculation can therefore be performed without subtracting two values whose difference is too small. Stalls are thus avoided and the calculation speed is improved.

Further, according to the present invention, the intermediate value calculation processing performs the one-stage approximate calculation without error expansion once and the error-expanded one-stage approximate calculation once. With the input value denoted a and the approximate reciprocal square roots before and after an approximate calculation denoted $x_i$ and $x_{i+1}$, the one-stage approximate calculation without error expansion obtains $x_{i+1} = x_i + x_i \times (0.5 - 0.5 \times a \times x_i)$, the error-expanded one-stage approximate calculation obtains $x_{i+1} = x_i + x_i \times ((0.5 + \beta) - 0.5 \times a \times x_i)$, and the error component β is set according to the non-stall error. An error-expanded one-stage approximate calculation in which the non-stall error remains in the result can thus be suitably realized.

Further, according to the present invention, the processor handles floating-point numbers as defined by IEEE 754. The initial approximation accuracy is 5 bits, the target approximation accuracy is 53 bits, and the intermediate approximation accuracy is 14 bits. In a processor that handles IEEE 754 floating-point numbers, the intermediate approximation accuracy can thus be made as high as possible within the range in which stalls are avoided, the subsequent multistage approximate calculation can be made as short as possible, and the calculation speed is suitably improved.

Further, according to the present invention, the error component β is $2^{-15}$. An intermediate approximate reciprocal square root in which the non-stall error remains is thus suitably obtained.

Further, according to the present invention, with the intermediate approximate reciprocal square root denoted $x_i$ and $\varepsilon_i = 1 - a \times x_i^{2}$ as shown in equation (10), the multistage approximate calculation obtains $x_i \times \left(1 + \tfrac{1}{2} \times \varepsilon_i + \tfrac{3}{8} \times \varepsilon_i^{2} + \tfrac{5}{16} \times \varepsilon_i^{3}\right)$. A multistage approximate calculation that reaches the target accuracy can thus be suitably performed.

Preferred embodiments of the present invention have been described above. The present invention is not, however, limited to these embodiments, and those skilled in the art can of course modify them within the scope of the invention.

As described above, the arithmetic unit according to the present invention has the effect of speeding up the calculation of reciprocals and reciprocal square roots in a processor, and is useful in a wide range of calculations in various fields, beginning with the calculation of Coulomb forces and Coulomb potentials in bioinformatics.

FIG. 1 is a diagram showing an arithmetic unit according to the first embodiment of the present invention.
FIG. 2 is a functional block diagram showing the configuration of the arithmetic unit according to the first embodiment.
FIG. 3 is a diagram explaining, with a specific example, the processing performed by the intermediate value calculation unit and the target value calculation unit of the arithmetic unit.
FIG. 4 is a flowchart showing the operation of the arithmetic unit according to the first embodiment.
FIG. 5 is a diagram showing a preferred example of a specific program corresponding to the flowchart of FIG. 4.
FIG. 6 is a diagram showing another program as a comparative example for FIG. 5.
FIG. 7 is a diagram showing another program as a comparative example for FIG. 5.
FIG. 8 is a functional block diagram showing an arithmetic unit according to the second embodiment of the present invention.
FIG. 9 is a flowchart showing the operation of the arithmetic unit according to the second embodiment.
FIG. 10 is a diagram showing a preferred example of a specific program corresponding to the flowchart of FIG. 9.
FIG. 11 is a diagram showing a floating-point number as defined by IEEE 754.
FIG. 12 is a diagram showing the bit shift that occurs in a processor when two floating-point numbers of similar magnitude are subtracted.

Explanation of symbols

1 Arithmetic unit
3 Processor
5 Storage unit
11 Input unit
13 Initial value acquisition unit
15 Intermediate value calculation unit
17 Target value calculation unit
19 Output unit

Claims

An arithmetic unit that performs approximate calculation by the Newton-Raphson method on an initial approximate reciprocal, which is a reciprocal corresponding to an input value whose reciprocal is to be obtained and having a predetermined initial approximation accuracy, to calculate a target reciprocal having a preset target accuracy, the arithmetic unit comprising:
a processor having an arithmetic function for processing floating-point numbers, and a storage unit that stores a reciprocal approximate calculation program, wherein an intermediate value calculation unit and a target value calculation unit are provided in the arithmetic unit by the processor executing the approximate calculation program stored in the storage unit,
wherein the intermediate value calculation unit performs intermediate value calculation processing in which a one-stage approximate calculation by the Newton-Raphson method is performed at least once on the initial approximate reciprocal to calculate an intermediate approximate reciprocal having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy,
the target value calculation unit performs target value calculation processing in which a multistage approximate calculation by the Newton-Raphson method is performed on the intermediate approximate reciprocal so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal, and
the intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to an upper-limit bit shift amount at which no stall is caused by the bit shift of the mantissa accompanying floating-point subtraction in the processor, and the intermediate value calculation unit calculates the intermediate approximate reciprocal, with its approximation accuracy limited so as not to exceed the intermediate approximation accuracy, by performing an error-expanded one-stage approximate calculation in which the error is expanded so that the non-stall error remains in the approximate calculation result.

The arithmetic unit according to claim 1, wherein, with the input value denoted a and the approximate reciprocals before and after the approximate calculation denoted $x_i$ and $x_{i+1}$, the intermediate value calculation unit performs, as the error-expanded one-stage approximate calculation, processing to obtain $x_{i+1} = x_i \times ((2 + \alpha) - a \times x_i)$, and the error component α is set according to the non-stall error.

The arithmetic unit according to claim 2, wherein the processor processes floating-point numbers as defined by IEEE 754, the initial approximation accuracy is 8-bit accuracy, the target approximation accuracy is 53-bit accuracy, and the intermediate approximation accuracy is 14-bit accuracy.

The arithmetic unit according to claim 3, wherein the error component α is $2^{-14}$.

The arithmetic unit according to any one of claims 1 to 4, wherein, with the intermediate approximate reciprocal denoted $x_i$ and $\varepsilon_i = 1 - a \times x_i$, the target value calculation unit performs, as the multistage approximate calculation, processing to obtain $x_i \times (1 + \varepsilon_i \times (1 + \varepsilon_i \times (1 + \varepsilon_i \times (1 + \varepsilon_i))))$.

An arithmetic unit having a processor with an arithmetic function for processing floating-point numbers and a storage unit that stores a reciprocal square root approximate calculation program, the arithmetic unit, by the processor executing the approximate calculation program stored in the storage unit, performing approximate calculation by the Newton-Raphson method on an initial approximate reciprocal square root, which is a reciprocal square root corresponding to an input value whose reciprocal square root is to be obtained and having a predetermined initial approximation accuracy, to calculate a target reciprocal square root having a preset target accuracy, the arithmetic unit comprising:
an intermediate value calculation unit that performs a one-stage approximate calculation by the Newton-Raphson method at least once on the initial approximate reciprocal square root to calculate an intermediate approximate reciprocal square root having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy; and
a target value calculation unit that performs a multistage approximate calculation by the Newton-Raphson method on the intermediate approximate reciprocal square root so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal square root,
wherein the intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to an upper-limit bit shift amount at which no stall is caused by the bit shift of the mantissa accompanying floating-point subtraction in the processor, and the intermediate value calculation unit calculates the intermediate approximate reciprocal square root, with its approximation accuracy limited so as not to exceed the intermediate approximation accuracy, by performing an error-expanded one-stage approximate calculation in which the error is expanded so that the non-stall error remains in the approximate calculation result.

The arithmetic unit according to claim 6, wherein the intermediate value calculation unit is configured to perform a one-stage approximate calculation without error expansion once and the error-expanded one-stage approximate calculation once, and, with the input value denoted a and the approximate reciprocal square roots before and after an approximate calculation denoted $x_i$ and $x_{i+1}$, performs, as the one-stage approximate calculation without error expansion, processing to obtain $x_{i+1} = x_i + x_i \times (0.5 - 0.5 \times a \times x_i)$, and performs, as the error-expanded one-stage approximate calculation, processing to obtain $x_{i+1} = x_i + x_i \times ((0.5 + \beta) - 0.5 \times a \times x_i)$, the error component β being set according to the non-stall error.

The arithmetic unit according to claim 7, wherein the processor processes floating-point numbers as defined by IEEE 754, the initial approximation accuracy is 5-bit accuracy, the target approximation accuracy is 53-bit accuracy, and the intermediate approximation accuracy is 14-bit accuracy.

The arithmetic unit according to claim 8, wherein the error component β is $2^{-15}$.

The arithmetic unit according to any one of claims 6 to 9, wherein, with the intermediate approximate reciprocal denoted $x_i$ and $\varepsilon_i = 1 - a \times x_i^{2}$, the target value calculation unit performs, as the multistage approximate calculation, processing to obtain $x_i \times \left(1 + \tfrac{1}{2} \times \varepsilon_i + \tfrac{3}{8} \times \varepsilon_i^{2} + \tfrac{5}{16} \times \varepsilon_i^{3}\right)$.

A reciprocal approximate calculation program stored in an arithmetic unit provided with a processor having an arithmetic function for processing floating-point numbers, and executed by the processor, the program causing the processor to execute processing that performs approximate calculation by the Newton-Raphson method on an initial approximate reciprocal, which is a reciprocal corresponding to an input value whose reciprocal is to be obtained and having a predetermined initial approximation accuracy, to calculate a target reciprocal having a preset target accuracy, wherein the program causes the processor to execute:
intermediate value calculation processing in which a one-stage approximate calculation by the Newton-Raphson method is performed at least once on the initial approximate reciprocal to calculate an intermediate approximate reciprocal having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy; and
target value calculation processing in which a multistage approximate calculation by the Newton-Raphson method is performed on the intermediate approximate reciprocal so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal,
wherein the intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to an upper-limit bit shift amount at which no stall is caused by the bit shift of the mantissa accompanying floating-point subtraction in the processor, and the intermediate value calculation processing is processing that calculates the intermediate approximate reciprocal, with its approximation accuracy limited so as not to exceed the intermediate approximation accuracy, by performing an error-expanded one-stage approximate calculation in which the error is expanded so that the non-stall error remains in the approximate calculation result.

A reciprocal approximate calculation method executed by an arithmetic unit provided with a processor having an arithmetic function for processing floating-point numbers, the method causing the arithmetic unit to execute processing that performs approximate calculation by the Newton-Raphson method on an initial approximate reciprocal, which is a reciprocal corresponding to an input value whose reciprocal is to be obtained and having a predetermined initial approximation accuracy, to calculate a target reciprocal having a preset target accuracy, wherein the method causes the processor to execute:
intermediate value calculation processing in which a one-stage approximate calculation by the Newton-Raphson method is performed at least once on the initial approximate reciprocal to calculate an intermediate approximate reciprocal having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy; and
target value calculation processing in which a multistage approximate calculation by the Newton-Raphson method is performed on the intermediate approximate reciprocal so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, to calculate the target reciprocal,
wherein the intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to an upper-limit bit shift amount at which no stall is caused by the bit shift of the mantissa accompanying floating-point subtraction in the processor, and the intermediate value calculation processing is processing that calculates the intermediate approximate reciprocal, with its approximation accuracy limited so as not to exceed the intermediate approximation accuracy, by performing an error-expanded one-stage approximate calculation in which the error is expanded so that the non-stall error remains in the approximate calculation result.

浮動小数点数を処理する演算機能を有するプロセッサを備えた演算装置に記憶され、前記プロセッサにより実行される平方根逆数の近似計算プログラムであって、平方根逆数を求めるべき入力値に対応し所定の初期近似精度を有する平方根逆数である初期近似平方根逆数に対して、ニュートンラフソン法による近似計算を行って、予め設定された目標精度を有する目標平方根逆数を算出する処理を前記プロセッサに実行させる平方根逆数の近似計算プログラムにおいて、
前記初期近似平方根逆数に対して、ニュートンラフソン法による１段近似計算を少なくとも１回行って、前記初期近似精度と前記目標精度の間に設定された所定の中間近似精度を有する中間近似平方根逆数を算出する中間値演算処理と、
前記中間近似平方根逆数に対して、前記中間近似精度から前記目標精度まで精度が上がるようにニュートンラフソン法による多段近似計算を行って、前記目標平方根逆数を算出する目標値演算処理とを前記プロセッサに実行させ、
前記中間近似精度は、前記プロセッサにおける浮動小数点数の減算処理に伴う仮数部のビットシフトによるストールが生じない上限ビットシフト量に対応する非ストール誤差が残る精度に設定され、前記中間値演算処理は、前記非ストール誤差が近似計算結果に残るように誤差を拡大した誤差拡大１段近似計算を行うことにより、前記中間近似精度を越えないように近似精度を制限した前記中間近似平方根逆数を算出する処理であることを特徴とする平方根逆数の近似計算プログラム。 A reciprocal square root approximation program that is stored in an arithmetic device including a processor having an arithmetic function for processing floating-point numbers and is executed by the processor, the program causing the processor to execute processing that calculates a target reciprocal square root having a preset target accuracy by performing approximate calculation by the Newton-Raphson method on an initial approximate reciprocal square root, which is a reciprocal square root that corresponds to an input value whose reciprocal square root is to be obtained and that has a predetermined initial approximation accuracy, wherein the program causes the processor to execute:
intermediate value calculation processing that performs a one-stage approximate calculation by the Newton-Raphson method at least once on the initial approximate reciprocal square root to calculate an intermediate approximate reciprocal square root having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy; and
target value calculation processing that performs a multi-stage approximate calculation by the Newton-Raphson method on the intermediate approximate reciprocal square root so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, thereby calculating the target reciprocal square root,
wherein the intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which no stall occurs due to the bit shift of the mantissa accompanying floating-point subtraction in the processor, and the intermediate value calculation processing performs an error-expanded one-stage approximate calculation, in which the error is enlarged so that the non-stall error remains in the approximate calculation result, and thereby calculates the intermediate approximate reciprocal square root with its approximation accuracy limited so as not to exceed the intermediate approximation accuracy.
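The two-phase flow recited in this claim can be illustrated with a minimal sketch. It is not the patent's own formula: the function names, the `shift_bits` parameter, the scaling by `1 - 2**-shift_bits` used to leave a residual error, and the choice of a third-order series step to stand in for the "multi-stage" calculation are all assumptions made for illustration.

```python
import math

def nr_step(x, y):
    """One standard Newton-Raphson stage for 1/sqrt(x): y <- y * (3 - x*y*y) / 2."""
    return 0.5 * y * (3.0 - x * y * y)

def error_expanded_step(x, y, shift_bits):
    """One NR stage whose result is deliberately scaled down (assumed mechanism)
    so that a relative error of roughly 2**-shift_bits remains."""
    return nr_step(x, y) * (1.0 - 2.0 ** -shift_bits)

def multi_stage_step(x, y):
    """Higher-order refinement (assumed form): with h = 1 - x*y*y,
    1/sqrt(x) = y * (1 + h/2 + 3*h*h/8 + ...); terms up to h**2 are kept."""
    h = 1.0 - x * y * y  # h stays near 2**-(shift_bits - 1), so the cancellation is bounded
    return y * (1.0 + h * (0.5 + 0.375 * h))

def rsqrt_sketch(x, y0, shift_bits=8):
    """Intermediate phase with limited accuracy, then the final refinement."""
    y_mid = error_expanded_step(x, y0, shift_bits)
    return multi_stage_step(x, y_mid)

if __name__ == "__main__":
    x, y0 = 2.0, 0.7  # y0 plays the role of a coarse initial approximation
    print(rsqrt_sketch(x, y0), 1.0 / math.sqrt(x))
```

In this sketch the intermediate value is kept from becoming too accurate, so the subtraction `1.0 - x * y * y` in the final phase never cancels more than about `shift_bits - 1` leading bits. A real target accuracy may need more series terms or a further refinement step.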

浮動小数点数を処理する演算機能を有するプロセッサを備えた演算装置にて実行される平方根逆数の近似計算方法であって、平方根逆数を求めるべき入力値に対応し所定の初期近似精度を有する平方根逆数である初期近似平方根逆数に対して、ニュートンラフソン法による近似計算を行って、予め設定された目標精度を有する目標平方根逆数を算出する処理を前記演算装置に実行させる平方根逆数の近似計算方法において、
前記初期近似平方根逆数に対して、ニュートンラフソン法による１段近似計算を少なくとも１回行って、前記初期近似精度と前記目標精度の間に設定された所定の中間近似精度を有する中間近似平方根逆数を算出する中間値演算処理と、
前記中間近似平方根逆数に対して、前記中間近似精度から前記目標精度まで精度が上がるようにニュートンラフソン法による多段近似計算を行って、前記目標平方根逆数を算出する目標値演算処理とを前記プロセッサに実行させ、
前記中間近似精度は、前記プロセッサにおける浮動小数点数の減算処理に伴う仮数部のビットシフトによるストールが生じない上限ビットシフト量に対応する非ストール誤差が残る精度に設定され、前記中間値演算処理は、前記非ストール誤差が近似計算結果に残るように誤差を拡大した誤差拡大１段近似計算を行うことにより、前記中間近似精度を越えないように近似精度を制限した前記中間近似平方根逆数を算出する処理であることを特徴とする平方根逆数の近似計算方法。 A reciprocal square root approximation method executed by an arithmetic device including a processor having an arithmetic function for processing floating-point numbers, the method causing the arithmetic device to execute processing that calculates a target reciprocal square root having a preset target accuracy by performing approximate calculation by the Newton-Raphson method on an initial approximate reciprocal square root, which is a reciprocal square root that corresponds to an input value whose reciprocal square root is to be obtained and that has a predetermined initial approximation accuracy, wherein the method causes the processor to execute:
intermediate value calculation processing that performs a one-stage approximate calculation by the Newton-Raphson method at least once on the initial approximate reciprocal square root to calculate an intermediate approximate reciprocal square root having a predetermined intermediate approximation accuracy set between the initial approximation accuracy and the target accuracy; and
target value calculation processing that performs a multi-stage approximate calculation by the Newton-Raphson method on the intermediate approximate reciprocal square root so that the accuracy rises from the intermediate approximation accuracy to the target accuracy, thereby calculating the target reciprocal square root,
wherein the intermediate approximation accuracy is set to an accuracy at which a non-stall error remains, the non-stall error corresponding to the upper-limit bit shift amount at which no stall occurs due to the bit shift of the mantissa accompanying floating-point subtraction in the processor, and the intermediate value calculation processing performs an error-expanded one-stage approximate calculation, in which the error is enlarged so that the non-stall error remains in the approximate calculation result, and thereby calculates the intermediate approximate reciprocal square root with its approximation accuracy limited so as not to exceed the intermediate approximation accuracy.
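As an illustrative aside, the link between the residual ("non-stall") error and the bit shift in the subtraction can be worked out as follows. The symbols $r$, $\delta$, $h$ and $s$ are introduced here for illustration only and are not taken from the claims; $s$ stands for an assumed upper-limit shift amount that the processor handles without stalling.

```latex
\begin{aligned}
r &= \frac{1}{\sqrt{x}}, \qquad y = r\,(1-\delta), \quad 0 < \delta \ll 1,\\
x\,y^{2} &= (1-\delta)^{2}, \qquad h = 1 - x\,y^{2} = 2\delta - \delta^{2} \approx 2\delta,\\
\text{normalization shift} &\approx \lceil -\log_{2} h \rceil \ \text{bits},\\
\lceil -\log_{2} h \rceil \le s \;&\Longleftrightarrow\; h \ge 2^{-s} \;\Longrightarrow\; \delta \gtrsim 2^{-(s+1)}.
\end{aligned}
```

Under these assumptions, keeping the relative error of the intermediate value at or above roughly $2^{-(s+1)}$ keeps the cancellation in $1 - x\,y^{2}$ within $s$ bits.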