WO1999014685A1 - Data processor and data processing system - Google Patents

Data processor and data processing system

Info

Publication number
WO1999014685A1
Authority
WO
WIPO (PCT)
Prior art keywords
data processing
control unit
unit
instruction
data
Prior art date
Application number
PCT/JP1997/003259
Other languages
French (fr)
Japanese (ja)
Inventor
Hirotsugu Kojima
Kenji Kaneko
Toshimitsu Ozawa
Tsukasa Yamauchi
Yukari Katayama
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd.
Priority to PCT/JP1997/003259
Publication of WO1999014685A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors; single instruction multiple data [SIMD] multiprocessors

Definitions

  • The present invention relates to a SIMD (Single Instruction Multiple Data) type data processing device capable of performing parallel arithmetic processing, and to technology that is effective when applied to, for example, error correction of codes and, more broadly, encoding and decoding of data in storage systems and communication systems.
Background art
  • Hard disk drives, CD-ROMs (Compact Disc-Read Only Memory), DVDs (Digital Video Discs), magneto-optical disks, and other recording media use code words to correct recording/reading errors that occur on the media.
  • A codeword is defined, for example, by a special set of numbers called a Galois field and by special operations defined on that set. Code error correction is performed by data processing that uses the elements of the Galois field and its arithmetic.
  • The set of numbers in a Galois field can be defined in multiple ways, each determined by a polynomial called a primitive polynomial. The operations are likewise defined differently depending on the primitive polynomial.
  • The Reed-Solomon code is used particularly as an error correction code in data storage systems and communication systems where errors tend to concentrate locally.
  • Reed-Solomon codes are defined using the elements of a Galois field, and encoding and decoding are performed by operations on the Galois field.
  • The primitive polynomial that defines the Galois field is specified for each medium. Since Galois fields and the four arithmetic operations on their elements are well known, a detailed description is omitted.
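The dependence of the arithmetic on the primitive polynomial can be illustrated with a minimal Python sketch. It assumes the field GF(2^8) with 8-bit symbols; the function names `gf_add` and `gf_mul` and the two polynomial values (0x11D and 0x187) are illustrative choices, not values taken from this document.

```python
def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^m) is a bitwise XOR (subtraction is the same operation)."""
    return a ^ b


def gf_mul(a: int, b: int, prim_poly: int = 0x11D, m: int = 8) -> int:
    """Multiply two GF(2^m) elements by shift-and-add, reducing modulo prim_poly.

    prim_poly includes the x^m term (0x11D is x^8 + x^4 + x^3 + x^2 + 1).
    The default is an assumption chosen for illustration only.
    """
    result = 0
    for _ in range(m):
        if b & 1:              # add the current multiple of a if this bit of b is set
            result ^= a
        b >>= 1
        a <<= 1                # multiply a by x
        if a & (1 << m):       # degree reached m: reduce by the primitive polynomial
            a ^= prim_poly
    return result


# The same operands give different products under different polynomials.
print(hex(gf_mul(0x02, 0x80, prim_poly=0x11D)))
print(hex(gf_mul(0x02, 0x80, prim_poly=0x187)))
```

Passing the polynomial as a parameter mirrors the point made above: a programmable device can serve media whose codes are defined over differently constructed fields.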
  • The time allowed for processing received data, including error correction, is bounded by the data transmission time, which is determined by the reception speed. This is to account for real-time processing.
  • Because of the specialized nature of the four arithmetic operations on Galois field elements and the need for real-time performance, error correction of codes has used hardwired custom LSIs (semiconductor integrated circuits) rather than general-purpose processors.
  • As an LSI in which the arithmetic unit on the Galois field is implemented in hardware and which performs error correction in the form of a processor, "Error Correction LSI for Optical Disks" (Transactions of the Institute of Electronics, Information and Communication Engineers A, Vol. J73-A, No. 2, pp. 261-268, February 1990) is known (first known example). According to this, the error correction processing procedure is divided into the following four steps. In step 1, a syndrome polynomial is calculated from the received codeword. In step 2, the error locator polynomial and the error value polynomial are determined from the syndrome polynomial. In step 3, the error location is determined from the error locator polynomial.
  • In step 4, an error value is obtained from the error locator polynomial and the error value polynomial.
  • In the above-mentioned known example, real-time constraints make it difficult to perform all of the above steps using hardware in the form of a processor, so processing is performed by dedicated hardwired circuits.
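Step 1 of the procedure above (syndrome calculation) can be sketched as follows. This is a minimal illustration, not the circuit of the known example: it assumes GF(2^8) with the primitive polynomial 0x11D, a primitive element alpha = 2, syndromes evaluated at alpha^0, alpha^1, ..., and a received word listed from the highest-degree coefficient down. All names are hypothetical.

```python
def gf_mul(a: int, b: int, prim_poly: int = 0x11D, m: int = 8) -> int:
    """GF(2^m) multiplication by shift-and-add (polynomial is an assumed example)."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= prim_poly
    return result


def syndromes(received, num_syndromes, prim_poly=0x11D):
    """Evaluate the received polynomial at successive powers of alpha (Horner's rule).

    All syndromes are zero when the received word is a valid codeword.
    """
    out = []
    x = 1                                   # alpha^0
    for _ in range(num_syndromes):
        acc = 0
        for sym in received:                # one Galois multiply-add per symbol
            acc = gf_mul(acc, x, prim_poly) ^ sym
        out.append(acc)
        x = gf_mul(x, 2, prim_poly)         # next power of alpha
    return out


print(syndromes([0, 0, 0, 0x37], 4))        # a single error in the last symbol
```

The inner loop is one Galois field multiplication followed by one addition per symbol, which is why a multiply-accumulate style operation suits this step.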
  • Ways to improve data processing performance enough to satisfy the real-time requirement include raising the operating frequency of the processor and improving operation performance by introducing parallel processing.
  • The former, a higher operating frequency, can be achieved by improving process/circuit technology or by adopting multi-stage pipeline processing.
  • However, improvements in process/circuit technology do not yield a severalfold performance gain in a single step.
  • If multi-stage pipeline processing is introduced into processor-type hardware, there is a problem that a large overhead is generated in branch processing.
  • Introducing parallel processing can improve calculation performance relatively easily, but processing performance cannot be substantially improved unless the processing algorithm itself is suited to parallel processing.
  • A factor that disrupts the parallelism of a processing algorithm is branch processing that depends on an operation result.
  • In this regard, a "conditional operation control circuit in parallel processing" (Japanese Patent Application Laid-Open No. 5-189585) is known (second known example).
  • This is a SIMD-type parallel processor consisting of one instruction supply circuit and a plurality of operation units of the same configuration that execute the same instruction on different data.
  • A flag control circuit is provided for each operation unit.
  • The flag control circuit receives an operation result flag indicating the operation result from the corresponding operation unit and outputs an operation condition flag to that operation unit. According to the operation condition flag, the operation unit performs conditional operation such as updating or holding its output register.
  • The operation result flag is stored for a plurality of cycles in a shift register provided in the flag control circuit, and the operation result can be reflected in the operation condition flag during that period. With this technology, each operation unit of a SIMD type parallel processor can realize branch processing that depends on its operation result.
  • CD-ROM devices are available whose access speed has been increased from the standard speed to 2x, 4x, and even 12x or more, so it is essential to improve the processing speed of the LSI that performs error correction. The demand for higher speed is so urgent that it cannot be met by the performance gained through microfabrication of semiconductors alone, so an architecture-level redesign is required. This requires a huge number of development man-hours, leading to the problem of increased development costs.
  • The operation unit (PE0) processes the former as shown in FIG. 25(a).
  • The latter operation unit (PE1) holds the output register.
  • The two operation units start processing C in parallel.
  • If the condition is satisfied in neither operation unit (PE0, PE1), neither operation unit executes processing B. However, because the program is written with conditional operation instructions, cycles in which no operation is performed are repeated for the number of execution cycles of processing B. The same applies when the number of operation units is two or more: if the condition is not satisfied in any of the operation units, the cycles of processing B are wasted. As described above, even if the parallel processing technology disclosed in the second known example is used, the processing performance cannot be sufficiently improved.
  • An object of the present invention is to provide a data processing device capable of improving parallel processing performance in the SIMD format.
  • Another object of the present invention is to provide a data processing device capable of improving the error correction processing speed for a codeword.
  • Still another object of the present invention is to provide a data processing apparatus which can cope with a wide range of error correction codes and can cope with an improvement in performance by a simple design change.
  • Another object of the present invention is to provide a data processing system capable of accelerating the data reading speed in a storage system and supporting high-speed data transmission in a communication system from the viewpoint of error correction processing.
  • A SIMD data processing device comprises a control unit that decodes and executes a fetched instruction, and a plurality of data processing units that are supplied with control information for arithmetic operations in parallel from the control unit and whose data transfer is controlled by the control unit.
  • Each of the data processing units includes standby control means that sets the data processing unit to a standby state in accordance with the result of an arithmetic operation performed according to the control information. The control unit controls the return of each data processing unit from the standby state to the active state.
  • The plurality of data processing units are thus brought into the standby state individually, based on their own operation results, and the control unit returns them from the standby state to the active state, thereby realizing branch processing.
  • The control unit includes detection means for detecting whether each data processing unit is in the standby state, and logic means for returning the data processing units from the standby state to the active state according to the detection result of the detection means.
  • The control unit monitors and controls the operation state of each data processing unit so that useless cycles are not generated by all the data processing units being in the standby state. That is, when all the data processing units are in the standby state, the logic means changes the order of instruction execution by the control unit and returns the data processing units from the standby state to the active state.
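The standby mechanism described above can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not the circuit of the invention: the class `PE`, the function `run_conditional_block`, and the cycle accounting (one cycle for the condition test, one per broadcast instruction, one to release standby) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PE:
    """One data processing unit with a standby register (illustrative model)."""
    value: int
    standby: bool = False

    def execute(self, op) -> None:
        if not self.standby:        # a unit in standby ignores broadcast control information
            self.value = op(self.value)


def run_conditional_block(pes, condition, block_b) -> int:
    """Broadcast block_b to the units whose condition holds; return cycles used."""
    cycles = 1                      # condition test: each unit sets its own standby register
    for pe in pes:
        pe.standby = not condition(pe.value)

    # Detection means: if every unit is in standby, the control unit skips the block.
    if not all(pe.standby for pe in pes):
        for op in block_b:
            for pe in pes:
                pe.execute(op)
            cycles += 1

    cycles += 1                     # control unit returns all units to the active state
    for pe in pes:
        pe.standby = False
    return cycles


block_b = [lambda v: v + 1] * 3
mixed = [PE(5), PE(0)]
print(run_conditional_block(mixed, lambda v: v != 0, block_b), [pe.value for pe in mixed])
idle = [PE(0), PE(0)]
print(run_conditional_block(idle, lambda v: v != 0, block_b), [pe.value for pe in idle])
```

In this model the second call avoids the three cycles of the block entirely, which is the saving that a scheme based only on conditional operation instructions cannot obtain.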
  • The standby control means can comprise: a determination unit that determines whether the result of an arithmetic operation performed by the data processing unit has reached a specific state;
  • a standby register that is set in synchronization with the detection of the specific state by the determination unit and is reset by specific control information from the control unit; and means for stopping the arithmetic operation of the data processing unit in response to the set state of the standby register.
  • The means for stopping the arithmetic operation may be configured to selectively inhibit transmission of the control information supplied from the control unit to an internal circuit.
  • The data processing unit includes a Galois field multiplication circuit and an addition circuit.
  • As operation instructions for controlling the Galois field multiplication circuit and the addition circuit, the control unit can execute at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field multiply-accumulate (product-sum) instruction.
  • The data processing device can be configured as a semiconductor integrated circuit.
  • The control unit fetches instructions from the program memory and can use the data processing units to perform error correction processing.
  • When a SIMD parallel processor is introduced to perform code error correction processing, the processing speed required by a medium can be met easily, without changing the basic architecture or the programs, simply by increasing the number of data processing units and thus the degree of parallelism. Codes of different standards can be handled by changing the program, and systems that must support error correction for multiple standards can be accommodated easily.
  • A SIMD data processing device according to another aspect comprises a control unit that decodes and executes a fetched instruction, a plurality of data processing units that are supplied with control information for arithmetic operations in parallel from the control unit and whose data transfer is controlled by the control unit, and storage means accessed by the control unit. Each of the data processing units includes a first arithmetic circuit, buffer means connected to the first arithmetic circuit, and a plurality of pointer means that designate addresses of the buffer means in a changeable manner. The buffer means of each data processing unit is connected to the storage means via a data bus.
  • The buffer means provided for each data processing unit is used as a work area for temporarily storing data during operations and the like. Because the buffer means operate in parallel, the overhead caused by data transfer is smaller than when the storage means also serves as the work area, and parallel processing performance can be improved.
  • By using the pointer means, the number of bits in the operand designation field of an instruction that controls access to the buffer means can be reduced. For example, an instruction that specifies an addition assigning A + B to C must describe the source addresses of A and B and the destination address of C in the operand designation field. If identification information of the pointer means is described in the operand designation field instead, the number of bits in that field is smaller than when the addresses of the buffer means are written directly in the instruction. This contributes to a shorter instruction word when a high-function operation is defined by one instruction.
  • The address information for the above-mentioned pointer means may be set by the control unit executing a click instruction or the like. Further, each of the data processing units may include a second arithmetic unit used for updating the address information set in the pointer means by the control unit. As a result, the number of data transfers from the control unit to the data processing units can be reduced.
  • The control unit includes instruction execution means that executes an operation instruction specifying a parallel operation in the data processing units and a data transfer instruction specifying a data transfer to the data processing units.
  • The instruction execution means can execute the operation instruction and the data transfer instruction in parallel. That is, the data processing device supports a compound instruction in which an operation instruction and a data transfer instruction are combined. In SIMD parallel processing the data transfer capability may be insufficient relative to the computing capability, and the compound instruction makes it possible to deal with this.
  • As a single instruction included in the operation instructions, the instruction execution means can execute an instruction that
  • operates on data obtained from the buffer means designated by the pointer means, stores the operation result in the buffer means designated by different pointer means, and updates the contents of the pointer means.
  • The instruction execution means can further execute an instruction for operating on data inside the control unit and a branch instruction for changing the instruction fetched by the control unit.
  • A SIMD data processing device according to another aspect focuses on the instructions executed by the control unit for the standby state control described above. That is, the data processing device includes the control unit described above and a plurality of data processing units, each of the data processing units includes the standby control means, and the control unit has means for referring to whether each data processing unit is in the standby state and returns the data processing units from the standby state to the active state according to the result of that reference.
  • The control unit can execute an instruction that sets, in the data processing units, a condition for entering the standby state and that puts into the standby state, at the time of setting, any data processing unit in which the set condition is satisfied.
  • The control unit can also execute an instruction that sets such a condition in the data processing units and puts into the standby state any data processing unit in which the set condition is satisfied in an instruction execution cycle after the condition has been set. Further, the control unit may execute an instruction that individually sets the plurality of data processing units to the standby state or returns them from the standby state to the active state.
  • A SIMD data processing device according to yet another aspect is intended to increase the efficiency of repetitive processing in parallel arithmetic processing. That is, the data processing device has the above-described control unit, a plurality of data processing units, and storage means accessed by the control unit, and each of the data processing units includes standby control means that sets the data processing unit to a standby state according to the result of an arithmetic operation performed according to the control information.
  • The standby control means returns the inside of the data processing unit from the standby state to the active state according to an instruction from the control unit. The control unit has
  • detecting means for detecting whether each data processing unit is in the standby state, and logic means for returning the data processing units from the standby state to the active state in accordance with the detection result of the detecting means.
  • When executing a repeat instruction, the control unit operates the data processing units in parallel according to the instructions from the start address to the end address, up to the set number of repetitions.
  • The repeat instruction may be an instruction that specifies the start address of the repeat loop, the end address of the repeat loop, the number of times the repeat loop is repeated, and a condition for forcibly terminating the repeat loop.
  • When executing the repeat instruction, the control unit repeatedly operates the data processing units in parallel according to the instructions from the start address to the end address unless the forced termination condition is satisfied.
  • The standby state of all data processing units can be set as the condition for forcibly terminating the repeat loop.
  • Alternatively, the standby state of at least one data processing unit can be set as the condition for forcibly terminating the repeat loop.
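The repeat instruction with a forced termination condition can be sketched as a small Python model. The names `repeat` and `countdown`, the dictionary representation of a data processing unit, and the choice to test the termination condition at the start of each iteration are assumptions made for illustration.

```python
def repeat(pes, body, count, stop_when="all"):
    """Run body on every active unit up to count times.

    pes:       list of dicts with keys "value" and "standby" (hypothetical model)
    stop_when: "all" ends the loop when every unit is in standby,
               "any" ends it as soon as one unit is in standby.
    Returns the number of loop iterations actually executed.
    """
    check = all if stop_when == "all" else any
    iterations = 0
    for _ in range(count):
        if check(pe["standby"] for pe in pes):   # forced termination condition
            break
        for pe in pes:
            if not pe["standby"]:                # units in standby skip the loop body
                body(pe)
        iterations += 1
    return iterations


def countdown(pe):
    """Example loop body: decrement, and enter standby when the value reaches zero."""
    pe["value"] -= 1
    if pe["value"] == 0:
        pe["standby"] = True


def make_pes():
    return [{"value": 2, "standby": False}, {"value": 5, "standby": False}]


print(repeat(make_pes(), countdown, count=10, stop_when="all"))
print(repeat(make_pes(), countdown, count=10, stop_when="any"))
```

With the "all" condition the loop runs until the slowest unit finishes; with the "any" condition it ends as soon as the first unit enters standby, so the control unit can handle that unit before continuing.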
  • The data processing unit includes a Galois field multiplication circuit and an addition circuit.
  • As operation instructions for controlling the Galois field multiplication circuit and the addition circuit, the control unit can execute at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field multiply-accumulate (product-sum) instruction.
  • The control unit fetches instructions from the program memory and can be configured to perform error correction processing using the data processing units.
  • In the error correction processing, the syndrome calculation for a code defined over a Galois field, the determination of the presence or absence of an error using the syndrome obtained by that calculation, and
  • the storing in the storage means of each syndrome for which an error was determined may be repeated a plurality of times; thereafter, the stored syndromes may be read from the storage means and the error correction operation performed on them.
  • If the syndrome operation, the determination of the presence or absence of an error, and the error correction operation were performed in parallel as a single iteration loop, a data processing unit with no error would have to remain in the standby state until that loop iteration completed, and useless cycles would be expected to occur frequently.
  • In the processing procedure of the present invention, when an error has occurred the syndrome is stored, and once a certain number of syndromes have accumulated, the correction operation is performed collectively on the plurality of syndromes. The time the data processing units spend in the standby state can therefore be shortened as a whole, and the useless cycles caused by the standby state can be reduced.
  • On the other hand, the number of operations that store a syndrome for which an error was determined may grow, and the amount of processing may increase instead. Whether the syndrome store is executed depends on whether an error has occurred, as described above; therefore, if the number of errors increases, the number of syndrome store operations increases and the processing amount may increase.
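The trade-off between the single-loop procedure and the batched procedure can be made concrete with a rough cycle estimate. The cost model below is an assumption made purely for illustration (fixed cycle counts per syndrome calculation, per correction pass, and per store pass; one codeword per data processing unit per group); it is not taken from this document, and the function name `cycle_estimate` is hypothetical.

```python
from math import ceil


def cycle_estimate(error_flags, num_pes, syn_cycles, corr_cycles, store_cycles):
    """Return (single_loop_cycles, batched_cycles) under a simplified cost model.

    error_flags: one boolean per codeword, True if the codeword contains an error.
    Codewords are processed in groups of num_pes, one codeword per unit.
    """
    groups = [error_flags[i:i + num_pes] for i in range(0, len(error_flags), num_pes)]
    groups_with_error = sum(1 for g in groups if any(g))
    total_errors = sum(1 for f in error_flags if f)

    # Single loop: a group pays the full correction cost if any of its units has an error,
    # while the error-free units of that group wait in standby.
    single_loop = len(groups) * syn_cycles + groups_with_error * corr_cycles

    # Batched: erroneous syndromes are stored first, then corrected num_pes at a time.
    batched = (len(groups) * syn_cycles
               + groups_with_error * store_cycles
               + ceil(total_errors / num_pes) * corr_cycles)
    return single_loop, batched


sparse = [i in (0, 5, 10, 15) for i in range(16)]   # one error per group of four
print(cycle_estimate(sparse, num_pes=4, syn_cycles=10, corr_cycles=50, store_cycles=1))
dense = [True] * 16                                  # every codeword has an error
print(cycle_estimate(dense, num_pes=4, syn_cycles=10, corr_cycles=50, store_cycles=1))
```

Under these assumed costs, batching helps when errors are sparse and spread across groups, and costs slightly more when nearly every codeword has an error, because the store passes are then pure overhead.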
  • A data processing system to which the data processing device is applied includes, for example, input means for code data defined using the elements of a Galois field, the data processing device, and data output means.
  • The data processing device corrects errors in the code data input from the input means, based on a program stored in the program memory. From the viewpoint of error correction processing and the like, this makes it possible to support higher data reading speeds in storage systems and higher data transmission speeds in communication systems.
  • The control unit included in the data processing device can execute the input/output control of the input means and the output means and the error correction processing of the code data in a time-sharing manner.
  • FIG. 1 is a block diagram of a SIMD parallel processor according to a first embodiment of the present invention.
  • FIG. 2 is a block diagram of a SIMD parallel processor according to a second embodiment of the present invention.
  • FIG. 3 is a block diagram of a SIMD parallel processor according to a third embodiment of the present invention.
  • FIG. 4 is a block diagram of a SIMD parallel processor according to a fourth embodiment of the present invention.
  • FIG. 5 is a flowchart showing a typical processing procedure for error correction of a Reed-Solomon code.
  • FIG. 6 is a flowchart showing an example of a processing procedure for performing error correction of a code by the SIMD parallel processor of the present invention.
  • FIG. 7 is a flowchart showing an example of a more efficient processing procedure for performing error correction of a code by the SIMD parallel processor of the present invention.
  • FIG. 8 is a logic circuit diagram showing an example of a condition determining unit.
  • FIG. 9 is a logic circuit diagram showing an example of a comparator with a mask.
  • FIG. 10 is a block diagram showing an example of a data processing unit.
  • FIG. 11 is a block diagram showing an example of a data processing unit in which the supply of a clock signal is stopped in the data processing unit of FIG. 10 to realize a standby state.
  • FIG. 12 is a block diagram showing a detailed example of a data processing unit suitable for error correction processing.
  • FIG. 13 is an explanatory diagram of a special control instruction of the data processing device according to the present invention.
  • FIG. 14 is an explanatory diagram of a data transfer instruction of the data processing device of the present invention.
  • FIG. 15 is an explanatory diagram of a SIMD instruction of the data processing device according to the present invention.
  • FIG. 16 is an explanatory diagram showing the configuration of all instruction codes of a general RISC instruction and a compound instruction combining a data transfer instruction and a SIMD instruction.
  • FIG. 17 is an explanatory diagram showing an example of an instruction code of a transfer instruction.
  • FIG. 18 is an explanatory diagram showing an example of an instruction code of a SIMD instruction.
  • FIG. 19 is a block diagram of a SIMD-type parallel processor for explaining the configuration for putting the data processing unit into a standby state by using the data setting instruction.
  • FIG. 20 is an operation timing chart for executing "setPENOP2 1 if".
  • FIG. 22 is an explanatory diagram showing an example of a program created using the instruction set shown in FIGS. 13 to 15.
  • FIG. 23 is a block diagram of an example of a DVD / CD-ROM system to which a SIMD type parallel processor is applied.
  • FIG. 24 is a flowchart showing general conditional branch processing.
  • FIG. 25 is a flowchart showing an example of the conditional branching process by the SIMD type parallel processor discussed earlier by the present inventors.
BEST MODE FOR CARRYING OUT THE INVENTION
  • FIG. 1 is a block diagram of a SIMD parallel processor according to a first embodiment of the present invention.
  • The SIMD parallel processor shown in FIG. 1 comprises first and second memories 801 and 802, a peripheral circuit 900 connected via a bus interface (bus I/F) 901, a control unit 200, and a plurality of parallel data processing units 101, 102, ..., 10n.
  • CDB and CAB shown in the figure are a common data bus and a common address bus.
  • LDB and LAB are local data buses and local address buses.
  • The SIMD type parallel processor shown in the figure can be formed on one semiconductor substrate, either excluding the peripheral circuit or including all or part of it. Note that the memories, in particular part or all of the first memory, need not be formed on that single semiconductor substrate.
  • The first memory 801 is used for storing programs and data to be fetched and executed by the control unit 200.
  • The control unit 200 supplies an address to the first memory 801 to read or write an instruction or data.
  • The second memory 802 is used for storing data. It supplies the data to be processed to the data processing units 101, 102, ..., 10n, whose arithmetic operation is controlled by the control unit 200,
  • and is used for storing the results of their calculations.
  • The second memory 802 is also accessible from the control unit 200.
  • The program may be stored in the second memory 802 and transferred to the control unit 200 via the bus. Not only in this embodiment but in all embodiments disclosed in the present invention, it is preferable to allocate the first memory 801 and the second memory 802 to different address areas in a common address space, because programs and data can then be accessed in the same manner.
  • The control unit 200 includes a program counter (PC) 202, an instruction decoding unit 201, and a data operation unit 203, and is configured to evaluate the result of an operation by the data operation unit 203 and set a flag 204. The value of the program counter 202 is output as an address to the first memory 801 to fetch an instruction. The fetched instruction is decoded by the instruction decoding unit 201, and control signals are issued to the data processing units 101, 102, ..., 10n according to the decoding result. In addition, in response to the control signal output from the instruction decoding unit 201, the data operation unit 203 updates the program counter 202 according to the instruction, performs memory access, and performs data operations. The result of an operation by the data operation unit 203 of the control unit 200 is used as the address of the second memory 802.
  • The data operation unit 203 of the control unit 200 performs program loop control, operations not suited to parallel execution, memory address calculation, and the like.
  • The operation result is fed back to the instruction decoding unit 201 via the flag 204, and the program flow is controlled accordingly. Further, the progress and results of operations in the data processing units 101, 102, ..., 10n can be fed back to the instruction decoding unit 201.
  • The data processing units 101, 102, ..., 10n all have the same configuration, are supplied with the same control signal from the control unit 200, and in principle perform the same operation. Data is mainly supplied from the second memory 802, but can also be transferred from the first memory 801 and the peripheral circuit 900 via the bus interface circuit 901, and from the data operation unit 203 of the control unit 200.
  • Data can also be transferred from the data processing units 101, 102, ..., 10n to the first memory 801, the second memory 802, the peripheral circuit 900, and the data operation unit 203 of the control unit 200.
  • The memory can read 32 bits of data at once.
  • The configuration is such that this data can be supplied to the data processing units 101, 102, ..., 10n collectively, 8 bits to each.
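The distribution of one 32-bit memory word to 8-bit data processing units can be sketched as follows. The function name `split_word` and the lane ordering (lane 0 receives the least significant byte) are assumptions; the document does not state which byte goes to which unit.

```python
def split_word(word: int, lanes: int = 4, lane_bits: int = 8):
    """Split one memory word into per-unit symbols.

    Lane 0 receives the least significant lane_bits bits; this ordering is an
    assumption made for illustration.
    """
    mask = (1 << lane_bits) - 1
    return [(word >> (lane_bits * i)) & mask for i in range(lanes)]


# One 32-bit read feeds four 8-bit data processing units in a single access.
print([hex(b) for b in split_word(0x12345678)])
```

With four units and 8-bit symbols, a single memory read keeps every unit supplied, so the memory width matches the parallel data demand.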
  • Each of the data processing units 101, 102, ..., 10n is provided with standby control means that puts the data processing unit 101, 102, ..., 10n into the standby state in accordance with the result of a calculation by the operation units 161, 162, ..., 16n performed according to the control information (control signal/command) from the control unit 200.
  • The standby control means comprises, for example, the determination units 111, 112, ..., 11n, the standby registers 121, 122, ..., 12n, and the standby/active switching units (enb/dis) 131, 132, ..., 13n.
  • The standby control means returns the inside of the data processing units 101, 102, ..., 10n from the standby state to the active state according to an instruction from the control unit 200.
  • In more detail, the standby control means includes: a determination unit 111 (112, ..., 11n) that determines whether the calculation result of the data processing unit 101 (102, ..., 10n) has reached a specific state;
  • a standby register
  • 121 (122, ..., 12n) that is set to a set state (first state) in synchronization with the detection of the specific state by the determination unit and is set to a reset state (second state) by specific control information from the control unit 200; and
  • a standby/active switching unit 131 (132, ..., 13n) that stops the arithmetic operation of the data processing unit 101 (102, ..., 10n) in response to the set state of the standby register.
  • The standby registers 121, 122, ..., 12n can be read independently by the control unit 200 and can also be set and reset by it.
  • The control unit 200 can monitor the status of the standby registers 121, 122, ..., 12n.
  • FIG. 2 shows an SIMD type parallel processor according to a second embodiment of the present invention. Circuit blocks having the same functions as those shown in FIG. 1 are denoted by the same reference numerals.
  • in a SIMD type parallel processor, while the computing power is enhanced by operating the data processing units in parallel, the data transfer speed often determines the processing performance of the processor. For example, when the intermediate processing results in the respective data processing units 101, 102,..., 10n are stored in the memory 802 each time, if the number of parallel data input / output bits of the memory 802 is less than the number of bits needed for data input / output in parallel with all the data processing units 101, 102,... 10n, the memory access must be divided into multiple accesses.
  • the storage means 141, 142,... 14n, which store a plurality of data words (a word being the number of data bits used as a data processing unit), are provided in each of the data processing units 101, 102,... 10n.
  • the storage means 141, 142,... 14n are used as so-called work areas of the individual data processing units, and in ordinary applications several tens of words are required. Addressing them directly would require 5 to 7 bits of instruction code per operand.
  • a plurality of pointers 151, 152,... 15n for specifying the storage areas of the storage means 141, 142,... 14n are therefore provided, and a code designating one of the pointers 151, 152,... 15n is used as the operand of the instruction.
  • the operand can then be specified in 2 bits of instruction code if the number of pointers is up to four.
  • the values of the pointers 151, 152,... 15n are designed to be updated at the same time as an operation instruction, a data transfer instruction, and the like. An incrementer / decrementer is provided for each of the pointers 151, 152,... 15n, and control for instructing the update of the pointers 151, 152,... 15n is added to the instruction.
  • a mechanism is provided for detecting when the value of the pointers 151, 152,... 15n reaches a predetermined value. Registers that hold the end values of the pointers 151, 152,... 15n are provided, and each time the values of the pointers 151, 152,... 15n are updated, they are compared with the end values, and the comparison result is reflected in flag 204.
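  • The saving in operand bits and the pointer update with end detection can be illustrated with a short sketch. The names `operand_bits`, `Pointer` and `read_and_advance` are hypothetical, and the single `end_flag` attribute merely stands in for the comparison result that the text says is reflected in flag 204.

```python
def operand_bits(num_choices):
    """Bits needed to select one of num_choices items in an instruction."""
    return max(1, (num_choices - 1).bit_length())


class Pointer:
    """A storage pointer with an end-value register (illustrative model)."""

    def __init__(self, value, end_value):
        self.value = value
        self.end_value = end_value  # register holding the end value
        self.end_flag = False       # stand-in for the bit reflected in the flag

    def update(self, delta):
        """Apply +1, -1 or hold, then compare with the end value."""
        if delta not in (-1, 0, 1):
            raise ValueError("delta must be -1, 0 or +1")
        self.value += delta
        self.end_flag = self.value == self.end_value


def read_and_advance(storage, pointers, select, delta):
    """The operand is a small pointer selector, not a full storage address."""
    ptr = pointers[select]
    data = storage[ptr.value]
    ptr.update(delta)  # pointer update rides along with the access
    return data


print(operand_bits(4), operand_bits(32), operand_bits(128))
```

  • Selecting one of four pointers needs 2 bits, whereas addressing 32 to 128 words directly needs 5 to 7 bits, which matches the figures given in the text.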
  • FIG. 3 shows a SIMD type parallel processor according to a third embodiment of the present invention.
  • This embodiment is a combination of the first and second embodiments. Circuit blocks having the same functions as those shown in FIGS. 1 and 2 are denoted by the same reference numerals.
  • Judgment units 171, 172,... 17n with additional conditions for the transition to the standby state are adopted.
  • This is suitable for processing data groups stored in the storage means 141, 142,... 14n whose total number of data items differs or is unknown, until the processing of all the data groups is completed. That is, if the number of data items in the data group is different for each of the data processing units 101, 102,... 10n, the data processing units that have completed processing sequentially enter a standby state, so the processing can proceed with the remaining units still operating in parallel.
  • FIG. 4 shows a SIMD parallel processor according to a fourth embodiment, which is a further preferred embodiment of the third embodiment.
  • a condition determining unit 206 for determining the standby state of each of the data processing units 101, 102,... 10n is explicitly shown in the control unit 200, and the control unit 200 is provided with a repeat control unit 205 that uses the determination result of the condition determining unit 206.
  • the rest of the configuration is the same as the embodiment of FIG.
  • the standby state of each of the data processing units 101, 102,... 10n is monitored by the condition determination unit 206 using a signal PENOPEA.
  • when the standby state of the data processing units 101, 102,... 10n becomes a predetermined state, information indicating the state is reflected on the flag 204 and is given to the repeat control unit 205.
  • the repeat control circuit 205 includes hardware for holding a program repetition start address, a program repetition end address, and the number of repetitions. When a repeat instruction that gives values to these is issued, the repeat loop is executed without any overhead for repetition.
  • the repeat control unit 205 includes a register for storing a start address of a repetitive loop, a register for storing an end address, and a register for storing the number of repetitions.
  • a counting means for counting the number of repetitions is provided, and a repetition loop (repeat loop) for executing the instructions from the start address to the end address through the program counter 202 is formed. The number of repetitions of the repeat loop is defined by the set number of repetitions.
  • the repeat loop is forcibly terminated when the condition is satisfied.
  • as the condition for forced termination of the repeat loop, it is possible to use the detection result of the condition determination unit 206 as to whether the standby state of the data processing units 101, 102,... 10n satisfies a preset condition.
  • when the condition is satisfied, the control unit 200 detects it and can branch the program or forcibly terminate the repeat loop, which has the advantage that no processing steps are wasted.
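  • The effect of a forced termination condition on a counted repeat loop can be sketched as follows. This is a software model only: `run_repeat` and `simulate` are hypothetical names, and the sketch assumes the condition is checked after each pass of the loop body, which the text does not specify.

```python
def run_repeat(body, count, forced_end=None):
    """Run body up to count times; stop early when forced_end() is true.

    Returns the number of passes actually executed.
    """
    executed = 0
    for _ in range(count):
        body()
        executed += 1
        if forced_end is not None and forced_end():
            break
    return executed


def simulate(work, count, use_forced_end=True):
    """Each unit needs work[i] passes, then enters standby."""
    remaining = list(work)
    standby = [r == 0 for r in remaining]

    def body():
        for i, left in enumerate(remaining):
            if standby[i]:
                continue  # a unit in standby ignores the instruction
            remaining[i] = left - 1
            if remaining[i] == 0:
                standby[i] = True

    condition = (lambda: all(standby)) if use_forced_end else None
    return run_repeat(body, count, condition)


print(simulate([2, 5, 3], 16), simulate([2, 5, 3], 16, use_forced_end=False))
```

  • With the forced termination condition, the loop ends as soon as the slowest unit finishes; without it, all the remaining passes are spent with every unit idle.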
  • the data processing apparatus having the configuration disclosed in the first to fourth embodiments is optimal for error correction processing of error correction codes, particularly Reed-Solomon codes, when a Galois field arithmetic unit is provided in the data processing units 101, 102,... 10n.
  • the error correction processing flow consists of data transfer 1001, syndrome calculation 1002, error determination 1003, Euclidean algorithm 1004, Chien search 1005, error numerical calculation 1006, and correction 1007.
  • the data transfer 1001 is a data transfer from the first memory 801 to the second memory 802.
  • the data is arranged in the second memory 802 in a format suitable for SIMD-type parallel processing.
  • the syndrome calculation 1002 receives a series of received codes (r0 to r255) 2001 as input and calculates the coefficients 2002 of a syndrome polynomial.
  • the series of received codes (r0 to r255) is, for example, 256 bytes of data, and includes encoded data and parity information corresponding thereto.
  • the syndrome calculation is performed in units of the series of received codes (r0 to r255). If the coefficients 2002 of the syndrome polynomial are all zero, it is understood that there is no error in the received code. If it is found that there is no error, omit the following processing and terminate. If it is found that there is an error, start the correction processing.
  • an error locator polynomial 2003 and an error numerical polynomial 2004 are calculated from the syndrome polynomial 2002 by the Euclidean algorithm 1004.
  • the location of the error 2005 is obtained by obtaining the root of the error locator polynomial 2003 by the Chien search 1005.
  • if the error position 2005 is obtained as a value that cannot actually occur, it is understood that errors exceeding the correction capability of the code have occurred.
  • in that case, the processing outputs that the correction is not possible, skips the following processing, and ends. If the position of the error is properly obtained, the error value 2006 is calculated based on the position, the correction 1007 is performed, and the processing is terminated.
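  • The syndrome calculation 1002 and the error determination 1003 can be illustrated with a small sketch over GF(2^8). The text does not state the field polynomial, the primitive element, or the roots of the code, so the values below (polynomial 0x11D, element 2, roots alpha^0 to alpha^15) are assumptions chosen only to make the example concrete; the function names are likewise hypothetical.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1
ALPHA = 2          # assumed primitive element


def gf_mul(a, b):
    """Multiply two GF(2^8) elements: carry-less product reduced by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a GF(2^8) element to a non-negative integer power."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def syndromes(received, count=16):
    """Return S_j = sum_i r_i * alpha^(j*i) for j = 0..count-1 (Horner's rule)."""
    out = []
    for j in range(count):
        x = gf_pow(ALPHA, j)
        s = 0
        for r in reversed(received):  # highest-index symbol first
            s = gf_mul(s, x) ^ r
        out.append(s)
    return out


def has_error(received):
    """Error determination: any non-zero syndrome means an error is present."""
    return any(syndromes(received))


word = [0] * 256          # the all-zero word is a valid codeword
print(has_error(word))
word[10] ^= 0x53          # inject a single symbol error
print(has_error(word))
```

  • The inner loop is a chain of Galois field multiply and add steps on one symbol per pass, which is the kind of work the Galois field arithmetic unit in each data processing unit is meant to carry out.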
  • the biggest advantage of using a SIMD parallel processor configuration is that the processing performance can be changed without changing the program, by changing the number of data processing units.
  • the program can be the same as long as the code standard is not changed, and high-speed access can be handled by increasing the number of data processing units, making design changes extremely easy.
  • FIGS. 6 and 7 schematically show a processing flow when the four reception code sequences 2001 are processed in parallel.
  • GPE0 (101) to GPE3 (104) show four parallel data processing units, and the processing to be performed by each of the data processing units 101, 102, 103, 104 is shown in the vertical direction.
  • in FIG. 6, if an error is detected in at least one of the four input received code sequences 2001, the data processing units in which no error is detected are set to the standby state 1099 until the correction is completed.
  • FIG. 6 shows the case where the received code sequence 2001 input to GPE0 (101) has an error within the correctable range, the received code sequence 2001 input to GPE1 (102) has an error beyond the correctable range, and no error occurs in the received code sequences 2001 input to GPE2 (103) and GPE3 (104). All data processing units 101, 102, 103, 104 are unconditionally operated in parallel through data transfer 1001, syndrome calculation 1002, and error determination 1003.
  • GPE0 (101) and GPE1 (102), which were found to have errors, perform the subsequent error correction processing, while GPE2 (103) and GPE3 (104), in which no errors were found, are in the standby state 1099 during that time.
  • GPE1 (102) waits in the standby state 1099 for the end of GPE0 (101), which performs the following error numerical calculation 1006 and correction 1007. In this case, if all the data processing units 101, 102, 103, and 104 detect that there is no error, the processing can proceed to the next series of received codewords 2001 without performing error correction processing.
  • Wasteful processing steps can be avoided. However, if an error exists in even one of the four received code sequences 2001 simultaneously processed by the four data processing units 101, 102, 103, and 104, then, when only one data processing unit actually performs error correction, the other three data processing units enter the standby state 1099 in all the processing after the Euclidean algorithm 1004 and wait for the completion of the error correction processing. In this case, it cannot be said that the data processing units 101, 102, 103 and 104 provided in parallel are effectively used. As the number of data processing units increases, the probability that no error is detected in all data processing units decreases, and processing efficiency decreases.
  • the processing flow shown in FIG. 7 is effective.
  • the four received code sequences 2001 are read into the four data processing units 101, 102, 103, and 104 (1001), and the syndrome calculation 1002 and the error determination 1003 are performed.
  • unlike the processing flow shown in FIG. 6, in GPE0 (101) and GPE1 (102), which were found to have errors, the calculated syndrome 2002 is stored in the memory 802 (1008), while GPE2 (103) and GPE3 (104), which were found to be free of errors, enter the standby state 1099 and wait for the end of that processing.
  • the processing flow then returns to the first data transfer, and the next four received code sequences 2001 are read into the four data processing units 101, 102, 103, and 104. The syndrome calculation 1002 and the determination of the presence or absence of an error 1003 are performed again and, similarly, only the syndrome 2002 of a received code sequence 2001 in which an error is detected is stored in the memory 802 (1008). After this has been done for a certain number of received code sequences 2001, the syndromes 2002 stored in the memory 802 are read (1009) only for the received code sequences 2001 in which an error was detected, and the subsequent error correction processing is performed. Again, if the number of errors exceeds the correction capability, the error cannot be corrected.
  • compared with a flow that corrects each group immediately, this flow additionally needs the process of temporarily storing the syndrome 2002 of the received code sequence 2001 in which the error was detected in the memory 802 (1008) and reading it out again (1009).
  • the processing flow shown in FIG. 6 is efficient when the frequency of errors is high, and the processing flow shown in FIG. 7 is efficient when the frequency of errors is low. From the relationship among the frequency of occurrence of errors, the number of processing steps required for error correction, and the number of processing steps required to temporarily store the syndrome in memory, it is possible to quantitatively determine which processing flow is more efficient.
  • the average frequency of occurrence of errors is far lower than the worst-case frequency of errors that is assumed. If the received code is 256 bytes and eight or fewer errors are corrected, 16 syndromes are required. Since the probability of an error occurring is typically 1 in 1000, on average there is about one error per four codewords.
  • the number of processing steps when error correction is performed according to the processing flow in FIG. 7 is estimated.
  • the processing is completed in less than half of the processing flow shown in FIG. If the error rate is lower, or if the parallelism is increased by increasing the number of processing units, the effect of reducing the number of processing steps becomes more remarkable when the processing flow in FIG. 7 is adopted.
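  • The figures quoted above can be checked with a short calculation. The sketch assumes that the rate of 1 in 1000 applies per byte and that byte errors are independent, neither of which the text states explicitly; the function names are hypothetical.

```python
def expected_errors(symbol_error_rate, symbols=256):
    """Average number of erroneous symbols in one codeword."""
    return symbol_error_rate * symbols


def p_codeword_clean(symbol_error_rate, symbols=256):
    """Probability that one codeword has no erroneous symbol."""
    return (1.0 - symbol_error_rate) ** symbols


def p_group_clean(symbol_error_rate, units, symbols=256):
    """Probability that all codewords processed in parallel are error free."""
    return p_codeword_clean(symbol_error_rate, symbols) ** units


print("errors per codeword:", expected_errors(1e-3))
for units in (1, 4, 16):
    print(units, "units:", round(p_group_clean(1e-3, units), 3))
```

  • Under these assumptions a codeword carries about 0.256 errors on average, that is, roughly one error per four codewords, and the chance that a whole group of codewords is error free falls quickly as the number of parallel units grows. This is why skipping correction only when every unit is clean becomes less effective at higher parallelism.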
  • a program for implementing the processing procedure of FIG. 7 is stored in the memory 801.
  • the control program for the SIMD processor can also be stored in external memory.
  • the control of the repeat loop can be applied to the processing as shown in FIG. That is, the process from the data transfer (1001) to the storage of the syndrome (1008) is repeated a plurality of times, and thereafter, the process branches to the reading process of the syndrome (1009).
  • Each processing in the error correction processing flow shown in FIG. 5 includes many repetitive loops as in general signal processing.
  • Repeat loops are classified into two types from the viewpoint of controlling hardware.
  • One is a repeat loop configured in software, using a general-purpose register provided in the data calculation unit of the control unit as a counter.
  • the other is a repeat loop controlled by hardware, in which the control unit is provided with a register that stores the start address, a register that stores the end address, and a counter that counts the number of repetitions.
  • This hardware is the repeat control unit 205.
  • A repeat loop composed of software has no restriction on the nesting depth of the loop and does not need the repeat control unit 205, so the circuit scale can be saved, but the number of processing cycles increases because control instructions must be issued.
  • the repeat loop control by the repeat control unit 205 does not consume processing cycles for control, because the control of the number of repetitions and the like is all performed by hardware, but there are restrictions such as a limit on the loop nesting depth. Generally, when loops are nested, the innermost loop is implemented as a repeat loop.
  • the repetition loop is classified into three types from the viewpoint of the processing flow to be realized.
  • these are: a loop whose number of repetitions is a fixed value; a loop whose number of repetitions becomes known in the processing up to that point; and a loop that, when a certain condition is satisfied during repetition, suspends repetition, performs another process, and then resumes.
  • the control method of the iterative loop described above can be applied to any of the three types of processing flows when the data path is single. However, it is not always easily applicable to general SIMD-type parallel processors.
  • the first iteration loop with a fixed number of iterations can be realized without any problem by a conventional SIMD-type parallel processor.
  • in the second type of repeat loop, each of the data processing units 101, 102,... 10n independently detects that the number of repetitions determined partway through has been completed, and enters a standby state. Since the data processing units 101, 102,... 10n sequentially enter the standby state in ascending order of the required number of repetitions, the control unit 200 may detect that all of the data processing units 101, 102,... 10n have entered the standby state and terminate the repeat loop.
  • in the third type of repeat loop, which suspends repetition when a condition is satisfied, performs another process, and then resumes, the timing at which the condition is satisfied may differ for each of the data processing units 101, 102,... 10n, which is difficult to handle with conventional SIMD type parallel processors.
  • the data processing units 101, 102,..., 10n satisfying the conditions individually enter a standby state.
  • a repeat loop is often used as the innermost loop, so adding extra cycles can have a fatal effect on overall performance.
  • the control unit 200 detects that at least one of the data processing units is in the standby state, temporarily suspends the repeat loop, performs another process, and then resumes the loop. While the other process is being performed, the data processing unit that satisfies the condition is returned from the standby state, and the data processing units that do not satisfy the condition are set in the standby state by a signal from the control unit 200.
  • the control unit 200 may perform processing independently using an internal data calculation unit.
  • when the above-described repeat loop is controlled in the circuit configuration of the embodiments shown in FIGS. 1 to 3, the control unit 200 does not include the repeat control means, so the standby / active state of the data processing units 101, 102,... 10n is detected by transferring the contents of the standby registers 121, 122,... 12n provided in the individual data processing units 101, 102,... 10n.
  • the contents of the standby registers 121, 122,... 12n are transferred to the data operation circuit 203 of the control unit 200, and the above-described repeat loop is controlled based on a state such as all the data processing units 101, 102,... 10n being in the standby state or at least one data processing unit being in the standby state. Transfer of contents of 121, 122,...
  • the contents of the standby registers 121, 122,... 12n provided in the individual data processing units 101, 102,... 10n are transmitted to the control unit 200 via a dedicated signal line PENOPEA. If there are n data processing units 101, 102,... 10n, the signal line PENOPEA has n bits.
  • the control unit 200 includes a condition determination unit 206 that detects that the standby / active state of the data processing units 101, 102,..., 10n satisfies a preset condition. The determination result is fed back to the flag 204 inside the control unit 200, or is sent to the repeat control unit 205 as a control signal for temporarily stopping or forcibly terminating the repeat loop.
  • Feedback to flag 204 is effective when the repeat loop is controlled by software; the loop can be continued, suspended, or terminated by checking the status of flag 204.
  • Since the repeat control circuit 205 is a hardware control circuit originally provided to save the instruction execution cycles for controlling the loop, it is thought that there are few uses in which many instruction execution cycles are spent on monitoring the standby / active state of the data processing units 101, 102,... 10n. In many cases, a control signal is sent to the repeat control unit 205 when a preset condition is satisfied, and the repeat loop is forcibly terminated.
  • FIG. 8 shows an embodiment of the condition determining unit 206.
  • a signal PENOPEA[3:0] representing the standby / active state is input from each data processing unit.
  • in the signal PENOPEA[3:0], each data processing unit outputs 1 bit, for a total of 4 bits. A bit of this signal is "1" when the corresponding data processing unit is in the standby state.
  • here the standby register is 1 bit, but a plurality of bits may be provided.
  • the data processing unit that corrects the received codeword for which no error was detected sets the upper bit to 1 and enters the standby state.
  • more efficient control is possible by using the lower 1 bit in the standby state to terminate the repetition loop earlier.
  • FIG. 10 shows an example of the data processing units 101, 102,... 10n.
  • the integer register 70, buffer 50, and register 60 are connected to the bus LD #. Data transfer between the memories 801 and 802, the control unit 200, and the peripheral circuit 900 is performed through this bus LD #.
  • the buffer 50 is an example of the storage means 141, 142,... 14n, and stores data of several tens of words, of which several words are designated in parallel by the pointer 20 and used for supplying operation data and for data transfer.
  • Pointer 20 is an example of the pointers 151, 152,... 15n.
  • An independent register 60 is also used for the transfer of the calculation data.
  • the buffer 50, the register 60, and the arithmetic circuit 10 are connected via an internal bus, and data transfer between the buffer 50 and the register 60 is also executed via this internal bus.
  • the arithmetic circuit 10 includes a plurality of arithmetic units, and reflects the arithmetic result on the flag register 40.
  • the value of the pointer 20 gives the address of the buffer 50, and the value can be updated in parallel with the operation and transfer processing by the operation instruction and the data transfer instruction. In the error correction, when the coefficients of the polynomial are stored in the buffer 50 and the operations of the polynomials are performed, it is efficient to perform the operations while sequentially updating the pointer 20.
  • the integer register 70 is connected to the integer arithmetic unit 80 via another internal bus and performs an address operation.
  • the integer register 70 does not need to perform only the address operation of the buffer 50, but may perform a completely different integer operation.
  • in the error correction program, for example, it is used to count the number of syndromes with a value of zero and thereby detect that all the syndrome values are zero, that is, that there is no error in the received codeword being processed.
  • the operation result of the integer operation unit 80 is also reflected in the flag register 40.
  • the integer operation unit 80 is also included in the operation units 161, 162,... 16n of FIG. The flag register 40 is compared with the standby condition supplied from the control unit 200, and when the condition is satisfied, the standby register 42 is set to enter the standby state.
  • the register 42 corresponds to the standby registers 121, 122,... 12n.
  • the circuit 43 for disabling the control signal when in the standby state controls the input control signal to an inactive state so that operations such as arithmetic are not performed.
  • the standby register 42 can be changed by a standby state change signal from the control unit 200.
  • the circuit 43 corresponds to the circuits 131, 132,..., 13n in FIG. In FIG. 11, a flag 40 and a comparator 41 are included in the determination units 171, 172,... 17n in FIG.
  • FIG. 11 shows a configuration for stopping the clock signal. In the standby state, a clock signal is supplied to the standby register 42 so that a reset operation can be performed, and the supply of the clock signal to other circuits is suppressed by the circuit 43.
  • the arithmetic circuit 10 is suitable for image processing and the like if it has a fixed-point arithmetic unit, and is suitable for computer graphics and the like if it has a floating-point arithmetic unit.
  • FIG. 12 shows a detailed example of a data processing unit suitable for error correction processing.
  • the data processing unit shown in the figure is composed of two Galois field multipliers 11, one Galois field adder 12, a 64-word Galois buffer 50, a 4-word Galois register 60, three pointers (PS1, PS2, PD) 21, 22, 23, an 8-word integer register 70, and the further elements described below.
  • Galois buffer 50, Galois register 60, two Galois field multipliers 11, and one Galois field adder 12 are connected by six internal buses. The calculation is performed on the data stored in Galois buffer 50 and Galois register 60, and the result is stored in the Galois buffer 50 and the Galois register 60 again.
  • the selectors 14 are provided at the inputs of the calculators 11 and 12, and the internal bus from which the data used for the calculation is output can be selected.
  • the outputs of the arithmetic units 11 and 12 are stored in the Galois buffer 50 or Galois register 60 via the internal bus.
  • the Galois buffer 50 simultaneously outputs the data at the addresses specified by the pointers (PS1) 21 and (PS2) 22 to the internal bus and, at the same time, takes in data from the internal bus at the address specified by the pointer (PD) 23. The values of pointers 21, 22, and 23 can be increased or decreased (+1 / -1) in the same cycle as the Galois field arithmetic.
  • the pointer value is written and read from the integer register 70.
  • the integer register 70 stores data required to calculate the values required for controlling the pointers 21, 22, and 23, and the calculation is performed using the connected integer adder / subtractor 80.
  • a predetermined value is stored in advance in the register (PEND) 31, and a pointer to be monitored is designated by the control signal SELPEND to perform monitoring. The comparator (CMP) 33 determines whether the value of the designated pointer coincides with the value preset in the register (PEND) 31. When they match, the RPTEND flag of the flag register 40 is set by the signal RPTEND, and the standby register 42 is set. When the pointers 21, 22, and 23 are not monitored, the control signal SELPEND is set so that the values of pointers 21, 22, and 23 are not compared with the value of the register (PEND) 31.
  • the flag register 40 separately provided in the data processing unit has, in addition to the RPTEND flag, a GZ flag indicating that the result of the Galois field adder 12 has become zero, an IZERO flag indicating that the result of the integer arithmetic unit 80 has become zero, and an INEG flag indicating that the result of the integer arithmetic unit 80 has become negative.
  • the content of the flag register 40 is compared with the signal NOPCNDX by the comparator 41, with the signal CNDXMASK used as a mask, and if they match, the standby register 42 is set.
  • the standby register 42 is, for example, a 1-bit register, and can be set by the operation result in the data processing unit by the above two methods, or can be reset by writing a value directly from the outside with the signal PENOPIN. It can also be read directly to the outside as signal PENOPEA.
  • the control signals input to the data processing unit are controlled by the circuit 43 so that, in the standby state, all signals except those for controlling access to the standby register 42 are invalidated.
  • FIG. 9 shows an example of the comparator 41 with the mask. The 4-bit flag is compared bit by bit with the preset 4-bit condition signal NOPCNDX, but the bits for which '1' is specified in the signal CNDXMASK are excluded from the comparison.
  • the signals NOPCNDX and CNDXMASK are signals that are commonly supplied from the control unit 200 to each data processing unit.
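  • The masked comparison described for the comparator 41 can be written as a few lines of bit arithmetic. The function name `masked_match` is hypothetical, and the sketch makes no assumption about which flag occupies which of the 4 bits.

```python
def masked_match(flags, nopcndx, cndxmask, width=4):
    """True when flags equal nopcndx on every bit whose mask bit is 0.

    Bits set to 1 in cndxmask are excluded from the comparison.
    """
    care = ~cndxmask & ((1 << width) - 1)  # bits that take part in the compare
    return ((flags ^ nopcndx) & care) == 0


# Only the two upper bits are compared here; the lower two are masked out.
print(masked_match(0b1010, 0b1000, 0b0011))
# With no mask, every bit must agree.
print(masked_match(0b1010, 0b1000, 0b0000))
```

  • One consequence of this formulation is that a mask of all ones makes the comparison succeed for any flag value, so a control program would normally leave at least one bit unmasked.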
  • FIGS. 13, 14 and 15 show examples of instructions executed by the SIMD type parallel processor of FIG. 4 having the data processing unit shown in FIG. 12.
  • a reduced instruction set computer (RISC) instruction is preferable in terms of shortening the instruction word length or increasing the code efficiency.
  • the RISC architecture is adopted for the SIMD type parallel processor of this embodiment.
  • the instruction shown in FIG. 13 is an example of an instruction which is particularly suitable for implementing the present invention, and is an instruction newly added to a general RISC instruction. In addition to general RISC instructions, it has instructions that can describe RISC data transfer instructions and SIMD instructions in parallel.
  • the data transfer instruction shown in Fig. 14 and the SIMD instruction shown in Fig. 15 can be combined and described in parallel without any restrictions, and can be executed in parallel.
  • the RISC instruction set has a new data setting instruction and a repeat instruction.
  • the data setting instruction sets the condition data for monitoring the status of the flag (the flag in the flag register 40) and the status data for changing the standby / active status of the data processing unit.
  • the processing section is set to a standby state.
  • means an active state. This is used, when a certain process is completed and all the data processing units are in the standby state, to return all the data processing units to the active state in order to proceed to the next process, or to return a predetermined part of the data processing units to the active state.
  • the control unit 200 has two registers (NOPCNDX) 207 and (CNDXMASK) 208 in which the signals NOPCNDX and CNDXMASK are set. These registers 207 and 208 can be written with instructions. The outputs of the registers 207 and 208 are supplied to the result judgment circuits 171, 172,... 17n of the respective data processing units 101, 102,... 10n.
  • in the result judgment circuits 171, 172,... 17n, the operation result flag is compared with the signal NOPCNDX only for bits not masked by the signal CNDXMASK, and if they match, the corresponding data processing unit is put into a standby state.
  • the value of the flag is determined, and it is determined whether or not the value matches the given condition. If they match, write PEN0P20 to stop the clock. The flag value is not determined without this instruction.
  • the repeat instruction shown in Fig. 13 includes a normal repeat instruction and a repeat instruction with a forced termination condition.
  • The normal repeat instruction "REPEAT RS, RE, RC" is an instruction that repeats the instructions from address RS to address RE RC times, and hardware control is performed by the repeat control circuit 205.
  • the repeat instruction with forced termination condition "REPEAT RS, RE, RC, and until condition" does not require an overhead cycle, and the instruction from address RS to address RE is repeated RC times as above. However, if the condition is satisfied, the repeat loop is forcibly terminated.
  • the condition for forced termination can be set either to all unmasked data processing units being in the standby state (PENOPAND) or to at least one unmasked data processing unit being in the standby state (PENOPOR). It is effective to use PENOPAND as the forced termination condition when the number of repetitions differs for each data processing unit, and to use PENOPOR as the forced termination condition when a search is repeated until a certain data item is found.
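  • The two forced termination conditions can be expressed over the standby bits as follows. The function names simply reuse the condition names from the text; the mask polarity (a 1 bit excludes that unit) is an assumption borrowed from the convention described for CNDXMASK, and the 4-bit width matches the four-unit example.

```python
def penopand(penopea, mask, width=4):
    """True when every unmasked unit reports standby (bit = 1)."""
    care = ~mask & ((1 << width) - 1)  # units that are taken into account
    return (penopea & care) == care


def penopor(penopea, mask, width=4):
    """True when at least one unmasked unit reports standby."""
    care = ~mask & ((1 << width) - 1)
    return (penopea & care) != 0


# Units 0, 1 and 3 are in standby; unit 2 is masked out.
print(penopand(0b1011, 0b0100), penopor(0b1011, 0b0100))
# Only unit 2 is in standby and no unit is masked.
print(penopand(0b0100, 0b0000), penopor(0b0100, 0b0000))
```

  • The all-units form suits loops in which each unit finishes after a different number of passes, while the any-unit form suits searches that should stop as soon as one unit finds what it is looking for.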
  • FIG. 14 shows an example of a data transfer instruction.
  • the general-purpose registers provided in the data calculation unit 203 of the control unit 200 are used as an address pointer and an address index. Three types of load instructions, three types of store instructions, and no operation (NOP) can be specified.
  • the load instruction is a data transfer instruction from the second memory 802 to the register 60 of the data processing units 101, 102,... 10n, while the store instruction is a data transfer instruction from the register 60 of the data processing unit to the second memory 802.
  • the contents of the register used as the address pointer of the control unit 200 are used as the address of the memory. In the case of @A, the address pointer is not updated. In the case of @A+, the value of the address pointer is incremented by 1 after data transfer.
  • in the remaining addressing mode, the address value is increased by the address index value.
  • on the data processing unit side, the Galois register 60 or the Galois buffer 50 can be selected. When the Galois buffer 50 is selected, the buffer pointers 21 to 23 are held or incremented by 1. Since the operation of the address pointer of the memory on the control unit 200 side and the operation of the buffer pointers 21 to 23 on the data processing unit side can be automatically synchronized, efficient data transfer becomes possible.
  • a feature of the embodiment of the present invention disclosed here is that the data transfer instruction and the SIMD instruction are executed in parallel and simultaneously, and the boys 21 to 23 of the Galois buffer 50 can be updated at the same time. Even if data transfer and calculation can be performed simultaneously, If it cannot be updated, high processing performance cannot be expected.
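The addressing modes and the synchronized pointer update described above can be sketched as follows. This is an illustrative model of a single data processing unit only; the function name `load_to_buffer`, the mode label `@A+index` for the indexed form, and the list-based memory are assumptions, not names taken from the source.

```python
from typing import List, Tuple


def load_to_buffer(memory: List[int], a: int, mode: str, index: int,
                   buffer: List[int], ps: int, step: int) -> Tuple[int, int]:
    """Copy memory[a] to buffer[ps], then update both pointers together.

    mode: "@A" (hold), "@A+" (add 1), "@A+index" (add index; label assumed).
    step: 0 holds the buffer pointer, 1 increments it.
    Returns the updated (address pointer, buffer pointer).
    """
    if step not in (0, 1):
        raise ValueError("buffer pointer can only be held or incremented")
    buffer[ps] = memory[a]
    if mode == "@A+":
        a += 1
    elif mode == "@A+index":
        a += index
    elif mode != "@A":
        raise ValueError(f"unknown addressing mode: {mode}")
    return a, ps + step


memory = [10, 11, 12, 13]
buffer = [0, 0, 0, 0]
a, ps = 0, 0
for _ in range(3):
    # One instruction per pass: transfer plus both pointer updates.
    a, ps = load_to_buffer(memory, a, "@A+", 0, buffer, ps, 1)
```

Because both pointers advance as part of the same transfer, the loop body needs no separate pointer arithmetic, which is the point the text makes about efficient transfer.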
  • the SIMD instructions are roughly classified into data transfer instructions between registers, integer operation instructions for calculating pointer values, and Galois field operation instructions that directly perform error correction processing.
  • the GICOPY instruction is an integer data transfer instruction that transfers data between the integer registers P0, P1, ..., P7 and the pointers PS1, PS2, PD, and PEND.
  • the GCOPY instruction is a transfer instruction for Galois field numbers, and transfers data between the Galois register 60 and the Galois buffer 50. When the Galois buffer 50 is an operand, the pointers 21 to 23 can be updated at the same time, which is effective when the buffer 50 is initialized.
  • the integer operation instruction performs addition, subtraction, increment, and decrement of the integer register in order to generate the values of the pointers 21 to 23.
  • the IZERO flag is set when the addition result becomes zero.
  • the INEG flag is set when the addition result becomes negative. If the integer register 70 stores the degree of a polynomial during error correction processing, a program can be created efficiently; the flags are used when comparing the degrees of two polynomials or when giving the number of loop iterations.
  • the Galois field operation instructions operate on the Galois field numbers used for error correction.
  • GADMS and GADMC, which perform multiplication and accumulation in a single instruction, are provided.
  • D = Sy * D + Sz
  • the instruction GADMS is suitable for syndrome calculation.
  • the instruction GADMC is suitable for the Chien search.
  • a compound multiply-accumulate instruction (D = Sw * Sx + Sy * Sz) is an instruction suitable for the Euclidean algorithm.
  • the instruction GINV finds the inverse of a number on a Galois field when it is used seven times. Division can be performed by multiplying by the reciprocal.
  • the notation GB (PS1 [+/-]) shown in the contents column of the Galois operation instructions such as GMULT Sx, Sy, D means that +1 or -1 can be selected for the pointer (PS1) of the Galois buffer (GB), and that this update can be executed simultaneously with the instruction.
  • GB (PS2 [+/-]) has the same meaning for the pointer (PS2).
  • the operations of +1 and -1 are performed by the incrementers and decrementers attached to PS1, PS2, and PD respectively. The choice is specified by the instruction.
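The Galois field instructions above can be illustrated in software. The sketch below assumes GF(2^8) with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D); the primitive polynomial depends on the medium, so this choice is only an example. The behaviour modelled for GINV, one step D = (D * S)^2 applied seven times to reach S^254, is an assumption that is merely consistent with the statement that the instruction is used seven times. All function names are hypothetical.

```python
from typing import List

PRIM = 0x11D  # assumed primitive polynomial; differs between media


def gmult(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a            # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= PRIM         # reduce modulo the primitive polynomial
        b >>= 1
    return r


def gadms(sy: int, d: int, sz: int) -> int:
    """Multiply-accumulate of the form D = Sy * D + Sz."""
    return gmult(sy, d) ^ sz


def syndrome(received: List[int], alpha_j: int) -> int:
    """Evaluate the received word at alpha_j by Horner's rule,
    one multiply-accumulate per symbol (highest-order symbol first)."""
    d = 0
    for symbol in received:
        d = gadms(alpha_j, d, symbol)
    return d


def ginv_step(d: int, s: int) -> int:
    """Assumed single GINV step: D = (D * S)^2."""
    t = gmult(d, s)
    return gmult(t, t)


def inverse(s: int) -> int:
    """Seven steps give S^254, which is 1/S for nonzero S in GF(2^8)."""
    d = 1
    for _ in range(7):
        d = ginv_step(d, s)
    return d


def gdiv(a: int, b: int) -> int:
    """Division as multiplication by the reciprocal."""
    return gmult(a, inverse(b))
```

The exponent of S after each of the seven steps is 2, 6, 14, 30, 62, 126, 254, and S^254 equals the reciprocal because every nonzero element satisfies S^255 = 1.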
  • FIG. 16 shows the structure of all the instruction codes of the general RISC instruction and the instruction combining the data transfer instruction and the SIMD instruction.
  • the general RISC instruction is assigned an n-bit instruction code corresponding to the mnemonic.
  • the data transfer instructions shown in FIG. 17 can be freely combined with the SIMD instructions shown in FIG. 18 to constitute a parallel execution instruction (compound instruction).
  • An m-bit code is assigned to the compound instruction. It consists of a k-bit identification code for distinguishing it from the general RISC instructions, the p bits of the transfer instruction code shown in FIG. 17, and the q bits of the SIMD instruction code shown in FIG. 18.
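The layout of the compound instruction can be sketched as a simple bit-field packing. The widths below (k = 2, p = 6, q = 8, so m = 16), the field order, and the identification value are hypothetical; the text only states that the m-bit code is made up of these three parts.

```python
from typing import Optional, Tuple

K_BITS, P_BITS, Q_BITS = 2, 6, 8   # assumed widths, m = k + p + q = 16
COMPOUND_ID = 0b11                 # assumed identification code


def encode(transfer: int, simd: int) -> int:
    """Pack a transfer code and a SIMD code into one compound word."""
    if not 0 <= transfer < (1 << P_BITS):
        raise ValueError("transfer code does not fit in p bits")
    if not 0 <= simd < (1 << Q_BITS):
        raise ValueError("SIMD code does not fit in q bits")
    return (COMPOUND_ID << (P_BITS + Q_BITS)) | (transfer << Q_BITS) | simd


def decode(word: int) -> Optional[Tuple[int, int]]:
    """Return (transfer, simd) for a compound word, or None when the
    identification field marks the word as a general RISC instruction."""
    if (word >> (P_BITS + Q_BITS)) != COMPOUND_ID:
        return None
    transfer = (word >> Q_BITS) & ((1 << P_BITS) - 1)
    simd = word & ((1 << Q_BITS) - 1)
    return transfer, simd


word = encode(5, 0xA3)
fields = decode(word)
```

Because the two sub-codes occupy independent fields, any transfer instruction can be paired with any SIMD instruction without enlarging the opcode table.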
  • FIG. 22 shows an example of a program created by using the instruction set shown in FIGS. 13 to 15.
  • the program shown in FIG. 22 is a part of the Euclidean algorithm 1004 performed in the error correction processing of the Reed-Solomon code shown in FIG.
  • the coefficients of the old and new error value polynomials are stored in the Galois buffer 50. Since the degrees of the two polynomials differ for each of the data processing units 101, 102, ..., 10n operating in parallel, the Galois buffer addresses at which the coefficients of the two polynomials are stored are held in the integer register 70.
  • the highest and lowest order coefficients of the new error value polynomial are stored at the addresses of the Galois buffer 50 indicated by P0 and P6, respectively.
  • the highest and lowest order coefficients of the old error value polynomial are stored at the addresses of the Galois buffer 50 indicated by P2 and P7, respectively.
  • the old error value polynomial is updated by performing coefficient operations between the old and new error value polynomials.
  • the degree of the error value polynomial is highest at the beginning of the Euclidean algorithm and is gradually reduced by repeating the update. Finally, the error value polynomial 2004 of an appropriate degree is obtained.
  • in FIG. 22, the old error value polynomial is updated in two repeat loops, 3005 and 3009.
  • the values of the pointers (PS1) 21 and (PS2) 22 give the addresses where the coefficients of the new and old error value polynomials being calculated are stored, and they take different values in each of the data processing units 101, 102, ..., 10n.
  • the values of the pointers 21 and 22 decrease by 1 each time the arithmetic instructions 3006 and 3010 are executed.
  • when a pointer value reaches PEND, the corresponding data processing unit goes into a standby state.
  • the data processing units are placed in a standby state one after another, starting with the one whose error value polynomial has the lowest degree, and when all the data processing units 101, 102, ..., 10n are in a standby state, the repeat loops 3005 and 3009 are forcibly terminated.
  • the count of 16 set as the number of repetitions in the repeat loops 3005 and 3009 is the maximum number of repetitions that is theoretically possible, so this program also works correctly when executed on a SIMD type parallel processor that has no mechanism for forcibly terminating a repeat loop.
  • the system waits until the repetition of 2 times ends, and waits while holding the result.
  • the number of repetitions averages about 12 for the first and second repeat loops 3005 and 3009 combined.
  • with a forced termination mechanism about 12 repetitions may therefore suffice, but in an embodiment without a forced termination mechanism, 32 or more repetitions may be performed.
  • the SIMD type parallel processor having the repeat instruction forced termination mechanism shown in the fourth embodiment has the effect of greatly reducing the number of processing steps.
  • the repeat loop forced termination mechanism is often effective in other error correction processing routines and in other ordinary digital signal processing as well.
  • FIG. 23 shows a system block diagram in which the SIMD parallel processor described above is applied to a DVD/CD-ROM device.
  • the peripheral circuit 900 is connected via a peripheral bus by a bus interface circuit 901.
  • the peripheral circuit 900 includes, for example, an analog interface circuit (analog I/F) 905, a D/A converter 904 for controlling the pickup 913, a PWM (Pulse Width Modulation) circuit 903 for controlling the motors 911 and 912, and a D/A converter 902 for audio output.
  • the analog interface circuit 905 controls the pickup 913 via the analog signal processing circuit 909, fetches data, and fetches the information necessary for control.
  • the information required for control is information on lens focus, envelope, focus, and tracking.
  • based on this information, the control unit 200 of the processor performs data processing, adjusts the focus and tracking of the pickup 913, and drives the thread motor 912 and the spindle motor 911.
  • the data read from the medium is taken into the memories 801 and 802 through the analog interface circuit 905, subjected to error correction processing by the data processing units 101, 102, ..., 10n, and then output.
  • in the present invention, the control unit 200 is not used only for controlling the SIMD parallel processor. Because it also has general RISC instructions, it is used, by time division with error correction, for servo control processing, tracking control processing, and the like. Furthermore, in the preferred embodiment, a mechanism capable of executing general DSP instructions is added, and all system control tasks, signal processing tasks, and special data processing tasks such as error correction are written as a single program. The control microcomputer and the servo/tracking control LSI or DSP, which were separate components in the past, can then be omitted, and the SIMD parallel processor according to an example of the present invention handles all of the processing, so the device cost is greatly reduced. Furthermore, since all tasks can be developed together, coordinating the tasks is very easy and the development period can be shortened significantly.
  • since the program for error correction processing and the programs for pickup, motor, and audio output control can be developed together, the cost of device development is greatly reduced.
  • a microcomputer for controlling the entire device would otherwise be required as a separate component, but according to the present embodiment the processor that performs error correction also controls the entire device, so the device cost itself can be significantly reduced.
  • the above embodiment is an example in which the present invention is applied to a DVD/CD-ROM device. Support for broadcast and communication media is easily realized by replacing the pickup, motors, and the like with demodulation circuits, communication protocol control circuits, and the like. As described above, because a SIMD parallel processor is introduced to perform code error correction, demands for higher processing speed can be met easily, without changing the basic architecture or the program, simply by increasing the number of data processing units to raise the degree of parallelism. Codes of different standards can be handled by changing the program, so systems that must support error correction for multiple standards are also easily accommodated.
  • branch processing can be realized by placing multiple data processing units in a standby state based on the results of individual calculations.
  • the present invention is not limited thereto, and it is needless to say that various modifications can be made without departing from the gist of the invention.
  • the operation program of the data processing device can be stored in a built-in ROM or provided by an external ROM or the like.
  • the number of data processing units and the bus configuration for connecting the data processing unit and the control unit are not limited to the above-described embodiment, and can be changed as appropriate.
  • the method of notifying the control unit of the standby state of each data processing unit is not limited to a configuration in which the control unit is notified by a signal for each data processing unit.
  • the control unit may refer to the standby state by accessing the standby register via the common bus to determine whether the standby state is established.
  • the present invention can be applied to data processing devices such as SIMD parallel data processors with improved parallel processing performance or efficiency, and to data processing systems that perform code error correction and data encoding and decoding in storage systems or communication systems, for example information reproduction or information recording systems for recording media such as CD-ROM, DVD, and MO (magneto-optical) discs, and satellite broadcast receiving systems.

Abstract

To improve the parallel processing throughput of a data processor such as a SIMD parallel processor, the data processor has a controller (200) that decodes and executes fetched instructions, and a plurality of data processing units (101) that receive control information for computing operations in parallel from the controller and whose data transfers are controlled by the controller. As standby control means for placing the data processing unit in a standby condition in accordance with the result of a computing operation based on the control information, each data processing unit has a judgement element (111) that judges whether the result of computation in a computing element matches a standby condition, a standby register (121) that is set in accordance with the judgement result, and a circuit (131) that stops the operation of the data processing unit in response to the set condition of the standby register. The controller performs the control that returns each data processing unit from the standby condition to the active condition. The data processing units are thus each put in a standby condition on the basis of their own computation results and are restored to the active condition by the controller, whereby a branching process is attained. Therefore, when the branching process is carried out so that standby conditions are reduced, useless cycles due to data processing units being in standby can easily be minimized, and the parallel processing throughput of a SIMD processor can be improved.

Description

Data processing device and data processing system
Technical field
The present invention relates to a SIMD (Single Instruction Multiple Data) type data processing device capable of parallel arithmetic processing, and to technology that is effective when applied, for example, to code error correction and, further, to the encoding and decoding of data in storage systems and communication systems.
Background art
Data recording devices that handle recording media such as hard disks, CD-ROMs (Compact Disc-Read Only Memory), DVDs (Digital Video Disc), and magneto-optical disks use codewords that allow recording and reading errors arising on the medium to be corrected. A codeword is defined, for example, by a special set of numbers called a Galois field together with special operations defined on that set. Code error correction is performed by data processing that uses the numbers and operations of the Galois field. Several different Galois fields can be defined depending on the primitive polynomial, which is the polynomial on which the definition of the set of numbers is based, and the operations are likewise defined differently depending on the primitive polynomial. The most widely used codewords are Reed-Solomon codes, which are used in particular as error correction codes in data storage systems and communication systems where errors tend to be concentrated locally. Reed-Solomon codes are defined using the numbers of a Galois field, and encoding and decoding are performed by operations on the Galois field. The primitive polynomial that defines the Galois field is specified by the standard for each medium. Since Galois fields and the four arithmetic operations on their numbers are well known, a detailed description is omitted here.
Because codewords are received and processed continuously from a storage (or recording) medium or from a communication transmission path, the time allowed for processing the received data, including error correction, is determined by the data reception rate. This is because the processing must keep up in real time. Because of the peculiarities of the four arithmetic operations on Galois field numbers, and in view of this real-time requirement, code error correction has been performed not by general-purpose processors but by hardwired custom LSIs (semiconductor integrated circuits).
An LSI in which the arithmetic units on the Galois field are implemented in hardware and error correction is performed in processor form is described in "Error Correction LSI for Optical Disks" (Transactions of the Institute of Electronics, Information and Communication Engineers A, Vol. J73-A, No. 2, pp. 261-268, February 1990) (the first known example). According to it, the error correction procedure is divided into the following four steps. In step 1, a syndrome polynomial is calculated from the received codeword. In step 2, the error locator polynomial and the error value polynomial are obtained from the syndrome polynomial. In step 3, the error locations are obtained from the error locator polynomial. In step 4, the error values are obtained from the error locator polynomial and the error value polynomial. In this known example, because real-time constraints make it difficult to perform all of these steps on processor-style hardware, only steps 2 and 4 are performed by the processor, and steps 1 and 3 are handled by dedicated hardwired circuits.
Measures for improving data processing performance to satisfy the real-time requirement include raising the operating frequency of the processor and improving arithmetic performance by introducing parallel processing. The operating frequency can be raised by improving process and circuit technology or by adopting multi-stage pipeline processing. However, improvements in process and circuit technology do not deliver a severalfold performance gain at once. In addition, introducing multi-stage pipeline processing into processor-style hardware causes problems such as large overheads in branch processing.
Introducing parallel processing, on the other hand, makes it relatively easy to raise arithmetic performance, but processing performance does not improve in practice unless the processing algorithm itself is suited to parallel processing. One factor that disturbs the parallelism of a processing algorithm is branch processing that depends on operation results.
A technique that enables branch processing dependent on operation results in parallel processing is known from "Conditional operation control circuit in parallel processing" (Japanese Patent Application Laid-Open No. 5-189585) (the second known example). In a SIMD type parallel processor consisting of one instruction supply circuit and a plurality of identically configured arithmetic units that execute the same instruction on different data, a flag control circuit is provided for each arithmetic unit. The flag control circuit receives from the corresponding arithmetic unit an operation result flag representing its operation result and outputs an operation condition flag to that arithmetic unit. According to the operation condition flag, the arithmetic unit performs conditional operations such as updating or holding its output register. According to the technique disclosed in that known example, the operation result flag is stored for a plurality of cycles in a shift register provided in the flag control circuit, and during that period the operation result can be reflected in the operation condition flag. With this technique, each arithmetic unit of a SIMD type parallel processor can realize branch processing that depends on its own operation result.
In recent years, recording media have become markedly denser and access speeds markedly higher, and the increasing cost of developing custom LSIs has become a significant problem. Moreover, when error correction conforming to the standards of several different media is required, in storage systems as well as in communication systems, a custom LSI capable of executing every anticipated error correction, taking into account differences in primitive polynomials and so on, is needed, which raises the device cost.
For CD-ROM, devices are available whose access speed has been raised from standard speed to 2x, 4x, and even 12x or more, and supporting higher access speeds for a given recording medium requires a higher processing speed from the LSI that performs error correction. The demand for higher speed is rising too quickly to be met by the performance gains from finer semiconductor processes, so redesign at the architecture level becomes necessary. This requires an enormous number of development man-hours and leads to increased development cost.
Consider optical disks and DVD-ROMs as new external recording media for computers. Both use Reed-Solomon codes as their error correction codes, but the primitive polynomials defining their Galois fields differ. Different primitive polynomials mean that the very definition of multiplication differs, so multiplication circuits built from dedicated hardwired logic have different configurations. An LSI applicable to error correction for both optical disks and DVD-ROMs would need dedicated hardwired arithmetic circuits conforming to each standard, which raises cost.
In the first known example, part of the error correction processing for optical disks is performed by processor-style hardware, but because the rest is handled by dedicated hardware, versatility is lost. That is, it cannot handle error correction of codewords whose Galois field numbers and operations are defined by a different primitive polynomial. Moreover, if the data reception rate becomes higher than assumed, most of the LSI has to be redesigned, because part of the processing of steps 2 and 4 that was handled by the processor has to be moved to hardware. In terms of increased development cost, it therefore differs little from a hardwired custom LSI.
Even if a SIMD type parallel processor is introduced to improve performance and the technique disclosed in the second known example is used, another problem arises: conditional branches cannot be processed efficiently. As an example, consider two arithmetic units performing parallel processing, where the program being executed contains a conditional branch that depends on an operation result. As shown in the flowchart of FIG. 24, suppose it is specified that, if the result of executing process A satisfies a given condition, process B is executed and then process C, and if the condition is not satisfied, process C is executed without executing process B. In practice, error correction programs require many conditional branches that depend on operation results. When the program shown in FIG. 24 is executed using the second known example, all the instructions of process B are written as conditional operation instructions. If the operation result of one arithmetic unit satisfies the branch condition and that of the other does not, then, as shown in FIG. 25 (a), while the former arithmetic unit (PE0) executes process B, the latter arithmetic unit (PE1) holds its output register. After process B is completed, the two arithmetic units start process C in parallel. As shown in FIG. 25 (b), if neither of the two arithmetic units (PE0, PE1) satisfies the condition, neither needs to execute process B, but because the instructions are written as conditional operation instructions, cycles in which no operation is performed are repeated for the number of execution cycles of process B. The same holds when there are more than two arithmetic units: if none of them satisfies the condition, the cycles of process B are wasted. Thus, even with the parallel processing technique disclosed in the second known example, processing performance cannot be improved sufficiently.
An object of the present invention is to provide a data processing device capable of improving the performance of SIMD-style parallel arithmetic processing.
Another object of the present invention is to provide a data processing device capable of improving the speed of error correction processing for codewords.
Still another object of the present invention is to provide a data processing device that can handle a wide range of error correction codes and can accommodate performance improvements through simple design changes.
A further object of the present invention is to provide a data processing system that, from the standpoint of error correction processing, can support faster data readout in storage systems and high-speed data transmission in communication systems.
The above and other objects and novel features of the present invention will become apparent from the following description in this specification.
Disclosure of the invention
A SIMD-style data processing device according to the present invention has a control unit that decodes and executes fetched instructions, and a plurality of data processing units that receive control information for arithmetic operations in parallel from the control unit and whose data transfers are controlled by the control unit. Each data processing unit has standby control means that places the data processing unit in a standby state according to the result of an arithmetic operation performed in accordance with the control information, and the control unit controls the return of each data processing unit from the standby state to the active state.
In this way, each of the plurality of data processing units enters the standby state on the basis of its own operation result, and the control unit returns them from the standby state to the active state, whereby branch processing can be realized. Therefore, if branch processing in the control unit is arranged so that standby states are reduced, wasted cycles caused by data processing units being in standby are easily minimized, and SIMD-style parallel arithmetic performance can be improved.
The control unit includes detection means for detecting whether each data processing unit is in the standby state, and logic means for returning data processing units from the standby state to the active state according to the detection result. For example, the control unit monitors and controls the operating state of each data processing unit so that useless cycles do not arise from all data processing units being in standby. That is, when all data processing units are in the standby state, the logic means changes the order in which the control unit executes instructions and returns the data processing units in standby to the active state.
The standby control means can comprise determination means that determines whether the result of an arithmetic operation by the data processing unit has reached a specific state; a standby register that is set in synchronization with detection of the specific state by the determination means and is reset by specific control information from the control unit; and means that stops the arithmetic operation of the data processing unit in response to the set state of the standby register.
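The three parts of the standby control means can be modeled in a few lines of software. The following is a minimal sketch, not the circuit of the invention: the class name `ProcessingElement`, the broadcast "subtract" operation, and the choice of "accumulator equals zero" as the specific state are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    acc: int = 0
    standby: bool = False  # standby register: True = set, False = reset

    def step(self, operand: int) -> None:
        """Apply one broadcast 'subtract' operation unless in standby."""
        if self.standby:
            return  # stopping means: the broadcast control information is ignored
        self.acc -= operand
        if self.acc == 0:        # determination means: the assumed "specific state"
            self.standby = True  # register is set when the state is detected

    def wake(self) -> None:
        """Reset the standby register on a command from the control unit."""
        self.standby = False


# Three units receive the same operation twice; each stops on its own result.
pes = [ProcessingElement(acc=v) for v in (3, 1, 2)]
for _ in range(2):
    for pe in pes:
        pe.step(1)
states = [(pe.acc, pe.standby) for pe in pes]
```

After the two broadcast steps, the second and third units have reached zero and sit in standby while the first is still active; only `wake()`, standing in for the control unit, clears the register.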
The means for stopping the arithmetic operation can be configured to selectively inhibit the control information supplied from the control unit from being transmitted to the internal circuits. To make this means simpler still, a circuit can be adopted that selectively stops the supply of the clock signal to the circuit portions that operate in synchronization with the clock signal.
The data processing unit can include a Galois field multiplication circuit and a Galois field addition circuit, and the control unit can execute, as operation instructions for controlling these circuits, at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field multiply-accumulate (product-sum) instruction. The data processing device can be implemented as a semiconductor integrated circuit.
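For readers unfamiliar with Galois field arithmetic, the three operations named above can be sketched in software as follows. This is an illustration under stated assumptions: the text does not specify the field here, so GF(2^8) with the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice for Reed-Solomon codes, is assumed, and the function names `gf_add`, `gf_mul`, and `gf_mac` are hypothetical.

```python
POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^8) is bitwise XOR (no carries)."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication, reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:   # degree reached 8: subtract (XOR) the polynomial
            a ^= POLY
        b >>= 1
    return result


def gf_mac(acc: int, a: int, b: int) -> int:
    """Multiply-accumulate: acc + a * b over the field."""
    return gf_add(acc, gf_mul(a, b))
```

Because addition is XOR, every element is its own additive inverse, which is why a single adder circuit also serves as a subtractor in hardware of this kind.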
In the above data processing device further having a program memory that stores a program for error correction of a code defined over a Galois field, the control unit can fetch instructions from the program memory and perform error correction using the data processing units. Because a SIMD parallel processor performs the error correction, a higher processing speed demanded by the media can be met simply by increasing the number of data processing units, and thus the degree of parallelism, without changing the basic architecture or the program. Codes of different standards can be handled by changing the program, so systems that must correct errors under multiple standards are also easily supported.
A SIMD data processing device according to another aspect of the present invention has a control unit that decodes and executes fetched instructions, a plurality of data processing units that receive control information for arithmetic operations in parallel from the control unit and whose data transfers are controlled by the control unit, and storage means accessed by the control unit. Each data processing unit includes a first arithmetic circuit, buffer means connected to the first arithmetic circuit, and a plurality of pointer means that changeably designate addresses of the buffer means. The buffer means of each data processing unit is connected to the storage means via a data bus.
The buffer means provided in each data processing unit is used, for example, as a work area that temporarily stores intermediate results. Because the buffer means operate in parallel, the overhead caused by data transfer is lower than when the storage means doubles as the work area, and parallel arithmetic performance can be improved.
Also, because the buffer means is addressed through the plurality of pointer means, the number of bits in the operand designation field of an instruction that accesses the buffer means can be reduced. For example, when a single instruction specifies an addition that assigns A + B to C, the source addresses of A and B and the destination address of C must be written in the operand designation field. If identification information of the pointer means is written there instead, the field needs fewer bits than when buffer addresses are written directly in the instruction. This helps shorten the instruction word when a sophisticated operation is defined by a single instruction.
Address information can be set in the pointer means by having the control unit execute a load instruction or the like. Each data processing unit can further include second arithmetic means used to update the address information that the control unit has set in the pointer means. This reduces the number of data transfers from the control unit to the data processing units.
The control unit can include instruction execution means that executes operation instructions, which specify parallel operations in the data processing units, and data transfer instructions, which specify data transfers to the data processing units. The instruction execution means can execute an operation instruction and a data transfer instruction in parallel. That is, the data processing device supports compound instructions that combine an operation instruction with a data transfer instruction. In SIMD parallel processing, data transfer capacity can be expected to fall short of arithmetic capacity, and this provision addresses that shortfall.
Similarly, to supplement the data transfer capacity of the SIMD data processing device, the instruction execution means can execute, as one of the operation instructions, a single instruction that operates on data obtained from the buffer means at the location designated by one pointer means, stores the result in the buffer means at the location designated by another pointer means, and updates the contents of the pointers. The instruction execution means can further execute instructions that manipulate data inside the control unit and branch instructions that redirect the instructions the control unit fetches.
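The single instruction that reads through pointers, writes through another pointer, and updates the pointers can be pictured with the following sketch. It is only an illustration: the function name `exec_op_with_update`, the post-increment by a fixed step, and the wrap-around at the end of the buffer are assumptions, not details taken from the text.

```python
from operator import xor


def exec_op_with_update(buf, ptrs, src_a, src_b, dst, op, step=1):
    """Model one instruction: buf[ptrs[dst]] = op(buf[ptrs[src_a]], buf[ptrs[src_b]]),
    followed by an update of every pointer the instruction named."""
    buf[ptrs[dst]] = op(buf[ptrs[src_a]], buf[ptrs[src_b]])
    for p in {src_a, src_b, dst}:
        ptrs[p] = (ptrs[p] + step) % len(buf)  # assumed post-increment with wrap


# An 8-word buffer and four pointers; the instruction names pointers 0, 1 and 2.
buf = [1, 2, 3, 4, 0, 0, 0, 0]
ptrs = [0, 2, 4, 0]
exec_op_with_update(buf, ptrs, 0, 1, 2, xor)  # buf[4] = buf[0] ^ buf[2]
exec_op_with_update(buf, ptrs, 0, 1, 2, xor)  # buf[5] = buf[1] ^ buf[3]
```

Because the pointers advance inside the instruction itself, the second call needs no separate address setup from the control unit, which is the saving in transfers that the text describes.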
The various standby control means described above can also be adopted in the SIMD data processing device characterized by the buffer means.
A SIMD data processing device according to yet another aspect of the present invention focuses on the instructions that the control unit executes for the standby state control described above. That is, the data processing device includes the control unit and the plurality of data processing units described above; each data processing unit has the standby control means; and the control unit has means for referring to whether a data processing unit is in the standby state and returns the data processing unit from the standby state to the active state according to the result. Here, the control unit can execute an instruction that sets in the data processing units a condition for entering the standby state and, at the time of setting, places in the standby state any data processing unit for which the condition holds. The control unit can also execute an instruction that sets such a condition and places in the standby state any data processing unit for which the condition holds in an instruction execution cycle after the setting. Further, the control unit can execute an instruction that individually places the data processing units in the standby state or returns them from the standby state to the active state.
A SIMD data processing device according to another aspect of the present invention aims to make repeated processing by parallel arithmetic more efficient. That is, the data processing device has the control unit described above, a plurality of data processing units, and storage means accessed by the control unit. Each data processing unit has standby control means that places the data processing unit in the standby state according to the result of an arithmetic operation performed under the control information, and the standby control means returns the inside of the data processing unit from the standby state to the active state according to an instruction from the control unit. The control unit includes detection means that detects whether each data processing unit is in the standby state, and logic means that returns data processing units from the standby state to the active state according to the detection result. Further, when the control unit executes a repeat instruction that specifies the start address of a loop, the end address of the loop, and the repetition count, it causes the data processing units to operate in parallel according to the instructions from the start address to the end address, at most the specified number of times.
The repeat instruction can specify the start address of the loop, the end address of the loop, the repetition count, and a condition for forcibly terminating the loop. When the control unit executes this repeat instruction, it causes the data processing units to repeat the parallel operation according to the instructions from the start address to the end address unless the forced termination condition holds. In this repeat instruction, the standby state of all data processing units can be set as the forced termination condition. Alternatively, the standby state of at least one data processing unit can be set as the forced termination condition.
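The behavior of a repeat instruction with a forced termination condition can be sketched as a small simulation. The following assumes a hypothetical loop body in which each unit counts down and enters standby at zero, and it assumes the termination condition is checked before each pass; neither detail is stated in the text, and the name `run_repeat` is illustrative.

```python
def run_repeat(counters, max_count, stop_when="all"):
    """Run the loop body at most max_count times over all lanes.

    stop_when="all": terminate once every lane is in standby.
    stop_when="any": terminate once at least one lane is in standby.
    Returns (passes executed, final standby flags).
    """
    standby = [False] * len(counters)
    passes = 0
    for _ in range(max_count):
        done = all(standby) if stop_when == "all" else any(standby)
        if done:
            break  # forced termination of the loop
        for lane in range(len(counters)):
            if standby[lane]:
                continue  # a standby lane ignores the broadcast operation
            counters[lane] -= 1
            if counters[lane] <= 0:
                standby[lane] = True
        passes += 1
    return passes, standby
```

With lanes starting at 3, 1 and 2 and a limit of 10, the "all" condition ends the loop after 3 passes and the "any" condition after 1 pass; a limit smaller than the work remaining ends the loop with lanes still active.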
In the data processing device that supports the repeat instruction, the data processing unit can include a Galois field multiplication circuit and a Galois field addition circuit, and the control unit can execute, as operation instructions for controlling these circuits, at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field multiply-accumulate (product-sum) instruction.
In a data processing device further having a program memory that stores a program for error correction of a code defined over a Galois field, the control unit can be configured to fetch instructions from the program memory and perform error correction using the data processing units.
In the error correction, the following steps can be repeated multiple times: syndrome computation for code data defined by elements of a Galois field, determination of the presence or absence of errors using the resulting syndrome, and storage in the storage means of any syndrome for which an error was determined. After that, the stored syndromes are read from the storage means and the error correction computation is performed. If, contrary to this procedure, syndrome computation, error determination, and error correction computation were performed in parallel as a single loop, a data processing unit that found no error would have to remain in standby until that pass of the loop completed, and wasted cycles would likely be frequent. In the above procedure, by contrast, a syndrome is stored when an error has occurred, and once a certain number of syndromes have accumulated, the correction computation is performed on them together. The standby time of the data processing units can therefore be shortened overall, and wasted cycles caused by standby can be reduced. Note, however, that storing the syndromes with errors is additional work, so the amount of processing may increase instead. Whether the storing step runs depends on whether an error occurred. If errors are numerous, the syndrome storage work grows and the total amount of processing may increase.
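The trade-off between the single-loop procedure and the deferred, batched correction can be made concrete with a rough cycle-count model. Everything numeric here is an assumption for illustration: the per-step costs (`syn`, `corr`, `store`), the rule that a group of lanes pays the full correction cost when any lane has an error, and the function names are not taken from the text.

```python
from math import ceil


def cycles_single_loop(has_error, lanes, syn, corr):
    """Syndrome and correction in one loop: a group pays the correction
    cost if any lane in it has an error (error-free lanes stand by)."""
    total = 0
    for i in range(0, len(has_error), lanes):
        group = has_error[i:i + lanes]
        total += syn + (corr if any(group) else 0)
    return total


def cycles_batched(has_error, lanes, syn, corr, store):
    """Phase 1: syndromes, storing those with errors. Phase 2: correct
    the stored syndromes packed densely across the lanes."""
    total = 0
    for i in range(0, len(has_error), lanes):
        group = has_error[i:i + lanes]
        total += syn + (store if any(group) else 0)
    total += ceil(sum(has_error) / lanes) * corr
    return total


sparse = [True, False, False, False] * 4   # one error per group of four
dense = [True] * 16                        # every code word has an error
single_sparse = cycles_single_loop(sparse, 4, syn=10, corr=20)
batched_sparse = cycles_batched(sparse, 4, syn=10, corr=20, store=2)
single_dense = cycles_single_loop(dense, 4, syn=10, corr=20)
batched_dense = cycles_batched(dense, 4, syn=10, corr=20, store=2)
```

Under these assumed costs the batched procedure needs 68 cycles against 120 when errors are sparse, but 128 against 120 when every code word has an error, which mirrors the caution in the text that the storing step can raise the total when errors are numerous.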
A data processing system to which the data processing device is applied includes, for example, input means for code data defined using elements of a Galois field, the data processing device, and data output means. The data processing device corrects errors in the code data input from the input means, based on the program stored in its program memory. From the standpoint of error correction and the like, this supports faster data readout in storage systems and high-speed data transmission in communication systems. The control unit of the data processing device can also execute the input/output control of the input and output means and the error correction of the code data in a time-shared manner.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 is a block diagram of a SIMD parallel processor according to a first embodiment of the present invention.
FIG. 2 is a block diagram of a SIMD parallel processor according to a second embodiment of the present invention.
FIG. 3 is a block diagram of a SIMD parallel processor according to a third embodiment of the present invention.
FIG. 4 is a block diagram of a SIMD parallel processor according to a fourth embodiment of the present invention.
FIG. 5 is a flowchart showing a typical procedure for error correction of a Reed-Solomon code.
FIG. 6 is a flowchart showing an example of a procedure for performing code error correction with the SIMD parallel processor of the present invention.
FIG. 7 is a flowchart showing an example of a more efficient procedure for performing code error correction with the SIMD parallel processor of the present invention.
FIG. 8 is a logic circuit diagram showing an example of a condition determination unit.
FIG. 9 is a logic circuit diagram showing an example of a comparator with a mask.
FIG. 10 is a block diagram showing an example of a data processing unit.
FIG. 11 is a block diagram showing an example of a data processing unit that realizes the standby state by stopping the supply of the clock signal in the data processing unit of FIG. 10.
FIG. 12 is a block diagram showing a detailed example of a data processing unit suitable for error correction.
FIG. 13 is an explanatory diagram of the special control instructions of the data processing device according to the present invention.
FIG. 14 is an explanatory diagram of the data transfer instructions of the data processing device according to the present invention.
FIG. 15 is an explanatory diagram of the SIMD instructions of the data processing device according to the present invention.
FIG. 16 is an explanatory diagram showing the overall structure of the instruction codes for general RISC instructions and for compound instructions that combine a data transfer instruction with a SIMD instruction.
FIG. 17 is an explanatory diagram showing an example of the instruction code of a data transfer instruction.
FIG. 18 is an explanatory diagram showing an example of the instruction code of a SIMD instruction.
FIG. 19 is a block diagram of a SIMD parallel processor for explaining a configuration that places a data processing unit in the standby state using a data setting instruction.
FIG. 20 is an example operation timing chart for the case where "setPENOP=1 if" is executed.
FIG. 21 is an example operation timing chart for the case where "setPENOP=1 when" is executed.
FIG. 22 is an explanatory diagram showing an example of a program written with the instruction set shown in FIGS. 13 to 15.
FIG. 23 is a block diagram of an example DVD/CD-ROM system to which the SIMD parallel processor is applied.
FIG. 24 is a flowchart showing general conditional branch processing.
FIG. 25 is a flowchart showing an example of conditional branch processing by a SIMD parallel processor previously studied by the present inventors.
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 shows a block diagram of a SIMD parallel processor according to a first embodiment of the present invention. The SIMD parallel processor shown includes first and second memories 801 and 802, a peripheral circuit 900 connected via a bus interface (bus I/F) 901, a control unit 200, and a plurality of data processing units 101, 102, ... 10n connected in parallel. CDB and CAB in the figure are a common data bus and a common address bus. LDB and LAB are a local data bus and a local address bus. The SIMD parallel processor shown can be formed on a single semiconductor substrate, either excluding the peripheral circuit or including all or part of it. Part or all of the memories, the first memory in particular, may not be formed on that single semiconductor substrate.
The first memory 801 stores the programs and data that the control unit 200 fetches and executes. The control unit 200 supplies an address to the first memory 801 to read instructions or data, or to write. The second memory 802 is for data storage. It exchanges data with the data processing units 101, 102, ... 10n, whose operations are controlled by the control unit 200, supplying them with the data to be operated on and storing intermediate results. The second memory 802 is also accessible from the control unit 200. A program may be stored in the second memory 802 and transferred to the control unit 200 over the bus. In this embodiment and in all other embodiments disclosed here, it is preferable to assign the first memory 801 and the second memory 802 to different address regions of a common address space, because programs and data can then be accessed in the same way.
The control unit 200 consists of a program counter (PC) 202, an instruction decoding unit 201, and a data operation unit 203, and is configured so that a flag 204 can be set according to the result of an operation by the data operation unit 203. An instruction is fetched by outputting the value of the program counter 202 to the first memory 801 as an address. The fetched instruction is decoded by the instruction decoding unit 201, and control signals for the data processing units 101, 102, ... 10n are issued according to the result. Under control signals output from the instruction decoding unit 201, the data operation unit 203 updates the program counter 202, accesses memory, and performs data operations as the instruction requires. The result of an operation by the data operation unit 203 of the control unit 200 is used as the address of the second memory 802.
The data operation unit 203 of the control unit 200 handles program loop control, operations unsuited to parallel execution, memory address calculation, and the like, and feeds its results back to the instruction decoding unit 201 via the flag 204 to control the program flow. The progress and results of operations in the data processing units 101, 102, ... 10n can also be fed back to the instruction decoding unit 201.
The data processing units 101, 102, ... 10n all have the same configuration, receive the same control signals from the control unit 200, and in principle perform the same operation. Data is supplied mainly from the second memory 802, but can also be transferred from the first memory 801, from the peripheral circuit 900 via the bus interface circuit 901, and from the data operation unit 203 of the control unit 200. Conversely, data can be transferred from the data processing units 101, 102, ... 10n to the first memory 801 or the second memory 802, the peripheral circuit 900, and the data operation unit 203 of the control unit 200. If, for example, four data processing units are connected in parallel and each handles 8-bit data, the memory is configured to read 32 bits at once, so that 8 bits can be supplied to each data processing unit in a single access.
Each of the data processing units 101, 102, ... 10n has standby control means that places the unit in the standby state according to the result of an operation by its arithmetic unit 161, 162, ... 16n under the control information (control signals/instructions) from the control unit 200. The standby control means consists, for example, of a determination unit 111, 112, ... 11n, a standby register 121, 122, ... 12n, and a standby/active switching unit (enb/dis) 131, 132, ... 13n. The standby control means returns the inside of the data processing unit from the standby state to the active state according to an instruction from the control unit 200. In FIG. 1, the standby control means is made up of the determination unit 111 (112, ... 11n), which determines whether the operation result of the data processing unit has reached a specific state; the standby register 121 (122, ... 12n), which is placed in a set state (first state) in synchronization with detection of the specific state by the determination unit and in a reset state (second state) by specific control information from the control unit 200; and the standby/active switching unit 131 (132, ... 13n), which stops the arithmetic operation of the data processing unit 101 (102, ... 10n) in response to the set state of the standby register.
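The earlier example in this passage, four data processing units of 8 bits each served by one 32-bit memory read, can be sketched as follows. The lane ordering (unit 0 takes the least significant byte) and the function names `split_word` and `join_lanes` are assumptions made for illustration; the text does not say how the bits are assigned.

```python
def split_word(word, lanes=4, bits=8):
    """Split one memory word into per-unit operands, lane 0 = lowest bits."""
    mask = (1 << bits) - 1
    return [(word >> (bits * i)) & mask for i in range(lanes)]


def join_lanes(values, bits=8):
    """Pack per-unit results back into one memory word."""
    mask = (1 << bits) - 1
    word = 0
    for i, value in enumerate(values):
        word |= (value & mask) << (bits * i)
    return word


operands = split_word(0x44332211)    # one 32-bit read feeds four units
restored = join_lanes(operands)      # one 32-bit write collects four results
```

Doubling the number of units or the operand width doubles the required memory width, which is the scaling problem that motivates the per-unit storage of the second embodiment.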
As a result, each of the data processing units 101, 102, ... 10n can enter the standby state on its own according to its operation result. In the standby state it ignores the control signals from the control unit 200 and performs no operation. It is desirable to stop the supply of the clock signal inside the unit at this time so that power is not wasted.
Information indicating the standby/active state is held in the standby register 121, 122, ... 12n of each data processing unit 101, 102, ... 10n. The control unit 200 can read, set, and reset the contents of each standby register 121, 122, ... 12n independently. By monitoring the state of the registers 121, 122, ... 12n, the control unit 200 can control the program flow so that no waste arises.
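The monitoring role of the control unit can be sketched in software as a poll of the standby registers followed by a decision on the next instruction address. This is an illustration only: packing the registers into a bit mask, the names `standby_mask` and `control_step`, and the use of a single branch target are assumptions rather than details of the embodiment.

```python
def standby_mask(standby_regs):
    """Pack the per-unit standby bits into one integer (bit i = unit i)."""
    mask = 0
    for i, bit in enumerate(standby_regs):
        if bit:
            mask |= 1 << i
    return mask


def control_step(standby_regs, pc, branch_target):
    """Return the next program counter value.

    If every unit is in standby, reset all standby registers and branch,
    so that no cycle is spent broadcasting to idle units.
    """
    all_units = (1 << len(standby_regs)) - 1
    if standby_mask(standby_regs) == all_units:
        for i in range(len(standby_regs)):
            standby_regs[i] = False  # reset: unit returns to the active state
        return branch_target         # change the instruction execution order
    return pc + 1                    # otherwise continue sequentially


idle = [True, True, True]
next_pc_idle = control_step(idle, pc=10, branch_target=40)
mixed = [True, False]
next_pc_mixed = control_step(mixed, pc=10, branch_target=40)
```

When at least one unit is still active the program simply continues, and the standby units wait; only the all-standby case triggers the wake-up and branch.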
FIG. 2 shows a SIMD parallel processor according to a second embodiment of the present invention. Circuit blocks with the same functions as those shown in FIG. 1 carry the same reference numerals. In a SIMD parallel processor, arithmetic capacity is raised by operating the data processing units in parallel, but the data transfer rate often limits the performance of the processor. For example, when intermediate results in each of the data processing units 101, 102, ... 10n are stored in the memory 802 each time, the memory access must be split into several accesses unless the parallel data input/output width of the memory 802 is large enough to serve all data processing units 101, 102, ... 10n at once. If the total number of parallel input/output bits required is small, all data processing units can be served at once. But when the data width of the unit of operation is large, such as 64 bits, or when there are many data processing units, parallel input/output between the memory 802 and all data processing units 101, 102, ... 10n at once becomes practically impossible. To solve this, in this embodiment each data processing unit 101, 102, ... 10n is provided with storage means 141, 142, ... 14n that stores multiple words of data (a word being the number of data bits handled as a unit of processing).
The data storage means 141, 142, ... 14n are used as the so-called work area of the individual data processing units. Ordinary applications need several tens of words, and addressing each word independently in the instruction word would require 5 to 7 bits of instruction code per word. In this embodiment, a plurality of pointers 151, 152, ... 15n that designate the storage area of the storage means 141, 142, ... 14n in word units are provided, and the word designated by a pointer 151, 152, ... 15n is used as the operand of the instruction. In this case, if there are up to four pointers, the pointer can be specified with 2 bits of instruction code. In a further preferred embodiment, the values of the pointers 151, 152, ... 15n are designed to be updated at the same time as an arithmetic instruction, a data transfer instruction, or the like. An incrementer/decrementer is provided for each pointer 151, 152, ... 15n, and control is added so that an instruction can direct the update of the pointers 151, 152, ... 15n. The number of words in the data storage means 141, 142, ... 14n and the number of pointers 151, 152, ... 15n should be chosen by weighing the application range, instruction code, circuit scale, and processing performance of the SIMD-type parallel processor; the values given here are only examples.
If means are provided that set an upper limit and a lower limit for updating the values of the pointers 151, 152, ... 15n and automatically update each pointer within that range so that it never leaves it, operations that repeatedly use several data items become easy. A digital filter is one example of such an operation. By storing the input data and the filter coefficients in the storage means 141, 142, ... 14n of this embodiment and controlling the pointers so that they do not exceed this range, a digital filter program never accesses addresses outside the area where the data and coefficients are stored, and the processing for monitoring the pointer values need not be written in the software executed by the control unit 200.
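The bounded pointer update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `BoundedPointer`, the wrap-around (modulo) behaviour at the limits, and the helper `fir_output` are hypothetical and are not taken from the specification, which states only that the pointer is automatically kept within its upper and lower limits.

```python
class BoundedPointer:
    """Pointer kept inside [lower, upper] (inclusive) on every update.

    Assumption: the pointer wraps around at the limits, which is what a
    circular buffer for a digital filter would need.
    """

    def __init__(self, lower, upper, value=None):
        if lower > upper:
            raise ValueError("lower limit must not exceed upper limit")
        self.lower = lower
        self.upper = upper
        self.value = lower if value is None else value

    def step(self, delta=1):
        # Increment or decrement, wrapping inside the permitted range.
        span = self.upper - self.lower + 1
        self.value = self.lower + (self.value - self.lower + delta) % span
        return self.value


def fir_output(samples, coeffs, newest):
    """Compute one FIR output from a circular sample buffer.

    `newest` is the index of the most recent sample; older samples are
    reached by decrementing the pointer, which never leaves the buffer.
    """
    ptr = BoundedPointer(0, len(samples) - 1, newest)
    acc = 0
    for c in coeffs:
        acc += c * samples[ptr.value]
        ptr.step(-1)
    return acc
```

Because the wrap-around lives in `step`, the filter loop itself contains no range check, which mirrors the point that pointer monitoring need not be written in the control software.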
In a further preferred embodiment, a mechanism is provided that detects when the value of a pointer 151, 152, ... 15n reaches a predetermined value. A register holding the end value of the pointer 151, 152, ... 15n is provided; each time the value of the pointer 151, 152, ... 15n is updated it is compared with the end value, and the comparison result is reflected in the flag 204.
FIG. 3 shows a SIMD-type parallel processor according to a third embodiment of the present invention. This embodiment combines the first and second embodiments, and circuit blocks having the same functions as those shown in FIGS. 1 and 2 are denoted by the same reference numerals. Determination units 171, 172, ... 17n are employed in which the update result of the pointers 151, 152, ... 15n, such as a pointer reaching its end value as disclosed in the description of the second embodiment, is added to the conditions for transition to the standby state. This is suitable when processing is to continue until all of the data groups stored in the storage means 141, 142, ... 14n have been processed, where the total number of data items stored in each differs or is not known. That is, when the number of data items in the data group differs between the data processing units 101, 102, ... 10n, the data processing units that have finished their processing enter the standby state one after another, so the processing can proceed while parallelism is maintained.
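The behaviour described for the third embodiment can be illustrated with a small lockstep simulation in Python. It is a sketch under assumptions: the function name `simd_sum`, the choice of summation as the per-item operation, and the cycle counting are hypothetical; only the idea that each unit enters standby when its own pointer reaches its end value comes from the text.

```python
def simd_sum(groups):
    """Sum each data group in lockstep, one item per unit per cycle.

    Each simulated data processing unit has its own pointer and end value
    (the length of its data group). A unit whose pointer reaches the end
    value sets its standby bit and ignores later cycles.
    """
    n = len(groups)
    pointers = [0] * n
    sums = [0] * n
    standby = [len(g) == 0 for g in groups]  # empty groups start in standby
    cycles = 0
    while not all(standby):  # the controller stops once every unit is in standby
        cycles += 1
        for pe in range(n):  # one broadcast instruction per cycle
            if standby[pe]:
                continue
            sums[pe] += groups[pe][pointers[pe]]
            pointers[pe] += 1
            if pointers[pe] == len(groups[pe]):  # end-value comparison
                standby[pe] = True
    return sums, cycles
```

With groups of lengths 3, 1, and 2, the loop runs for 3 cycles, the length of the longest group, and the shorter units simply idle once they are done.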
FIG. 4 shows a SIMD-type parallel processor according to a fourth embodiment, which is a further preferred form of the third embodiment. In this embodiment, a condition determination unit 206 for determining the standby state of each of the data processing units 101, 102, ... 10n is explicitly shown in the control unit 200, and the control unit 200 has a repeat control unit 205 that uses the determination result of the condition determination unit 206. The rest of the configuration is the same as in the embodiment of FIG. 3.
The standby state of each of the data processing units 101, 102, ... 10n is monitored by the condition determination unit 206 via the signal PEN0PEA. When the standby states of the data processing units 101, 102, ... 10n reach a predetermined state, information indicating that state is reflected in the flag 204 and is also supplied to the repeat control unit 205.
The repeat control unit 205 has hardware that holds the repeat start address, the repeat end address, and the repeat count of a program. When a repeat instruction that gives values to these is issued, it executes the repeat loop without any loop overhead. For example, although not illustrated, the repeat control unit 205 has a register that stores the start address of the loop, a register that stores the end address, a register that stores the repeat count, and counting means that counts the repetitions, and it forms a repeat loop that executes the instructions from the start address to the end address through the program counter 202. The number of iterations of the repeat loop is determined by the set repeat count. When a forced termination condition is specified for the loop, the repeat loop is forcibly terminated when that condition holds. As one such forced termination condition, the detection result of the condition determination unit 206, indicating whether the standby states of the data processing units 101, 102, ... 10n satisfy a preset condition, can be used.
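The repeat control just described can be modelled roughly as follows. This Python sketch is an assumption-laden illustration: the function `run_repeat`, the representation of instructions as callables, and the choice to test the forced termination condition once per pass (rather than after every instruction) are all hypothetical and are not specified in the text.

```python
def run_repeat(program, start, end, count, force_stop=None):
    """Model of a hardware repeat loop.

    program    : list of callables, each taking a mutable `state` dict
    start, end : first and last instruction index of the loop body
    count      : value of the repeat count register
    force_stop : optional predicate on `state`; when it returns True the
                 loop ends early (assumed to be sampled after each pass)
    """
    state = {"executed": 0}
    for _ in range(count):
        for pc in range(start, end + 1):  # program counter walks the body
            program[pc](state)
            state["executed"] += 1
        if force_stop is not None and force_stop(state):
            break  # forced termination before the count is exhausted
    return state
```

No instruction in `program` tests the loop condition; the counting and the early exit are handled outside the loop body, which is the property the text attributes to the repeat hardware.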
Thus, when the signal line PEN0PEA, which monitors the standby state of the individual data processing units 101, 102, ... 10n, reaches a predetermined state, that event is reflected in the flag 204 of the control unit 200 so that a conditional branch according to the flag 204 can be executed, and the repeat loop described above can be forcibly terminated. Furthermore, while the data processing units 101, 102, ... 10n are executing parallel operations, the units that have finished their processing can be put into the standby state one after another, and when the data processing units 101, 102, ... 10n have entered the standby state, the control unit 200 detects this and can branch the program or forcibly terminate the repeat loop. As a result, no processing steps are wasted.
The data processing devices configured as disclosed in the first to fourth embodiments become optimal for error correction codes, in particular for error correction of Reed-Solomon codes, when a Galois field arithmetic unit is provided in each of the data processing units 101, 102, ... 10n.
An embodiment in which the data processing device according to the fourth embodiment is optimized for error correction of Reed-Solomon codes is described in further detail below.
First, a typical processing flow for error correction of a Reed-Solomon code is described with reference to FIG. 5. In FIG. 5, ovals indicate processing and rectangles indicate output data. The error correction flow consists of data transfer 1001, syndrome calculation 1002, error presence determination 1003, Euclidean algorithm 1004, Chien search 1005, error value calculation 1006, and correction 1007. Data transfer 1001 is a transfer from the first memory 801 to the second memory 802 that arranges the data in the second memory 802 in a format suited to SIMD-type parallel processing. As an alternative, data can of course be transferred from an input/output device to the data processing units 101, 102, ... 10n. Syndrome calculation 1002 takes a series of received codes (r0 to r255) 2001 as input and calculates the coefficients 2002 of the syndrome polynomial. Here, the series of received codes (r0 to r255) is, for example, 256 bytes of data consisting of the encoded data and its corresponding parity information. Syndrome calculation is performed per series of received codes (r0 to r255). If the coefficients 2002 of the syndrome polynomial are all zero, the received code contains no error. When no error is found, the remaining processing is skipped and the flow ends; when an error is found, correction processing begins. First, the Euclidean algorithm 1004 calculates the error locator polynomial 2003 and the error value polynomial 2004 from the syndrome polynomial 2002. The error positions 2005 are obtained by finding the roots of the error locator polynomial 2003 with the Chien search 1005. If an error position 2005 comes out as a value that cannot actually occur, errors exceeding the correction capability of the code have occurred. In that case, the flow outputs that the code is uncorrectable, skips the remaining processing, and ends. If the error positions are obtained properly, the error values 2006 are calculated from them, the correction 1007 is performed, and the processing ends.
In error correction processing, there is no interdependence between one series of received codes and another during correction, so no data needs to be exchanged between them, which makes the processing extremely well suited to SIMD-type parallel processing. It is most efficient for a single data processing unit equipped with a Galois field arithmetic unit to correct one series of received codes. In this embodiment, if n data processing units are connected in parallel, n received code sequences are processed concurrently.
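As an illustration of syndrome calculation 1002 and error presence determination 1003, the following Python sketch evaluates the received polynomial at successive powers of a primitive element over GF(2^8). Several details are assumptions not given in the specification: the field polynomial 0x11D, the first root α^0, the symbol ordering (highest-degree coefficient first), and the names `gf_mul`, `syndromes`, and `has_error`.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements (shift-and-add, reduced by `poly`)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # degree 8 reached: reduce by the field polynomial
            a ^= poly
        b >>= 1
    return result


def syndromes(received, num):
    """Return S_0..S_{num-1}, where S_j = r(alpha^j), using Horner's rule.

    `received[0]` is taken to be the highest-degree coefficient.
    """
    out = []
    root = 1  # alpha^0; alpha is the field element 2
    for _ in range(num):
        s = 0
        for symbol in received:
            s = gf_mul(s, root) ^ symbol
        out.append(s)
        root = gf_mul(root, 2)
    return out


def has_error(received, num=16):
    """A word is error-free exactly when every syndrome is zero."""
    return any(syndromes(received, num))
```

Since the words are independent, a list comprehension such as `[has_error(w) for w in words]` stands in for what the n data processing units would compute concurrently, one word per unit.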
The greatest benefit of adopting a SIMD-type parallel processor configuration is that the processing performance can be changed, without changing the program, by changing the number of data processing units. The program can stay the same as long as the code standard does not change, and faster access can be handled by increasing the number of data processing units, so design changes are extremely easy.
Here, as an example, FIGS. 6 and 7 schematically show the processing flows when four received code sequences 2001 are processed in parallel using four data processing units 101, 102, 103, 104. In FIGS. 6 and 7, GPE0(101) to GPE3(104) denote the four parallel data processing units, and the processing executed in each of the data processing units 101, 102, 103, 104 is shown with time running in the vertical direction.
In FIG. 6, if an error is detected in even one of the four input received code sequences 2001, the data processing units in which no error was detected are put into the standby state 1099 and the processing up to correction is executed. FIG. 6 takes as an example the case where the received code sequence 2001 input to GPE0(101) contains errors within the correctable range, the received code sequence 2001 input to GPE1(102) contains errors beyond the correctable range, and the received code sequences 2001 input to GPE2(103) and GPE3(104) contain no errors. All of the data processing units 101, 102, 103, 104 operate in parallel unconditionally through data transfer 1001, syndrome calculation 1002, and error presence determination 1003. In this example, the subsequent error correction processing is performed in GPE0(101) and GPE1(102), which were found to contain errors, while GPE2(103) and GPE3(104), in which no errors were detected, are in the standby state 1099 during that time. GPE1(102), in which the processing following the Euclidean algorithm 1004 reveals errors exceeding the correction capability of the code, is in the standby state 1099 during the subsequent error value calculation 1006 and correction 1007 and waits for the processing of GPE0(101) to finish.
In this case, if all of the data processing units 101, 102, 103, 104 detect no error, the flow can move on to the next series of received codewords 2001 without performing error correction processing, so unnecessary processing steps are avoided. However, if even one of the four received code sequences 2001 processed simultaneously by the four data processing units 101, 102, 103, 104 contains an error, the three data processing units other than the one that actually performs the correction are in the standby state 1099 for all of the processing from the Euclidean algorithm 1004 onward, waiting for the error correction to complete. The data processing units 101, 102, 103, 104 provided in parallel can hardly be said to be used effectively in this situation. As the number of data processing units increases, the probability that no error is detected in any of them falls, and the processing efficiency drops.
The processing flow shown in FIG. 7 is effective in avoiding this. Here too, the flow is initially the same as that of FIG. 6: the four received code sequences 2001 are read into the four data processing units 101, 102, 103, 104 (1001), and syndrome calculation 1002 and error presence determination 1003 are performed. GPE0(101) and GPE1(102), which were found to contain errors, store the calculated syndromes 2002 in the memory 802, and meanwhile GPE2(103) and GPE3(104), which were found to be error-free, enter the standby state 1099 and wait for that processing to finish. The flow then returns to the initial data transfer: the next four received code sequences 2001 are read into the four data processing units 101, 102, 103, 104 (1001), syndrome calculation 1002 and error presence determination 1003 are performed, and likewise only the syndromes 2002 of the received code sequences 2001 in which errors were detected are stored in the memory 802 (1008). After this has been done for a reasonably large batch of received code sequences 2001, the syndromes 2002 stored in the memory 802 are read in (1009) only for the received code sequences 2001 in which errors were detected, and the subsequent error correction processing is performed. Here too, if the number of errors exceeds the correction capability of the code, correction is impossible, so the error value calculation 1006 is not performed and the unit enters the standby state 1099. In the example shown in FIG. 7, the processing that stores the syndromes 2002 of the erroneous received code sequences 2001 in the memory 802 (1008) and reads them out again (1009) is needed in addition to what is required in the example of FIG. 6. In general, the flow of FIG. 6 is efficient when errors occur frequently, and the flow of FIG. 7 is efficient when errors occur infrequently. Which flow is more efficient can be determined quantitatively from the relationship among the error frequency, the number of processing steps required for error correction, and the number of processing steps required to store the syndromes temporarily in memory.
In practical error correction in particular, the average error frequency is far lower than the highest error frequency that must be assumed. When the received code is 256 bytes and up to 8 errors in it are to be corrected, 16 bytes of syndrome are required. Since the probability of an error is typically 1 in 1,000, on average one error occurs in about every four codewords. When correcting 100 received codewords = 25,600 bytes, and assuming that the correction processing completes in 10,000 steps, the processing flow shown in FIG. 6 takes 10,000 × 100 ÷ 4 = 250,000 steps, the division by 4 reflecting the four data processing units operating in parallel. Next, the number of processing steps for error correction according to the flow of FIG. 7 is estimated. Assume that the probability of an error is 1 in 1,000, that calculating the syndrome and determining whether an error is present takes 2,000 steps, and that storing and re-reading the syndrome value take 16 steps each. Since errors occur in 1 in 1,000 of the 100 received codewords = 25,600 bytes, there are on average 25.6 erroneous bytes, so about 20 of the 100 received codewords contain errors. Syndrome calculation for the 100 received codewords takes 2,000 × 100 ÷ 4 = 50,000 steps; storing the syndromes of the 20 erroneous words in memory and reading them back after error detection has been completed for all codewords takes 16 × 2 × 20 = 640 steps; and error correction of the 20 words takes 8,000 × 20 ÷ 4 = 40,000 steps, for a total of 90,640 steps. The processing thus completes in less than half the steps of the flow shown in FIG. 6. When the error rate is lower, or when the degree of parallelism is raised by increasing the number of data processing units, the reduction in processing steps obtained by adopting the flow of FIG. 7 becomes even more pronounced.
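The step estimate above can be reproduced with a short Python calculation. The function and parameter names are hypothetical, and one reading is assumed rather than stated in the text: the 8,000 steps per erroneous word in the FIG. 7 flow are taken to be the 10,000-step full correction minus the 2,000 steps already spent on the syndrome.

```python
def estimate_steps(words=100, units=4, full_steps=10_000,
                   syndrome_steps=2_000, store_steps=16, error_words=20):
    """Return (fig6_steps, fig7_steps) under the figures quoted in the text."""
    # FIG. 6: every batch of `units` words runs the full correction flow.
    fig6 = full_steps * words // units

    # FIG. 7: syndromes for all words, then store and re-read only the
    # erroneous ones, then correct only those words.
    syndrome_phase = syndrome_steps * words // units
    memory_phase = store_steps * 2 * error_words
    correction_phase = (full_steps - syndrome_steps) * error_words // units
    fig7 = syndrome_phase + memory_phase + correction_phase
    return fig6, fig7
```

Calling `estimate_steps()` with the default values gives 250,000 steps for the FIG. 6 flow and 90,640 steps for the FIG. 7 flow. Changing `error_words` or `units` shows how the gap moves with the error rate and the degree of parallelism.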
As described above, the processing flow shown in FIG. 7 allows error correction to be executed extremely efficiently, that is, with an extremely small number of execution steps. A program for realizing the procedure of FIG. 7 is stored in the memory 801. The control program of the SIMD-type processor can also be stored in an external memory.
The repeat loop control described above can be applied to processing such as that shown in FIG. 7. That is, the processing from data transfer (1001) to syndrome storage (1008) is repeated a number of times, after which the flow branches to the syndrome read processing (1009).
The repeat loop control method described for the fourth embodiment of the present invention is now explained in more detail. The individual processes in the error correction flow shown in FIG. 5 contain many repeat loops, as is generally the case in signal processing.
From the viewpoint of the hardware that controls them, repeat loops fall into two types. One is a loop implemented in software, in which a general-purpose register in the data operation unit of the control unit is treated as a counter. The other is a loop controlled by hardware, in which the control unit is provided with a register that stores the start address, a register that stores the end address, and a counter that counts the repetitions. This hardware is the repeat control unit 205. A loop implemented in software has no restriction on nesting depth and needs no dedicated repeat control unit 205, which saves circuit area, but it requires control instructions to be issued, so the number of processing cycles increases. Loop control by the repeat control unit 205 performs the repeat count control and the like entirely in hardware and therefore needs no processing cycles for control, but it has restrictions such as a limit on loop nesting. In general, when loops are nested, the innermost loop is implemented as a repeat loop.
From the viewpoint of the processing flow to be realized, repeat loops fall into three types: loops whose repeat count is a fixed value; loops whose repeat count is already known from the preceding processing; and loops that, when a certain condition is met during repetition, are suspended, perform other processing, and are then resumed. The loop control method described earlier applies to any of the three types when there is a single data path. However, it cannot always be applied easily to a general SIMD-type parallel processor. The first type, with a fixed repeat count, can be realized without difficulty even on a conventional SIMD-type parallel processor. In the second type, where the repeat count is already known from the preceding processing, the repeat count required may differ between the data processing units. Such a loop cannot be realized on a conventional SIMD-type parallel processor, but the present invention realizes it as follows. Each of the data processing units 101, 102, ... 10n independently detects that it has completed the repeat count determined in the course of processing, and enters the standby state. Since the data processing units 101, 102, ... 10n enter the standby state in order, starting with those requiring the fewest repetitions, the control unit 200 only needs to detect that all of the data processing units 101, 102, ... 10n are in the standby state and then end the loop. In the third type, where the loop is suspended when a condition is met, other processing is performed, and the loop is then resumed, the timing at which the condition is met may differ between the data processing units 101, 102, ... 10n, which makes it difficult to realize on a conventional SIMD-type parallel processor. In the configuration disclosed in this embodiment, a data processing unit 101, 102, ... 10n that meets the condition enters the standby state on its own. With a conventional repeat instruction, an instruction that tests the condition had to be placed inside the repeat loop, consuming extra cycles. As explained above, the repeat loop is often used as the innermost loop, so extra cycles can have a critical effect on overall processing performance. The control unit 200 detects that at least one data processing unit has entered the standby state, suspends the loop, performs the other processing, and then resumes the loop. While the other processing is being performed, the data processing units that met the condition are returned from the standby state, and the data processing units that did not meet the condition are put into the standby state by a signal from the control unit 200. Alternatively, the control unit 200 may perform the processing on its own using its internal data operation unit.
When the above loop control is performed in the circuit configurations of the embodiments shown in FIGS. 1 to 3, the control unit 200 has no repeat control means, so the standby/active state of the data processing units 101, 102, ... 10n is monitored by transferring the contents of the standby registers 121, 122, ... 12n provided in the individual data processing units 101, 102, ... 10n to the control unit 200. The contents of the standby registers 121, 122, ... 12n are transferred to the data operation unit 203 of the control unit 200, and the loop control described above is performed on the basis of states such as all of the data processing units 101, 102, ... 10n being in the standby state, or at least one data processing unit being in the standby state. Because the transfer and evaluation of the contents of the standby registers 121, 122, ... 12n are executed by instructions, the more frequently the control unit 200 monitors the standby/active state of the data processing units 101, 102, ... 10n, the more the added monitoring steps weigh on the total number of processing steps. The repeat control unit 205 described with reference to FIG. 4 solves this problem.
The means by which the standby/active states of the data processing units 101, 102, ... 10n are fed back to the control unit 200 in the SIMD parallel processor described with reference to FIG. 4 will now be described further. The contents of the standby registers 121, 122, ... 12n provided in the individual data processing units 101, 102, ... 10n are conveyed to the control unit 200 over a dedicated signal line PENOPEA. If there are n data processing units, the signal line PENOPEA is n bits wide. The control unit 200 includes a condition determination unit 206 that detects that the standby/active states of the data processing units 101, 102, ... 10n satisfy a preset condition. The determination result is either fed back to the flag 204 inside the control unit 200 or supplied to the repeat control unit 205 as a control signal that suspends or forcibly terminates the repeat loop. Feedback to the flag 204 is useful when the repeat loop is controlled by software: the program examines the state of the flag 204 and continues, suspends, or terminates the loop accordingly. The repeat control circuit 205, by contrast, is a hardware control circuit provided in the first place to save the instruction execution cycles spent on loop control, so usage in which many instruction execution cycles are spent monitoring the standby/active states of the data processing units 101, 102, ... 10n is likely to be uncommon. The more common usage will be to send a control signal to the repeat control unit 205 at the moment the preset condition is satisfied and forcibly terminate the repeat loop.
FIG. 8 shows an embodiment of the condition determination unit 206. A signal PENOPEA[3:0] representing the standby/active states is input from the data processing units. Each data processing unit outputs one bit, giving a 4-bit signal in total. A bit is "1" when the corresponding data processing unit is in the standby state. Only the bits not masked by the signal PENOPMASK (PENOPMASK = "0") are considered. The signal PENOPAND is set to 1 when all of the data processing units under consideration are in the standby state, and the signal PENOPOR is set to 1 when at least one of them is in the standby state; whichever of the two is selected is reflected in the condition flag. Control using PENOPAND = 1 is effective when the number of iterations differs from one data processing unit 101, 102, ... 10n to another. The control unit 200 sets up an infinite loop, each data processing unit 101, 102, ... 10n completes the number of iterations its data require and then enters the standby state, and the repeat loop is terminated at the point when all of the data processing units 101, 102, ... 10n have entered the standby state.
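The masking and reduction performed by the condition determination unit 206 can be sketched in a few lines. This is only an illustrative model, not the circuit of FIG. 8: the function name `condition_signals`, the `width` parameter, and the behavior when every bit is masked (PENOPAND evaluates to 1, PENOPOR to 0) are assumptions made for the sketch.

```python
def condition_signals(penopea, penopmask, width=4):
    """Return (PENOPAND, PENOPOR) for a standby vector of `width` bits.

    penopea   : one bit per data processing unit, 1 = standby
    penopmask : 1 = this unit is excluded from the evaluation
    """
    all_bits = (1 << width) - 1
    considered = ~penopmask & all_bits      # units with PENOPMASK = 0
    standby = penopea & considered          # considered units that are in standby
    penopand = int(standby == considered)   # every considered unit is in standby
    penopor = int(standby != 0)             # at least one considered unit is in standby
    return penopand, penopor


# Units 0 and 2 are in standby; units 1 and 3 are masked out of the evaluation.
print(condition_signals(0b0101, 0b1010))
```

With units 1 and 3 masked, both signals are 1 in this example even though only two of the four units are in standby, which is the situation the mask is meant to handle.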
The above description discloses only the case where the standby register is 1 bit wide, but it may have a plurality of bits. For example, when error correction is performed with a 2-bit standby register, a data processing unit handling a received codeword in which no error was detected sets the upper bit to 1 and enters the standby state, while the other data processing units, in which errors were detected, use the lower bit as the standby state that terminates a repeat loop early during their correction processing. This allows more efficient control.
FIG. 10 shows an example of the data processing units 101, 102, ... 10n. In each data processing unit, an integer register 70, a buffer 50, and a register 60 are connected to a bus LDB. Data transfers to and from the memories 801 and 802, the control unit 200, and the peripheral circuit 900 are performed through this bus LDB. The buffer 50 is an example of the storage means 141, 142, ... 14n. It stores several tens of words of data, several of which are designated in parallel by the pointer 20 and used for operations and data transfers. The pointer 20 is an example of the pointers 151, 152, ... 15n. The independent register 60 is also used for operations and data transfers. The buffer 50, the register 60, and the arithmetic circuit 10 (included in the operation units 161, 162, ... 16n) are connected via internal buses, and data transfers between the buffer 50 and the register 60 are also carried out over these internal buses. The arithmetic circuit 10 includes a plurality of arithmetic units and reflects its results in the flag register 40.
The value of the pointer 20 supplies an address of the buffer 50 and can be updated in parallel with the operation or transfer performed by an operation instruction or a data transfer instruction. In error correction, when polynomial coefficients are stored in the buffer 50 and operations between polynomials are performed, it is efficient to proceed while updating the pointer 20 sequentially. The same applies to a digital filter, where coefficients and data are stored in the buffer 50 and the filter output is obtained while the pointer 20 is updated sequentially. The pointer value may of course also be held constant without being updated. When processing proceeds while the pointer value is updated and is to end when the value reaches a predetermined value, the pointer value must be monitored. In FIG. 10 this is achieved by providing a comparator 30 that compares the value of the pointer 20 with a predetermined value and by reflecting the comparison result in the flag 40.
The values from which the addresses of the buffer 50 are calculated are stored in the integer register 70. The integer register 70 is connected via a separate internal bus to the integer arithmetic unit 80, which performs the address calculations. The integer register 70 need not be used only for address calculations for the buffer 50; it may be used for entirely different integer operations. In an error correction program, it is used to count the syndromes whose value is zero and thereby detect that all syndromes are zero, that is, that the received codeword being processed contains no error. The results of the integer arithmetic unit 80 are also reflected in the flag register 40. The integer arithmetic unit 80 is likewise included in the operation units 161, 162, ... 16n of FIG. 4.
The contents of the flag register 40 are compared with the standby condition supplied from the control unit 200, and when the condition is satisfied the standby register 42 is set and the data processing unit enters the standby state. The register 42 corresponds to the standby registers 121, 122, ... 12n. In the standby state, the circuit 43 that invalidates control signals holds the incoming control signals inactive so that operations such as arithmetic are not performed. The standby register 42 can be changed by a standby state change signal from the control unit 200. The circuit 43 corresponds to the circuits 131, 132, ... 13n of FIG. 4. In FIG. 11, the flag 40 and the comparator 41 are included in the determination units 171, 172, ... 17n of FIG. 4.
When the standby register 42 is set, the circuit 43 invalidates all control signals input to the data processing unit except those that control access to the standby register 42. From the viewpoint of reducing power consumption, it is desirable to stop the clock while the unit is in the standby state. Moreover, it is not unusual for there to be several hundred control signals, so if a gate that invalidates the signal were inserted on every control signal, the resulting increase in circuit scale could not be ignored. Stopping the clock requires a gate on the clock alone, which is also preferable in terms of circuit scale. FIG. 11 shows a configuration that stops the clock signal. In the standby state, the clock signal continues to be supplied to the standby register 42 so that it can be reset, while the circuit 43 suppresses the supply of the clock signal to the other circuits.
If the arithmetic circuit 10 includes a fixed-point arithmetic unit it is suited to image processing and the like, and if it includes a floating-point arithmetic unit it is suited to computer graphics and the like.
The embodiment described next provides the arithmetic circuit 10 with Galois field arithmetic units and is an example of a data processing unit suited to error correction processing such as Reed-Solomon codes.
FIG. 12 shows a detailed example of a data processing unit suited to error correction processing. The data processing unit shown in the figure consists of two Galois field multipliers 11, one Galois field adder 12, a 64-word Galois buffer 50, a 4-word Galois register 60, three pointers (PS1, PS2, PD) 21, 22, 23, an 8-word integer register 70, one integer adder/subtractor 80, and circuits that evaluate operation results and place the data processing unit in the standby state.
The Galois buffer 50, the Galois register 60, the two Galois field multipliers 11, and the one Galois field adder 12 are connected by six internal buses. Operations are performed on data stored in the Galois buffer 50 and the Galois register 60, and the results are stored back into the Galois buffer 50 or the Galois register 60. Selectors 14 are provided at the inputs of the arithmetic units 11 and 12 so that the internal bus carrying the data to be used in an operation can be selected. The outputs of the arithmetic units 11 and 12 are stored in the Galois buffer 50 or the Galois register 60 via the internal buses.
The Galois buffer 50 outputs the two words at the addresses designated by the pointers (PS1) 21 and (PS2) 22 to the internal buses simultaneously, and at the same time takes in data from an internal bus at the address designated by the pointer (PD) 23. The values of the pointers 21, 22, 23 can be incremented or decremented (+1/-1) within the same cycle as a Galois field operation. Pointer values are written from and read into the integer register 70. The integer register 70 holds the data needed to calculate the values required for controlling the pointers 21, 22, 23, and the calculations are performed using the attached integer adder/subtractor 80.
In processing that repeats Galois field operations while incrementing or decrementing the pointers 21, 22, 23, it is sometimes desirable to end the repetition when a pointer reaches a predetermined value. The predetermined value is stored in advance in the register (PEND) 31, the pointer to be monitored is designated by the control signal SELPEND, and the comparator (CMP) 33 determines whether the value of the monitored pointer matches the value preset in the register (PEND) 31. When they match, the signal RPTEND sets the RPTEND flag in the flag register 40 and also sets the standby register 42. When the pointers 21, 22, 23 are not to be monitored, the control signal SELPEND is set so that no comparison is made between the pointer values and the value of the register (PEND) 31.
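The pointer monitoring described above can be illustrated with a short model. It is a sketch under stated assumptions rather than a description of the circuit: the function name `steps_until_rptend`, the wrap-around of the pointer over the 64-word buffer, and the choice to compare after each pointer update are all hypothetical.

```python
BUFFER_WORDS = 64  # size of the Galois buffer 50 in the embodiment of FIG. 12


def steps_until_rptend(start, pend, step=1, max_steps=BUFFER_WORDS):
    """Count the operations executed before the monitored pointer equals PEND.

    Returns None if PEND is not reached within max_steps operations.
    Wrap-around modulo the buffer size is an assumption of this sketch.
    """
    pointer = start % BUFFER_WORDS
    for executed in range(1, max_steps + 1):
        # The pointer is updated in the same cycle as the Galois field operation.
        pointer = (pointer + step) % BUFFER_WORDS
        if pointer == pend:
            # In hardware: RPTEND flag set in flag register 40, standby register 42 set.
            return executed
    return None


print(steps_until_rptend(0, 5))            # pointer counts up from 0 to PEND = 5
print(steps_until_rptend(10, 7, step=-1))  # pointer counts down from 10 to PEND = 7
```

Because the comparison is made by hardware alongside the pointer update, the loop body itself needs no instruction that tests the pointer.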
In addition to the RPTEND flag described above, the flag register 40 provided individually in each data processing unit contains a GZ flag indicating that the result of the Galois field adder 12 is zero, an IZERO flag indicating that the result of the integer arithmetic unit 80 is zero, and an INEG flag indicating that it is negative.
The contents of the flag register 40 are compared with the signal NOPCNDX by the masked comparator 41, using the signal CNDXMASK as the mask, and the standby register 42 is set when they match. The standby register 42 is, for example, a 1-bit register. Besides being set from operation results inside the data processing unit by the two methods described above, it can be reset by writing a value directly from outside with the signal PENOPIN, and it can be read out directly to the outside as the signal PENOPEA. When the standby register 42 is set, the circuit 43 invalidates all control signals input to the data processing unit except those that control access to the standby register 42. From the viewpoint of reducing power consumption, it is desirable to stop the clock while the unit is in the standby state. Moreover, it is not unusual for there to be several hundred control signals, so if a gate that invalidates the signal were inserted on every control signal, the resulting increase in circuit scale could not be ignored. Stopping the clock requires a gate on the clock alone, which is also preferable in terms of circuit scale.
FIG. 9 shows an example of the masked comparator 41. The 4-bit flags are compared bit by bit with the preset 4-bit condition signal NOPCNDX, but bits for which the signal CNDXMASK specifies '1' are excluded from the comparison. The signals NOPCNDX and CNDXMASK are supplied from the control unit 200 to all data processing units in common.
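The bitwise comparison performed by the masked comparator 41 amounts to an exclusive-OR followed by a mask. The sketch below is illustrative only; the function name `standby_condition_met` is hypothetical, and the assignment of flags to bit positions (here RPTEND, GZ, IZERO, INEG from bit 3 down to bit 0) is an assumption, since the text does not state it.

```python
def standby_condition_met(flags, nopcndx, cndxmask, width=4):
    """Return True when the flags equal NOPCNDX on every bit not masked by CNDXMASK.

    A CNDXMASK bit of 1 excludes that bit from the comparison.
    """
    compared = ~cndxmask & ((1 << width) - 1)  # bits that take part in the comparison
    differing = flags ^ nopcndx                # 1 wherever flag and condition differ
    return (differing & compared) == 0


# Assumed bit layout for illustration: bit3 = RPTEND, bit2 = GZ, bit1 = IZERO, bit0 = INEG.
# Condition "GZ = 1, all other flags ignored": NOPCNDX = 0b0100, CNDXMASK = 0b1011.
print(standby_condition_met(0b0110, 0b0100, 0b1011))
print(standby_condition_met(0b0010, 0b0100, 0b1011))
```

Because NOPCNDX and CNDXMASK are common to all data processing units, each unit evaluates the same condition against its own flags and enters the standby state independently.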
FIGS. 13, 14, and 15 show an example of the instructions executed by the SIMD parallel processor of FIG. 4 having the data processing unit shown in FIG. 12. RISC (Reduced Instruction Set Computer) instructions are well suited to describing the operation of the control unit 200, in that they shorten the instruction word length or raise code efficiency. The SIMD parallel processor of this embodiment adopts a RISC architecture.
The instructions shown in FIG. 13 are examples of instructions particularly suited to practicing the present invention and are newly added to the general RISC instructions. In addition to the general RISC instructions, instructions are provided in which a RISC data transfer instruction and a SIMD instruction can be written in parallel. The data transfer instructions shown in FIG. 14 and the SIMD instructions shown in FIG. 15 can be combined without restriction, written in parallel, and executed in parallel.
As shown in FIG. 13, data setting instructions and repeat instructions are newly added to the RISC instructions. The data setting instructions set the condition data used to monitor the state of the flags (the flags in the flag register 40) and the state data used to change the standby/active states of the data processing units. "setPENOP=1 if condition" sets the condition in the registers inside the control unit 200 that are the source of the signals NOPCNDX and CNDXMASK, and places in the standby state every data processing unit whose flags satisfy the condition at that moment. "setPENOP=1 when condition" sets the condition in advance in the registers inside the control unit 200 that are the source of the signals NOPCNDX and CNDXMASK, and places each data processing unit in the standby state when it satisfies the condition in a subsequent cycle. Unlike "setPENOP=1 if condition", this sets a monitoring condition for a future event. When a repeat loop begins after this instruction, no cycles need be spent inside the loop monitoring the flags, which contributes to performance.
"setPENOPEA=#IMM" places the data processing units in the standby or active state individually. #IMM has one bit for each data processing unit; 1 means standby and 0 means active. It is used when a process has finished and all data processing units are in the standby state, in order to return all of them, or a predetermined subset of them, to the active state so that the next process can proceed.
The configuration and instructions by which the data setting instructions place data processing units in the standby state will now be described in detail. As illustrated by the SIMD parallel processor of FIG. 19, the control unit 200 has two registers, (NOPCNDX) 207 and (CNDXMASK) 208, in which the signals NOPCNDX and CNDXMASK are set. Values can be written to these registers 207 and 208 by instructions. The outputs of the registers 207 and 208 can be supplied through the selector 14 to the result determination circuits 171, 172, ... 17n of the respective data processing units 101, 102, ... 10n. The result determination circuits 171, 172, ... 17n compare the operation result flags with the signal NOPCNDX only for the bits not masked by the signal CNDXMASK, and on a match place the corresponding data processing unit in the standby state. The condition and mask are supplied using the two instructions "setPENOP=1 if" and "setPENOP=1 when".
FIG. 20 shows an example of an operation timing chart for execution of "setPENOP=1 if", and FIG. 21 shows an example for execution of "setPENOP=1 when". Both timing charts illustrate a three-stage pipeline of instruction fetch, instruction decode, and execute, but the pipeline scheme is not material here. With "setPENOP=1 if", the signal obtained by decoding the instruction is selected by the selector 14, the state of the flags at that moment is evaluated, and the data processing unit is placed in the standby state if the given condition is satisfied. The "setPENOP=1 if" instruction issued in the fourth cycle is executed in the sixth cycle, by which time the flag values are settled. Whether they match the given condition is determined, and on a match PENOP=0 is written and the clock is stopped. The flag values are never evaluated without this instruction.
"setPENOP=1 when", on the other hand, holds the values in the registers (NOPCNDX) 207 and (CNDXMASK) 208 and places a data processing unit in the standby state when a subsequent operation instruction satisfies the condition. In FIG. 21, the condition is set in the fourth cycle, the operation result of the sixth cycle satisfies the condition, and the standby state is entered from the seventh cycle. Because operation results differ from one data processing unit to another, the units do not necessarily enter the standby state at the same time.
Thus "setPENOP=1 if" evaluates the state of the flags 40 at the time the instruction is issued, whereas "setPENOP=1 when" supplies a standby condition that applies to the results of operations performed after the instruction is issued.
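The difference between the two data setting instructions can be made concrete with a small behavioral model of one data processing unit. This is a sketch only: the class `ProcessingElement`, its method names, and the 4-bit flag width are hypothetical, and pipeline timing (FIGS. 20 and 21) is deliberately not modeled.

```python
class ProcessingElement:
    """Behavioral model of one data processing unit's flags and standby register."""

    def __init__(self):
        self.flags = 0        # contents of the flag register (4 bits assumed)
        self.standby = False  # standby register: True = standby
        self.armed = None     # (NOPCNDX, CNDXMASK) latched by "setPENOP=1 when"

    def _matches(self, nopcndx, cndxmask):
        # Compare only the bits whose mask bit is 0.
        return ((self.flags ^ nopcndx) & ~cndxmask & 0xF) == 0

    def set_penop_if(self, nopcndx, cndxmask):
        """Evaluate the current flags once, at the time of the instruction."""
        if not self.standby and self._matches(nopcndx, cndxmask):
            self.standby = True

    def set_penop_when(self, nopcndx, cndxmask):
        """Latch a condition that is checked against later operation results."""
        self.armed = (nopcndx, cndxmask)

    def execute(self, new_flags):
        """Run one operation that produces new flag values."""
        if self.standby:
            return  # control signals are invalidated; nothing changes
        self.flags = new_flags
        if self.armed is not None and self._matches(*self.armed):
            self.standby = True


pe = ProcessingElement()
pe.set_penop_when(0b0001, 0b1110)  # wait for bit 0 to become 1
pe.execute(0b0000)                 # condition not met, still active
pe.execute(0b0001)                 # condition met, enters standby
pe.execute(0b0000)                 # ignored while in standby
print(pe.standby, bin(pe.flags))
```

In the "when" form the check rides along with each operation, so a loop body contains no instruction devoted to testing the flags.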
The repeat instructions shown in FIG. 13 comprise an ordinary repeat instruction and a repeat instruction with a forced termination condition. The ordinary repeat instruction "REPEAT RS, RE, RC" repeats the instructions from address RS to address RE a total of RC times. Because the repeat control circuit 205 controls the loop in hardware, no overhead cycles are needed for loop control. The repeat instruction with a forced termination condition, "REPEAT RS, RE, RC, until condition", likewise repeats the instructions from address RS to address RE a total of RC times, but forcibly terminates the repeat loop when the condition is satisfied. The forced termination condition can be set to either "all unmasked data processing units are in the standby state" (PENOPAND) or "at least one unmasked data processing unit is in the standby state" (PENOPOR). PENOPAND is the effective choice when the number of iterations differs from one data processing unit to another, and PENOPOR is the effective choice when a search is repeated until a certain item of data is found.
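The effect of the forced termination condition on the number of loop passes can be sketched as follows. The model is hypothetical: `repeat_until` and its arguments are names introduced here, each data processing unit is reduced to "the number of iterations its data require", and a unit is assumed to enter standby as soon as that count reaches zero.

```python
def repeat_until(iterations_needed, rc, until="PENOPAND"):
    """Return how many passes of the loop body the control unit issues.

    iterations_needed : iterations required by each data processing unit
    rc                : repeat count RC, the upper bound on passes
    until             : "PENOPAND", "PENOPOR", or None for a plain REPEAT
    """
    remaining = list(iterations_needed)
    passes = 0
    for _ in range(rc):
        # One pass of the loop body: active units do work, standby units ignore it.
        remaining = [r - 1 if r > 0 else 0 for r in remaining]
        passes += 1
        standby = [r == 0 for r in remaining]
        if until == "PENOPAND" and all(standby):
            break  # every unit has finished
        if until == "PENOPOR" and any(standby):
            break  # the first unit has finished, e.g. a search hit
    return passes


work = [3, 1, 2, 5]
print(repeat_until(work, rc=100, until="PENOPAND"))
print(repeat_until(work, rc=100, until="PENOPOR"))
```

With PENOPAND the loop runs only as long as the slowest unit needs, so RC can be chosen as a generous upper bound without wasting cycles.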
FIG. 14 shows an example of the data transfer instructions. General-purpose registers provided in the data operation unit 203 of the control unit 200 are used as an address pointer and an address index pointer, and three kinds of load instruction, three kinds of store instruction, and a no-operation NOP can be specified. A load instruction transfers data from the second memory 802 to the register 60 of the data processing units 101, 102, ... 10n, and a store instruction conversely transfers data from the register 60 of a data processing unit to the second memory 802. The memory address is taken from the contents of the register of the control unit 200 that is used as the address pointer. With @A, the address pointer is not updated. With @A+, the address pointer is incremented by 1 after the data transfer. With @A+I, the address pointer is increased by the value of the address index pointer after the data transfer. The integer register 70, the Galois register 60, or the Galois buffer 50 of the data processing units 101, 102, ... 10n can be selected as the destination of a load instruction and as the source of a store instruction. When the Galois buffer 50 is selected, the buffer pointers 21 to 23 are either held or incremented by 1. Because the address pointer for the memory on the control unit 200 side and the buffer pointers 21 to 23 on the data processing unit side can be made to operate in step automatically, efficient data transfer is possible. A feature of the embodiment of the present invention disclosed here is that a data transfer instruction and a SIMD instruction are executed simultaneously in parallel and that, at the same time, the pointers 21 to 23 of the Galois buffer 50 can be updated. Even if data transfer and operations could be executed simultaneously, high processing performance could not be expected unless the pointers could be updated at the same time.
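The three address update modes can be illustrated by listing the memory addresses that a sequence of transfers would touch. The helper `effective_addresses` is a hypothetical name used only for this sketch; it models the address pointer and address index pointer as plain integers.

```python
def effective_addresses(mode, address, index, transfers):
    """Return (addresses used, final address pointer) for repeated transfers.

    mode      : "@A" (no update), "@A+" (post-increment by 1),
                or "@A+I" (post-increment by the address index pointer)
    address   : initial value of the address pointer
    index     : value of the address index pointer
    transfers : number of load or store instructions executed
    """
    used = []
    for _ in range(transfers):
        used.append(address)        # the transfer uses the current pointer value
        if mode == "@A+":
            address += 1            # updated after the transfer
        elif mode == "@A+I":
            address += index
        elif mode != "@A":
            raise ValueError("unknown addressing mode: " + mode)
    return used, address


for mode in ("@A", "@A+", "@A+I"):
    print(mode, effective_addresses(mode, address=10, index=4, transfers=3))
```

The @A+I mode is what allows strided access, for example picking the same symbol position out of consecutive codewords when the stride is set to the codeword length; that use is an illustration, not something the text states.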
FIG. 15 shows an example of the SIMD instructions. The SIMD instructions are broadly divided into data transfer instructions between registers, integer operation instructions for calculating pointer values, and Galois field operation instructions that perform the error correction processing directly. The GICOPY instruction is an integer data transfer instruction that transfers data between the integer registers P0, P1, ... P7 and the pointers PS1, PS2, PD, PEND. The GCOPY instruction is a Galois number transfer instruction that transfers data between the Galois register 60 and the Galois buffer 50. When the Galois buffer 50 is an operand, the pointers 21 to 23 can be updated at the same time, which is useful, for example, when initializing the buffer 50. The integer operation instructions perform addition, subtraction, increment, and decrement between integer registers in order to generate the values of the pointers 21 to 23. The IZERO flag is set when the result of the addition is zero, and the INEG flag is set when the result of the addition is negative. Storing polynomial degrees in the integer register 70 during error correction processing makes programs efficient to write, and in that case the flags are used when comparing the degrees of two polynomials or when supplying a loop iteration count.
The Galois field operation instructions perform arithmetic on the Galois field elements used in error correction. In addition to single multiplication (GMULT), single addition (GADD), and multiply-accumulate (GMAC), the instructions GADMS and GADMC are provided, which perform a multiplication and a multiply-accumulate simultaneously. When GADMS or GADMC is executed, the multiplication side (Sy := Sx*Sy) successively computes the square, cube, fourth power, and so on of a coefficient, and that coefficient is used in the multiply-accumulate (D := Sy*D+Sz). GADMS is suited to syndrome calculation and GADMC to the Chien search. The compound multiply-accumulate instruction (D := Sw*Sx+Sy*Sz) is suited to the Euclidean algorithm. The instruction GINV is used to obtain the reciprocal of a Galois field element by applying it seven times; division is then carried out by multiplying by the reciprocal.
In FIG. 15, the notation GB(PS1[+/-]) shown in the contents column for Galois operation instructions such as GMULT Sx,Sy,D means that the pointer (PS1) of the Galois buffer (GB) can be held, incremented by 1, or decremented by 1, and that this operation can be executed simultaneously by the same instruction. GB(PS2[+/-]) has the same meaning for the pointer (PS2). The +1 and -1 operations are performed by the incrementers and decrementers attached individually to PS1, PS2, and PD, and which operation is selected is specified by the instruction.
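The arithmetic behind these instructions can be illustrated with a minimal software sketch. Several points are assumptions rather than details taken from this description: the field is taken to be GF(2^8) with the reduction polynomial 0x11D, the function names are hypothetical, the reciprocal is computed as a^254 in seven square-and-multiply steps (one plausible reading of "applying GINV seven times"), and the syndrome is evaluated by the standard Horner recurrence rather than by the exact GADMS data flow.

```python
# Sketch only. Assumptions: GF(2^8), reduction polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D); all function names are hypothetical.
PRIM_POLY = 0x11D

def gf_add(a, b):
    """Galois field addition (cf. GADD): bitwise XOR in GF(2^m)."""
    return a ^ b

def gf_mul(a, b):
    """Galois field multiplication (cf. GMULT): shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree reached 8, reduce modulo PRIM_POLY
            a ^= PRIM_POLY
        b >>= 1
    return result

def gf_mac(d, sy, sz):
    """Multiply-accumulate (cf. GMAC): returns Sy*D + Sz."""
    return gf_add(gf_mul(sy, d), sz)

def gf_inv(a):
    """Reciprocal as a^254, built from a^2 * a^4 * ... * a^128 in seven steps."""
    if a == 0:
        raise ZeroDivisionError("zero has no reciprocal in a Galois field")
    result, power = 1, a
    for _ in range(7):
        power = gf_mul(power, power)    # a^2, a^4, ..., a^128
        result = gf_mul(result, power)  # accumulates exponent 2+4+...+128 = 254
    return result

def syndrome(received, alpha_j):
    """Evaluate the received polynomial at alpha_j by Horner's rule.

    received lists coefficients from highest to lowest degree.
    """
    s = 0
    for r in received:
        s = gf_mac(s, alpha_j, r)       # S := alpha_j * S + r
    return s
```

Division then follows as `gf_mul(x, gf_inv(y))`, matching the statement that division is carried out by multiplying by the reciprocal.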
FIG. 16 shows the structure of the instruction codes for the general RISC instructions and for the instructions that combine a data transfer instruction with a SIMD instruction. Each general RISC instruction is assigned an n-bit instruction code corresponding to its mnemonic. Among the data transfer instructions, those shown in FIG. 17 can be freely combined with the SIMD instructions shown in FIG. 18 to form parallel execution instructions (compound instructions). A compound instruction is assigned an m-bit code consisting of a k-bit identification code that distinguishes it from the general RISC instructions, a p-bit code for the transfer instruction shown in FIG. 17, and a q-bit code for the SIMD instruction shown in FIG. 18. The bit counts (lengths) n, m, k, p, and q may be chosen according to the kinds of instructions and the degree of freedom of the operands. In view of the constraints of the memory that stores the instructions, choosing n=m=32, or n=16 and m=32, gives a good match with memories of 32 and 16 bits per word, respectively. Because SIMD parallel processors often suffer from a so-called bottleneck in which data transfer capability falls short of computing capability, compound instructions combining a data transfer instruction with a SIMD instruction are provided in this way. Alternatively, without providing compound instructions, instruction codes of the same form may be assigned to both the general RISC instructions and the SIMD instructions.
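The layout of a compound instruction can be sketched as simple bit-field packing. The widths k=2, p=14, q=16 (so m=32), the identification value, and the ordering of the fields within the word are all hypothetical choices made for illustration; the description leaves them open.

```python
# Sketch under assumed values: field widths, identification code, and field
# order are hypothetical; only the k/p/q decomposition comes from the text.
K_BITS, P_BITS, Q_BITS = 2, 14, 16
M_BITS = K_BITS + P_BITS + Q_BITS     # 32-bit compound instruction
COMPOUND_ID = 0b11                    # marks a word as a compound instruction

def encode_compound(transfer_code, simd_code):
    """Pack a transfer-instruction code and a SIMD-instruction code into one word."""
    if not 0 <= transfer_code < (1 << P_BITS):
        raise ValueError("transfer code does not fit in p bits")
    if not 0 <= simd_code < (1 << Q_BITS):
        raise ValueError("SIMD code does not fit in q bits")
    return (COMPOUND_ID << (P_BITS + Q_BITS)) | (transfer_code << Q_BITS) | simd_code

def decode(word):
    """Return (transfer_code, simd_code), or None if the word is not compound."""
    if word >> (P_BITS + Q_BITS) != COMPOUND_ID:
        return None
    transfer_code = (word >> Q_BITS) & ((1 << P_BITS) - 1)
    simd_code = word & ((1 << Q_BITS) - 1)
    return transfer_code, simd_code
```

With this split, the decoder can dispatch the two halves to the transfer path and the SIMD path in the same cycle, which is the point of the compound form.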
FIG. 22 shows an example of a program written with the instruction set shown in FIGS. 13 to 15. The program of FIG. 22 is part of the Euclidean algorithm 1004 performed in the Reed-Solomon error correction processing shown in FIG. 5, namely the part that obtains the error evaluator polynomial 2004. Through the processing up to this point, the coefficients of two error evaluator polynomials, a new one and an old one, have been stored in the Galois buffer 50. Because the degrees of the new and old polynomials differ among the data processing units 101, 102, ..., 10n operating in parallel, the Galois buffer 50 addresses at which their coefficients are stored are held in the integer registers 70. The highest-order and lowest-order coefficients of the new error evaluator polynomial are stored at the Galois buffer 50 addresses indicated by P0 and P6, respectively. Likewise, the highest-order and lowest-order coefficients of the old error evaluator polynomial are stored at the Galois buffer 50 addresses indicated by P2 and P7, respectively. The Euclidean algorithm updates the old polynomial by performing coefficient operations between the new and old polynomials. The degree of the error evaluator polynomial is highest at the start of the Euclidean algorithm and is lowered step by step as the update is repeated, until the error evaluator polynomial 2004 of the proper degree is finally obtained.
In FIG. 22, the old error evaluator polynomial is updated in two repeat loops 3005 and 3009. The first repeat loop 3005 repeats the operation instruction 3006, labeled OMG1, up to 16 times until PS1=PEND, and computes the relatively high-order terms of the error evaluator polynomial. Because the start address and the end address of the loop are both the address labeled OMG1, only the operation instruction 3006 is repeated. Similarly, the second repeat loop 3009 repeats the operation instruction 3010, labeled OMG2, up to 16 times until PS2=PEND, and computes the relatively low-order terms. The values of the pointers (PS1) 21 and (PS2) 22 give the addresses at which the coefficients of the new and old polynomials being operated on are stored, and they differ in each of the data processing units 101, 102, ..., 10n working in parallel. The values of the pointers 21 and 22 decrease by 1 each time the operation instructions 3006 and 3010 are executed, and when a pointer matches the value held in the register (PEND) 31, the corresponding data processing unit enters the standby state. The units go into standby one after another, starting with those whose polynomial has the lowest degree, and the repeat loops 3005 and 3009 are forcibly terminated once all of the data processing units 101, 102, ..., 10n are in standby. When a repeat loop ends, the next instruction "set PEN0PEA=0" (3007, 3011) is executed and all of the data processing units return to the operating state. The count of 16 set for the repeat loops is the theoretical maximum number of iterations, so the program also runs correctly on the SIMD parallel processor described earlier that has no forced termination mechanism for repeat loops. In that case, a unit that has finished its required number of operations stays in standby, holding its result, until 16 iterations of each loop, 32 in total, have completed. In actual error correction of codes, the combined iteration count of the first and second repeat loops averages about 12. An embodiment with the forced termination mechanism therefore needs only about 12 iterations, whereas an embodiment without it performs 32.
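The effect of forced termination on the cycle count can be illustrated with a small simulation. This is a simplified model, not the hardware: the function name and arguments are hypothetical, each list entry stands for one data processing unit, the operation itself is reduced to the pointer decrement, and a unit whose pointer already equals PEND at loop entry is assumed to start in standby.

```python
# Simplified model of a repeat loop over several SIMD data processing units.
# run_repeat_loop and its parameters are hypothetical names for illustration.

def run_repeat_loop(start_pointers, pend, max_count=16, forced_termination=True):
    """Return (cycles spent in the loop, final pointer values)."""
    pointers = list(start_pointers)
    # Assumption: a unit whose pointer already equals PEND starts in standby.
    standby = [p == e for p, e in zip(pointers, pend)]
    cycles = 0
    for _ in range(max_count):
        if forced_termination and all(standby):
            break                      # every unit is waiting: leave the loop early
        cycles += 1
        for i in range(len(pointers)):
            if standby[i]:
                continue               # a unit in standby holds its result
            pointers[i] -= 1           # operation with pointer post-decrement
            if pointers[i] == pend[i]:
                standby[i] = True      # pointer reached PEND: enter standby
    return cycles, pointers
```

For three units starting at pointers 5, 3, and 7 with PEND at 0, the model leaves the loop after 7 cycles with forced termination and after the full 16 without it, while the final pointer values are identical in both cases.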
As described above, the SIMD parallel processor with the forced termination mechanism for repeat instructions shown in the fourth embodiment has the effect of greatly reducing the number of processing steps. Although FIG. 22 illustrates only a small routine of the error correction program, the forced termination mechanism for repeat loops is often effective in other error correction routines and in ordinary digital signal processing as well.
FIG. 23 shows a system block diagram in which the SIMD parallel processor described above is applied to a DVD/CD-ROM drive. A peripheral circuit 900 is connected through a peripheral bus, by way of a bus interface circuit 901, to the SIMD parallel processor described with reference to any one of FIGS. 1 to 4 or FIG. 19. The peripheral circuit 900 comprises, for example, an analog interface circuit (analog I/F) 905, a D/A converter 904 for controlling the pickup 913, a PWM (Pulse Width Modulation) modulation circuit 903 for controlling the motors 911 and 912, and a D/A converter 902 for audio output. The analog interface circuit 905 controls the pickup 913 through an analog signal processing circuit 909 and takes in both data and the information needed for control, namely lens focus, envelope, focus, and tracking information. From this information the control unit 200 of the processor performs data processing, adjusts the focus and tracking of the pickup 913, and drives the sled motor 912 and the spindle motor 911. Data read from the medium is taken into the memories 801 and 802 through the analog interface circuit 905, undergoes error correction processing in the data processing units 101, 102, ..., 10n, and is output. In the present invention, the control unit 200 is not used solely to control the SIMD parallel processor; it is provided with general RISC instructions and is also used to perform servo control processing, tracking control processing, and the like in time division with error correction. In a preferred embodiment, a mechanism capable of executing general DSP instructions is further added, so that all system control tasks, signal processing tasks, and special data processing tasks such as error correction are written as a single unified program. The control microcomputer and the LSI or DSP for servo/tracking control, which were conventionally separate components, can then be omitted and their work handled together by the SIMD parallel processor according to an example of the present invention, which greatly reduces device cost. Moreover, because all tasks can be developed together, consistency among the tasks is easy to maintain and the development period can be greatly shortened.
The recording formats of DVD and CD-ROM naturally differ, but because the two media are never played back at the same time, a device that readily plays both can be provided by switching between a program for DVD playback and a program for CD-ROM playback.
According to the data processing system described above, the error correction program and the programs for the pickup, the motors, and audio output can be developed together, which greatly reduces the cost of developing the device. In addition, whereas a microcomputer that controls the entire device was conventionally required, in this embodiment the processor that performs error correction also controls the entire device, so the cost of the device itself can also be greatly reduced.
The embodiment above is an example of application to a DVD/CD-ROM drive, but broadcast media can readily be supported by replacing the pickup, motors, and so on with a demodulation circuit, a communication protocol control circuit, and the like.
As described above, because a SIMD parallel processor is used for code error correction, higher processing speeds demanded by the media can be met simply by increasing the number of data processing units to raise the degree of parallelism, without changing the basic architecture or the program. Codes of different standards can be handled by changing the program, so systems that must support error correction for multiple standards are also easily accommodated. Conditional branching is a serious issue for SIMD parallel processors, but branch processing can be realized by placing individual data processing units in the standby state on the basis of their own operation results. Furthermore, processing efficiency is improved by monitoring and controlling the operating state of each data processing unit so that no cycles are wasted while all of the data processing units are in standby.
The invention made by the present inventors has been described above specifically on the basis of embodiments, but the present invention is not limited to them, and it goes without saying that various modifications are possible without departing from its gist. For example, when the data processing device is implemented as a single semiconductor integrated circuit, its operating program can be held in a built-in ROM or supplied from an external ROM or the like. The number of data processing units and the bus configuration connecting the data processing units to the control unit are not limited to those of the embodiments and can be changed as appropriate. The way in which the standby state of each data processing unit is reported to the control unit is likewise not limited to a configuration that uses a signal for each data processing unit; the control unit may instead determine whether a unit is in standby by accessing its standby register via a common bus.
Industrial applicability
The present invention can be widely applied to data processing devices such as SIMD parallel data processors intended to improve parallel processing performance or efficiency, and to data processing systems that encode and decode data for code error correction in storage and communication systems, for example information reproduction or recording systems for recording media such as CD-ROM, DVD, and MO (Magneto-Optics), as well as satellite broadcast receiving systems.

Claims

The scope of the claims
1. A data processing device comprising: a control unit that decodes and executes a fetched instruction; and a plurality of data processing units to which control information for arithmetic operations is supplied in parallel from the control unit and whose data transfer is controlled by the control unit, wherein each of the data processing units includes standby control means that places the data processing unit in a standby state according to the result of an arithmetic operation performed in accordance with the control information, and the control unit returns each data processing unit from the standby state to an active state.
2. The data processing device according to claim 1, wherein the standby control means comprises: determination means that determines whether the result of an arithmetic operation by the data processing unit has reached a specific state; a standby register that is set in synchronization with detection of the specific state by the determination means and is reset by specific control information from the control unit; and means for stopping the arithmetic operation of the data processing unit in response to the set state of the standby register.
3. The data processing device according to claim 2, wherein the means for stopping the arithmetic operation is a circuit that selectively stops supply of a clock signal to a circuit portion that performs arithmetic operations in synchronization with the clock signal.
4. The data processing device according to claim 1, wherein the control unit comprises detection means that detects whether each data processing unit is in the standby state, and logic means that returns the data processing units from the standby state to the active state with reference to the detection result of the detection means.
5. The data processing device according to claim 4, wherein, when all of the data processing units are in the standby state, the logic means changes the order of instruction execution by the control unit and returns the data processing units in the standby state to the active state.
6. The data processing device according to claim 4 or 5, wherein the data processing unit includes a Galois field multiplication circuit and a Galois field addition circuit, the control unit executes at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field multiply-accumulate instruction as operation instructions for controlling the multiplication circuit and the addition circuit, and the device is formed on a single semiconductor substrate.
7. The data processing device according to claim 6, further comprising a program memory storing a program that performs error correction of a code defined over a Galois field, wherein the control unit fetches instructions from the program memory and performs error correction processing using the data processing units.
8. A data processing device comprising: a control unit that decodes and executes a fetched instruction; a plurality of data processing units to which control information for arithmetic operations is supplied in parallel from the control unit and whose data transfer is controlled by the control unit; and storage means accessed by the control unit, wherein each of the data processing units includes a first arithmetic circuit, buffer means connected to the first arithmetic circuit, and a plurality of pointer means that changeably designate addresses of the buffer means, and the buffer means of each data processing unit is connected to the storage means via a data bus.
9. The data processing device according to claim 8, wherein each data processing unit further includes second arithmetic means used for updating the address information set in the pointer means by the control unit.
10. The data processing device according to claim 8, wherein the control unit includes instruction execution means that executes an operation instruction specifying parallel operations in the data processing units and a data transfer instruction specifying data transfer to the data processing units.
11. The data processing device according to claim 10, wherein the instruction execution means executes the operation instruction and the data transfer instruction in parallel.
12. The data processing device according to claim 11, wherein the instruction execution means is capable of executing a single instruction, included among the operation instructions, that directs an operation of performing a computation on data obtained from the buffer means at an address designated by one of the pointer means, storing the result in the buffer means at an address designated by another of the pointer means, and updating the contents of the pointer means.
13. The data processing device according to claim 8, wherein each of the data processing units further includes standby control means that places the data processing unit in a standby state according to the result of an arithmetic operation performed in accordance with the control information, and the control unit performs control to return each data processing unit from the standby state to an active state.
14. The data processing device according to claim 13, wherein each data processing unit has determination means that detects that the value of the pointer means has reached a predetermined value and causes the standby control means to place the data processing unit in the standby state.
15. The data processing device according to claim 13 or 14, wherein the standby control means establishes the standby state by selectively stopping supply of a clock signal to a circuit portion that performs arithmetic operations in synchronization with the clock signal.
16. The data processing device according to claim 13 or 14, wherein the first arithmetic circuit includes a Galois field arithmetic unit.
17. The data processing device according to claim 13, wherein the control unit further comprises detection means that detects whether each data processing unit is in the standby state, and logic means that returns the data processing units from the standby state with reference to the detection result of the detection means.
18. The data processing device according to claim 17, wherein, when all of the data processing units are in the standby state, the logic means changes the order of instruction execution by the control unit and returns the data processing units in the standby state to the active state.
19. The data processing device according to claim 17 or 18, wherein the data processing unit includes a Galois field multiplication circuit and a Galois field addition circuit, the control unit executes at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field multiply-accumulate instruction as operation instructions for controlling the multiplication circuit and the addition circuit, and the device is formed on a single semiconductor substrate.
20. The data processing device according to claim 19, wherein the control unit is further capable of executing a data transfer instruction for the data processing units, an instruction that manipulates data inside the control unit, and a branch instruction that branches the sequence of instructions fetched by the control unit.
21. A data processing device comprising: a control unit that decodes and executes a fetched instruction; and a plurality of data processing units to which control information for arithmetic operations is supplied in parallel from the control unit and whose data transfer is controlled by the control unit, wherein each of the data processing units includes standby control means that places the data processing unit in a standby state according to the result of an arithmetic operation performed in accordance with the control information, the control unit has means for referring to whether a data processing unit is in the standby state and returns the data processing unit from the standby state to an active state on the basis of the reference result, and the control unit is capable of executing an instruction that sets, in the data processing units, a condition for placing a data processing unit in the standby state and, at the time of setting, places in the standby state any data processing unit for which the set condition holds.
22. A data processing device comprising: a control unit that decodes and executes a fetched instruction; and a plurality of data processing units to which control information for arithmetic operations is supplied in parallel from the control unit and whose data transfer is controlled by the control unit, wherein each of the data processing units includes standby control means that places the data processing unit in a standby state according to the result of an arithmetic operation performed in accordance with the control information, the control unit has means for referring to whether a data processing unit is in the standby state and returns the data processing unit from the standby state to an active state on the basis of the reference result, and the control unit is capable of executing an instruction that sets, in the data processing units, a condition for placing a data processing unit in the standby state and, in an instruction execution cycle after the condition is set, places in the standby state any data processing unit for which the set condition holds.
23. The data processing device according to claim 21 or 22, wherein the control unit is capable of executing an instruction that individually directs the plurality of data processing units to enter the standby state or to return from the standby state to the active state.
24. The data processing device according to claim 21 or 22, wherein the standby control means comprises: determination means for determining whether the result of an arithmetic operation by the data processing unit satisfies the condition for entering the standby state; a standby register that is set in synchronization with the determination means detecting that the condition is satisfied and is reset in accordance with specific control information from the control unit; and means for stopping the arithmetic operation of the data processing unit in response to the set state of the standby register.
25. A data processing device comprising: a control unit that decodes and executes fetched instructions; a plurality of data processing units that are supplied in parallel with control information for arithmetic operations from the control unit and whose data transfer is controlled by the control unit; and storage means accessed by the control unit, wherein each of the data processing units comprises: a first arithmetic circuit; buffer means connected to the first arithmetic circuit and connected to the storage means via a data bus; a plurality of pointer means each holding a value for changeably designating an address of the buffer means; a second arithmetic circuit that updates the values held by the pointer means; register means in which an arbitrary value is set; comparison means for detecting a match between the value of the register means and the value of the pointer means; a standby register that is placed in a first state in response to at least one of a match detected by the comparison means and a predetermined result of an operation performed by the first arithmetic circuit in accordance with the control information, and that is placed in a second state by predetermined control information supplied from the control unit; and control means for placing the interior of the data processing unit in a standby state in response to the first state of the standby register.
26. A data processing device comprising: a control unit that decodes and executes fetched instructions; a plurality of data processing units that are supplied in parallel with control information for arithmetic operations from the control unit and whose data transfer is controlled by the control unit; and storage means accessed by the control unit, wherein each of the data processing units includes standby control means for placing the data processing unit in a standby state according to the result of an arithmetic operation performed in accordance with the control information, the standby control means returning the interior of the data processing unit from the standby state to an active state in accordance with an instruction from the control unit, and wherein the control unit includes detection means for detecting whether each data processing unit is in the standby state and logic means for returning a data processing unit from the standby state to the active state according to the detection result of the detection means, and further, upon executing an instruction that specifies a start address of a repeat loop, an end address of the repeat loop, and a repeat count of the repeat loop, causes the data processing units to operate in parallel, in accordance with the instructions from the start address to the end address, at most the specified number of times.
27. A data processing device comprising: a control unit that decodes and executes fetched instructions; and a plurality of data processing units that are supplied in parallel with control information for arithmetic operations from the control unit and whose data transfer is controlled by the control unit, wherein each of the data processing units includes standby control means for placing the data processing unit in a standby state according to the result of an arithmetic operation performed in accordance with the control information, the standby control means returning the interior of the data processing unit from the standby state to an active state in accordance with an instruction from the control unit, and wherein the control unit includes detection means for detecting whether each data processing unit is in the standby state and logic means for returning a data processing unit from the standby state to the active state according to the detection result of the detection means, and further, upon executing an instruction that specifies a start address of a repeat loop, an end address of the repeat loop, a repeat count of the repeat loop, and a condition for forcibly terminating the repeat loop, causes the data processing units to operate repeatedly in parallel, in accordance with the instructions from the start address to the end address, unless the forced termination condition is satisfied.
28. The data processing device according to claim 27, wherein the standby state of all the data processing units can be set as the forced termination condition of the repeat loop.
29. The data processing device according to claim 28, wherein the standby state of at least one data processing unit can be set as the forced termination condition of the repeat loop.
30. The data processing device according to claim 26 or 27, wherein each data processing unit includes a Galois field multiplication circuit and a Galois field addition circuit, and the control unit executes at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field multiply-accumulate instruction as arithmetic instructions for controlling the Galois field multiplication circuit and addition circuit.
31. The data processing device according to claim 30, further comprising a program memory storing a program for performing error correction of a code defined over a Galois field, wherein the control unit fetches instructions from the program memory and performs error correction processing using the data processing units.
32. The data processing device according to claim 31, wherein the error correction processing repeats a plurality of times a syndrome computation on code data defined over the Galois field, a determination of the presence or absence of an error using the syndrome obtained by the syndrome computation, and a storing, in the storage means, of each syndrome in which an error has been detected, and thereafter reads the stored syndromes from the storage means and performs an error correction computation.
33. A data processing system comprising: input means for code data defined over a Galois field; the data processing device according to claim 31 or 32; and data output means, wherein the data processing device performs error correction of the code data input from the input means, based on the program stored in its program memory.
34. The data processing system according to claim 33, wherein the control unit included in the data processing device executes, in a time-division manner, input/output control by the input means and output means and the error correction processing of the code data.
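As a rough illustration of the mechanisms recited in claims 25 to 30, the following Python sketch models a few data processing units that receive the same broadcast operation, drop into a standby state when their buffer pointer matches an end register, and are driven by a repeat loop that terminates early once every unit is in standby. It is a behavioral sketch only, not the claimed hardware. All names (`PE`, `gf_add`, `gf_mul`, `gf_mac`, `run_repeat`) are hypothetical, and the field GF(2^8) with reduction polynomial 0x11D is an assumption made for the example; the claims do not fix a particular field or polynomial.

```python
# Behavioral sketch only. Names and the choice of GF(2^8) are assumptions.

PRIM_POLY = 0x11D  # assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_add(a, b):
    """Galois field addition in GF(2^8): bitwise XOR."""
    return a ^ b


def gf_mul(a, b):
    """Galois field multiplication in GF(2^8) by shift-and-reduce."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree reached 8: reduce modulo PRIM_POLY
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_mac(acc, a, b):
    """Galois field multiply-accumulate: acc + a * b."""
    return gf_add(acc, gf_mul(a, b))


class PE:
    """One data processing unit with a buffer, a pointer and a standby flag."""

    def __init__(self, symbols):
        self.load(symbols)

    def load(self, symbols):
        """Control-unit side: refill the buffer and reset the standby flag."""
        self.buf = list(symbols)
        self.ptr = 0                 # pointer into the buffer
        self.end = len(self.buf)     # register compared against the pointer
        self.acc = 0
        self.standby = self.ptr == self.end

    def step(self, alpha):
        """Execute one broadcast operation unless the unit is in standby."""
        if self.standby:
            return
        # Horner step: acc = acc * alpha + next symbol
        self.acc = gf_add(gf_mul(self.acc, alpha), self.buf[self.ptr])
        self.ptr += 1
        if self.ptr == self.end:     # comparator match sets the standby flag
            self.standby = True


def run_repeat(pes, alpha, max_count):
    """Repeat loop: at most max_count passes, ended early when all PEs idle."""
    cycles = 0
    for _ in range(max_count):
        if all(pe.standby for pe in pes):   # forced termination condition
            break
        for pe in pes:                      # same operation sent to every PE
            pe.step(alpha)
        cycles += 1
    return cycles
```

In this model, units holding shorter buffers simply stop responding to the broadcast operation while the others continue, and the loop ends as soon as the last unit finishes rather than running the full repeat count. Replacing `all` with `any` in `run_repeat` would give the "at least one unit in standby" variant of the termination condition.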
PCT/JP1997/003259 1997-09-16 1997-09-16 Data processor and data processing system WO1999014685A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP1997/003259 WO1999014685A1 (en) 1997-09-16 1997-09-16 Data processor and data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP1997/003259 WO1999014685A1 (en) 1997-09-16 1997-09-16 Data processor and data processing system

Publications (1)

Publication Number Publication Date
WO1999014685A1 true WO1999014685A1 (en) 1999-03-25

Family

ID=14181117

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1997/003259 WO1999014685A1 (en) 1997-09-16 1997-09-16 Data processor and data processing system

Country Status (1)

Country Link
WO (1) WO1999014685A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6732253B1 (en) 2000-11-13 2004-05-04 Chipwrights Design, Inc. Loop handling for single instruction multiple datapath processor architectures
US6931518B1 (en) 2000-11-28 2005-08-16 Chipwrights Design, Inc. Branching around conditional processing if states of all single instruction multiple datapaths are disabled and the computer program is non-deterministic
JP2013161271A (en) * 2012-02-06 2013-08-19 Ricoh Co Ltd Simd (single instruction-stream multiple data-stream) type microprocessor
US9069938B2 (en) 2006-11-03 2015-06-30 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
US9235393B2 (en) 2002-07-09 2016-01-12 Iii Holdings 2, Llc Statically speculative compilation and execution
US9244689B2 (en) 2004-02-04 2016-01-26 Iii Holdings 2, Llc Energy-focused compiler-assisted branch prediction
US9569186B2 (en) 2003-10-29 2017-02-14 Iii Holdings 2, Llc Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US9582650B2 (en) 2003-11-17 2017-02-28 Bluerisc, Inc. Security of program executables and microprocessors based on compiler-architecture interaction
CN116909626A (en) * 2023-09-13 2023-10-20 腾讯科技(深圳)有限公司 Data processing method, processor and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03156558A (en) * 1989-11-14 1991-07-04 Nec Home Electron Ltd Method for communication between host cpu and coprocessor
JPH04130910A (en) * 1990-09-21 1992-05-01 Nec Corp Information processor
JPH05189585A (en) * 1992-01-14 1993-07-30 Nippon Telegr & Teleph Corp <Ntt> Conditional operation control circuit for parallel processing
JPH06244741A (en) * 1993-02-18 1994-09-02 Nec Corp Error correcting method
JPH0963208A (en) * 1995-08-23 1997-03-07 Victor Co Of Japan Ltd Error correction device

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6732253B1 (en) 2000-11-13 2004-05-04 Chipwrights Design, Inc. Loop handling for single instruction multiple datapath processor architectures
US6931518B1 (en) 2000-11-28 2005-08-16 Chipwrights Design, Inc. Branching around conditional processing if states of all single instruction multiple datapaths are disabled and the computer program is non-deterministic
US10101978B2 (en) 2002-07-09 2018-10-16 Iii Holdings 2, Llc Statically speculative compilation and execution
US9235393B2 (en) 2002-07-09 2016-01-12 Iii Holdings 2, Llc Statically speculative compilation and execution
US10248395B2 (en) 2003-10-29 2019-04-02 Iii Holdings 2, Llc Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US9569186B2 (en) 2003-10-29 2017-02-14 Iii Holdings 2, Llc Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US9582650B2 (en) 2003-11-17 2017-02-28 Bluerisc, Inc. Security of program executables and microprocessors based on compiler-architecture interaction
US9697000B2 (en) 2004-02-04 2017-07-04 Iii Holdings 2, Llc Energy-focused compiler-assisted branch prediction
US9244689B2 (en) 2004-02-04 2016-01-26 Iii Holdings 2, Llc Energy-focused compiler-assisted branch prediction
US10268480B2 (en) 2004-02-04 2019-04-23 Iii Holdings 2, Llc Energy-focused compiler-assisted branch prediction
US9940445B2 (en) 2006-11-03 2018-04-10 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
US9069938B2 (en) 2006-11-03 2015-06-30 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
US10430565B2 (en) 2006-11-03 2019-10-01 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
US11163857B2 (en) 2006-11-03 2021-11-02 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
JP2013161271A (en) * 2012-02-06 2013-08-19 Ricoh Co Ltd Simd (single instruction-stream multiple data-stream) type microprocessor
CN116909626A (en) * 2023-09-13 2023-10-20 腾讯科技(深圳)有限公司 Data processing method, processor and computer equipment
CN116909626B (en) * 2023-09-13 2023-12-29 腾讯科技(深圳)有限公司 Data processing method, processor and computer equipment

Similar Documents

Publication Publication Date Title
US5691994A (en) Disk drive with fast error correction validation
US8122078B2 (en) Processor with enhanced combined-arithmetic capability
US5689727A (en) Disk drive with pipelined embedded ECC/EDC controller which provides parallel operand fetching and instruction execution
US5640286A (en) Disk drive with error code embedded sector identification
US7600177B2 (en) Delta syndrome based iterative Reed-Solomon product code decoder
US7376812B1 (en) Vector co-processor for configurable and extensible processor architecture
US5812564A (en) Disk drive with embedded finite field processor for error correction
JP4295758B2 (en) Error correction apparatus, optical disk control apparatus, optical disk reading apparatus, and error correction method
US6151669A (en) Methods and apparatus for efficient control of floating-point status register
EP0329789B1 (en) Galois field arithmetic unit
US7962833B2 (en) Unified memory architecture for recording applications
US20070186085A1 (en) Method, medium, and apparatus with interrupt handling in a reconfigurable array
WO1999014685A1 (en) Data processor and data processing system
TW476883B (en) Data processing device
US20050172210A1 (en) Add-compare-select accelerator using pre-compare-select-add operation
JP2001027945A (en) Floating point unit using standard mac unit for executing simd operation
CN1320450C (en) Method for providing width-variable at least six-path addition instruction and apparatus thereof
US6243845B1 (en) Code error correcting and detecting apparatus
JP2000172520A (en) Galois field operation processor
JP2000259579A (en) Semiconductor integrated circuit
JPH08123682A (en) Digital signal processor
US20020116599A1 (en) Data processing apparatus
US6212539B1 (en) Methods and apparatus for handling and storing bi-endian words in a floating-point processor
US7234044B1 (en) Processor registers having state information
EP1220092B1 (en) System and method for executing variable latency load operations in a data processor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR SG US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: KR

122 EP: PCT application non-entry in European phase