WO1999014685A1 - Data processing machine and data processing system - Google Patents


Info

Publication number
WO1999014685A1
Authority
WO
WIPO (PCT)
Prior art keywords
data processing
control unit
unit
instruction
data
Application number
PCT/JP1997/003259
Other languages
English (en)
Japanese (ja)
Inventor
Hirotsugu Kojima
Kenji Kaneko
Toshimitsu Ozawa
Tsukasa Yamauchi
Yukari Katayama
Original Assignee
Hitachi, Ltd.
Application filed by Hitachi, Ltd.
Priority to PCT/JP1997/003259
Publication of WO1999014685A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors

Definitions

  • The present invention relates to a SIMD (Single Instruction Multiple Data) data processing device capable of parallel arithmetic processing, and to technology that is effective when applied to, for example, code error correction and, further, to the encoding and decoding of data in storage and communication systems.
Background art
  • In recording media such as hard disk drives, CD-ROMs (Compact Disc Read-Only Memory), DVDs (Digital Video Discs), and magneto-optical disks, codewords are used to correct recording/reading errors that occur on the media.
  • A codeword is defined, for example, by a special set of numbers called a Galois field and by special operations defined on it. Code error correction is performed by data processing that uses Galois field elements and Galois field arithmetic.
  • The set of elements of a Galois field can be defined in several ways, depending on the polynomial, called a primitive polynomial, that generates it. The arithmetic operations likewise differ depending on the primitive polynomial.
  • Reed-Solomon codes are used in particular as error-correcting codes in data storage and communication systems, where errors tend to be concentrated locally.
  • Reed-Solomon codes are defined over a Galois field, and encoding and decoding are performed by operations on that field.
  • The primitive polynomial that defines the Galois field is specified for each medium. Because Galois fields and the four arithmetic operations on their elements are well known, a detailed description is omitted.
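As a minimal sketch of the Galois field arithmetic referred to above, the following Python code multiplies GF(2^8) elements by shift-and-add with reduction modulo a primitive polynomial; addition is bitwise XOR. The field size 2^8, the function names, and the two example polynomials (0x11D, i.e. x^8+x^4+x^3+x^2+1, and 0x12D, i.e. x^8+x^5+x^3+x^2+1) are assumptions chosen for illustration. They show how the same operands yield different products under different primitive polynomials.

```python
def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^m) is bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int, prim: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements modulo the primitive polynomial `prim`.

    Shift-and-add: for every set bit of b, XOR in the current multiple of a,
    then multiply a by x and reduce it whenever bit 8 becomes set.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= prim
    return result


def is_generated_by_alpha(prim: int) -> bool:
    """True if alpha = 2 generates all 255 nonzero elements (expected for a primitive `prim`)."""
    seen, x = set(), 1
    for _ in range(255):
        seen.add(x)
        x = gf_mul(x, 2, prim)
    return len(seen) == 255


if __name__ == "__main__":
    print(hex(gf_mul(0x02, 0x80, 0x11D)))  # 0x1d
    print(hex(gf_mul(0x02, 0x80, 0x12D)))  # 0x2d
    print(is_generated_by_alpha(0x11D))    # True
```

The product of x and x^7 is x^8, which each polynomial reduces differently; this is why a decoder must be configured with the primitive polynomial of the medium it serves.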
  • The time allowed for processing received data, including error correction, is bounded by the data transmission time, which is determined by the reception speed; this reflects the need for real-time processing.
  • Because of the special nature of the four arithmetic operations on Galois field elements and the real-time requirement, code error correction has conventionally been performed by hardwired custom LSIs (semiconductor integrated circuits) rather than by general-purpose processors.
  • As an LSI that performs error correction in processor form with a hardware Galois field arithmetic unit, "Error Correction LSI for Optical Disks" (Transactions of the Institute of Electronics, Information and Communication Engineers A, Vol. J73-A, No. 2, pp. 261-268, February 1990) is known (first known example). In it, the error correction procedure is divided into four steps. In step 1, a syndrome polynomial is calculated from the received codeword. In step 2, the error locator polynomial and the error value polynomial are determined from the syndrome polynomial. In step 3, the error locations are determined from the error locator polynomial.
  • In step 4, the error values are obtained from the error locator polynomial and the error value polynomial.
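To make step 1 concrete, here is a sketch of syndrome computation for a Reed-Solomon code over GF(2^8). It assumes the primitive polynomial 0x11D, generator roots alpha^0 to alpha^(nsym-1), and a non-systematic encoding c(x) = m(x)g(x), all chosen for brevity rather than taken from the source. It also shows steps 3 and 4 only in the degenerate single-error case, where S_1/S_0 = alpha^p gives the error position p and S_0 the error value; the general case, which needs the error locator and error value polynomials of step 2, is omitted.

```python
PRIM = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1

# Exponent/log tables for GF(2^8) with alpha = 2; EXP is doubled to avoid a modulo.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    """Multiply polynomials whose coefficient lists start at the highest degree."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_eval(coeffs, x):
    """Horner's rule: evaluate the polynomial at x."""
    y = 0
    for c in coeffs:
        y = gf_mul(y, x) ^ c
    return y


def generator(nsym):
    """g(x) = (x + alpha^0)(x + alpha^1)...(x + alpha^(nsym-1)); minus equals plus here."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])
    return g


def syndromes(received, nsym):
    """Step 1: S_i = r(alpha^i), i = 0 .. nsym-1; all zero means no detectable error."""
    return [poly_eval(received, EXP[i]) for i in range(nsym)]


def locate_single_error(s):
    """Steps 3-4, single-error case only: returns (power p of x, error value)."""
    if s[0] == 0:
        return None
    return (LOG[s[1]] - LOG[s[0]]) % 255, s[0]


if __name__ == "__main__":
    nsym = 4
    codeword = poly_mul([0x12, 0x34, 0x56], generator(nsym))  # 7 symbols
    print(syndromes(codeword, nsym))                        # [0, 0, 0, 0]
    received = list(codeword)
    received[2] ^= 0x0F  # coefficient index 2 is the x^4 term
    print(locate_single_error(syndromes(received, nsym)))  # (4, 15)
```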
  • In the above known example, real-time constraints make it difficult to perform all of these steps on processor-form hardware, so some of the processing is performed by dedicated hardwired circuits.
  • Measures to improve data processing performance to meet the real-time requirement include raising the operating frequency of the processor and improving arithmetic performance by introducing parallel processing.
  • The operating frequency can be raised by improving process/circuit technology or by adopting multi-stage pipeline processing.
  • However, improvements in process/circuit technology cannot deliver a several-fold performance gain at once.
  • When multi-stage pipeline processing is introduced into the processor hardware, branch processing incurs a large overhead.
  • Introducing parallel processing can improve arithmetic performance relatively easily, but processing performance cannot be substantially improved unless the processing algorithm itself is suited to parallel processing.
  • One factor that breaks the parallelism of a processing algorithm is branch processing that depends on operation results.
  • As a related technique, the "Conditional operation control circuit in parallel processing" (Japanese Patent Application Laid-Open No. 5-189585) is known (second known example).
  • This is a SIMD-type parallel processor consisting of one instruction supply circuit and a plurality of operation units of the same configuration that execute the same instruction on different data.
  • A flag control circuit is provided for each operation unit.
  • The flag control circuit receives an operation result flag indicating the operation result from the corresponding operation unit and outputs an operation condition flag to that operation unit. According to the operation condition flag, the operation unit performs conditional operations such as updating or holding its output register.
  • The operation result flag is held for a plurality of cycles in a shift register in the flag control circuit, and during that period the operation result can be reflected in the operation condition flag. With this technology, each operation unit of a SIMD parallel processor can realize branch processing that depends on its own operation results.
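The conditional operation of the second known example can be modelled, loosely, as predicated SIMD execution: every operation unit receives the same instruction, and a per-unit condition flag decides whether its output register is updated or held. The sketch below is a hypothetical Python model; the class and method names are illustrative, not taken from the cited publication, and the shift-register delay of the flag is ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PredicatedSimd:
    """Hypothetical model: one instruction stream, one output register and flag per unit."""
    regs: List[int]
    flags: List[bool] = field(default_factory=list)
    cycles: int = 0

    def __post_init__(self) -> None:
        # All units are enabled until a condition is evaluated.
        if not self.flags:
            self.flags = [True] * len(self.regs)

    def set_flags(self, cond: Callable[[int], bool]) -> None:
        # Each unit evaluates the same condition on its own data.
        self.flags = [cond(r) for r in self.regs]
        self.cycles += 1

    def cond_op(self, op: Callable[[int], int]) -> None:
        # The instruction reaches every unit; units with a false flag hold their register.
        self.regs = [op(r) if f else r for r, f in zip(self.regs, self.flags)]
        self.cycles += 1


if __name__ == "__main__":
    simd = PredicatedSimd(regs=[3, -5, 7, -1])
    simd.set_flags(lambda r: r < 0)
    simd.cond_op(lambda r: -r)     # absolute value without a branch
    print(simd.regs, simd.cycles)  # [3, 5, 7, 1] 2
```

Note that `cond_op` costs a cycle whether or not any flag is true; that property is what the following discussion identifies as a source of wasted cycles.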
  • CD-ROM drives have been sped up from the standard (1x) speed to 2x, 4x, and even 12x or more, so the processing speed of the error-correction LSI must be improved accordingly. The demand for speed is so pressing that it cannot be met by semiconductor miniaturization alone, and an architecture-level redesign is needed. Such a redesign requires a huge amount of development effort and raises development costs.
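The pressure on the error-correction LSI can be quantified with a rough per-sector time budget. This sketch assumes the standard 1x CD-ROM rate of 75 sectors per second and ignores any overlap between reading and decoding; it is an illustration, not a figure from the source.

```python
SECTORS_PER_SECOND_1X = 75  # standard 1x CD-ROM sector rate (assumption stated above)


def per_sector_budget_ms(speed: int) -> float:
    """Time available to finish processing one sector at the given N-x speed, in ms."""
    return 1000.0 / (SECTORS_PER_SECOND_1X * speed)


if __name__ == "__main__":
    for speed in (1, 2, 4, 12):
        print(f"{speed:>2}x: {per_sector_budget_ms(speed):6.2f} ms per sector")
```

At 12x the budget shrinks to about 1.11 ms per sector, one twelfth of the roughly 13.33 ms available at 1x.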
  • As shown in FIG. 25(a), the former operation unit (PE0) executes the processing, while the latter operation unit (PE1) holds its output register. The two operation units then start process C in parallel.
  • When the condition is satisfied in neither operation unit (PE0, PE1), neither unit needs to execute process B. However, because process B is written with conditional operation instructions, its instructions are still issued, and idle cycles equal in number to the execution cycles of process B are spent. The same applies when there are more than two operation units: if the condition fails in every unit, the cycles of process B are wasted. Thus, even with the parallel processing technology of the second known example, processing performance cannot be sufficiently improved.
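The cost described above can be counted with a small cycle model. Assume, hypothetically, that evaluating the condition takes one cycle, process B takes `b_cycles`, and process C takes `c_cycles`. With purely conditional instructions, process B is always issued; a control unit that could detect that no unit needs process B, and skip it, would avoid the idle cycles. The function name and cycle counts are illustrative only.

```python
from typing import List


def branch_cycles(conditions: List[bool], b_cycles: int, c_cycles: int, can_skip: bool) -> int:
    """Cycles for: evaluate condition (1 cycle) -> process B -> process C.

    can_skip=False models conditional (predicated) instructions: B is always issued.
    can_skip=True models a control unit that skips B when no unit's condition holds.
    """
    issue_b = any(conditions) or not can_skip
    return 1 + (b_cycles if issue_b else 0) + c_cycles


if __name__ == "__main__":
    neither = [False, False]  # the condition fails on both PE0 and PE1
    print(branch_cycles(neither, 20, 10, can_skip=False))  # 31
    print(branch_cycles(neither, 20, 10, can_skip=True))   # 11
```

When at least one unit needs process B, both schedules cost the same; the difference appears only in the all-false case described in the text.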
  • An object of the present invention is to provide a data processing device capable of improving SIMD parallel processing performance.
  • Another object of the present invention is to provide a data processing device capable of improving the speed of error correction processing for codewords.
  • Still another object of the present invention is to provide a data processing apparatus which can cope with a wide range of error correction codes and can cope with an improvement in performance by a simple design change.
  • Another object of the present invention is to provide a data processing system capable of accelerating the data reading speed in a storage system and supporting high-speed data transmission in a communication system from the viewpoint of error correction processing.
  • A SIMD data processing device comprises a control unit that decodes and executes fetched instructions, and a plurality of data processing units that receive control information for arithmetic operations from the control unit in parallel and whose data transfers are controlled by the control unit.
  • Each data processing unit includes a standby control unit that places the data processing unit in a standby state according to the result of an arithmetic operation performed under the control information. Returning each data processing unit from the standby state to the active state is controlled by the control unit.
  • Branch processing is thus realized: individual data processing units enter the standby state based on their own operation results, and the control unit returns them from the standby state to the active state.
  • The control unit includes detection means for detecting whether each data processing unit is in the standby state, and logic means for returning data processing units from the standby state to the active state according to the detection result.
  • The control unit monitors and controls the operating state of each data processing unit so that useless cycles do not occur with all data processing units in the standby state. That is, when all the data processing units are in the standby state, the logic means changes the instruction execution sequence of the control unit and returns the data processing units from the standby state to the active state.
  • The standby control unit can comprise a determination unit that determines whether the result of an arithmetic operation by the data processing unit has reached a specific state; a standby register that is set in synchronization with the detection of the specific state by the determination unit and is reset by specific control information from the control unit; and means for stopping the arithmetic operation of the data processing unit in response to the set state of the standby register.
  • The means for stopping the arithmetic operation may selectively inhibit the transmission of control information from the control unit to internal circuits.
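A minimal behavioural sketch of this standby control, with Python objects standing in for hardware: each unit has a standby register that is set when its determination unit detects the specific state and is reset only by the control unit; while it is set, the unit ignores broadcast control information. The control unit reads all standby registers and, when every unit is in standby, changes its instruction sequence (here, by leaving the branch body early) and reactivates the units. The names `DataProcessingUnit`, `ControlUnit`, and `run_branch` are hypothetical.

```python
from typing import Callable, List


class DataProcessingUnit:
    def __init__(self, value: int) -> None:
        self.value = value
        self.standby = False  # standby register

    def judge(self, specific_state: Callable[[int], bool]) -> None:
        # Determination unit: set the standby register when the result reaches the specific state.
        if specific_state(self.value):
            self.standby = True

    def execute(self, op: Callable[[int], int]) -> None:
        # Standby/active switching: control information is blocked while in standby.
        if not self.standby:
            self.value = op(self.value)


class ControlUnit:
    def __init__(self, units: List[DataProcessingUnit]) -> None:
        self.units = units
        self.issued = 0  # instructions broadcast (one per cycle, for simplicity)

    def all_standby(self) -> bool:
        # Detection means: read every standby register.
        return all(u.standby for u in self.units)

    def wake_all(self) -> None:
        # Logic means: reset the standby registers (return to the active state).
        for u in self.units:
            u.standby = False

    def run_branch(self, skip_if: Callable[[int], bool], body: List[Callable[[int], int]]) -> None:
        """Units whose data satisfies skip_if wait in standby; the others run body."""
        for u in self.units:
            u.judge(skip_if)
        for op in body:
            if self.all_standby():
                break  # no unit is active: stop issuing the body instead of idling
            for u in self.units:
                u.execute(op)
            self.issued += 1
        self.wake_all()


if __name__ == "__main__":
    cu = ControlUnit([DataProcessingUnit(v) for v in (0, 4, 0, 9)])
    cu.run_branch(lambda v: v == 0, [lambda v: v + 1, lambda v: v * 2])
    print([u.value for u in cu.units], cu.issued)  # [0, 10, 0, 20] 2
```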
  • The data processing unit can include a Galois field multiplication circuit and an addition circuit.
  • As operation instructions for controlling the Galois field multiplication circuit and the addition circuit, the control unit can execute at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field product-sum operation instruction.
  • The data processing device can be configured as a semiconductor integrated circuit.
  • The control unit fetches instructions from a program memory and uses the data processing units to perform error correction processing.
  • Introducing a SIMD parallel processor for code error correction makes it easy to meet the processing speed each medium requires simply by increasing the number of data processing units, that is, the degree of parallelism, without changing the basic architecture or the programs. Codes of different standards can be handled by changing the program, so systems that must correct errors under multiple standards are also easily supported.
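One way a Galois field product-sum (multiply-accumulate, acc * a + b) instruction can serve error correction in SIMD form is syndrome calculation: each data processing unit holds one syndrome accumulator and its own evaluation point alpha^i, and every received symbol is broadcast to all units, which each apply one product-sum (one Horner step) per symbol. This mapping, the instruction semantics, and the 0x11D polynomial are assumptions for illustration.

```python
from typing import List

PRIM = 0x11D  # assumed primitive polynomial


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8) modulo PRIM."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM
    return result


def gf_mac(acc: int, a: int, b: int) -> int:
    """Hypothetical product-sum instruction: acc * a + b over GF(2^8)."""
    return gf_mul(acc, a) ^ b


def simd_syndromes(received: List[int], n_units: int) -> List[int]:
    """Unit i accumulates r(alpha^i) with alpha = 2; coefficients arrive highest degree first."""
    points = [1]
    for _ in range(n_units - 1):
        points.append(gf_mul(points[-1], 2))
    acc = [0] * n_units
    for symbol in received:  # one broadcast product-sum instruction per received symbol
        acc = [gf_mac(acc[i], points[i], symbol) for i in range(n_units)]
    return acc


if __name__ == "__main__":
    print(simd_syndromes([1, 0], 4))  # r(x) = x, so S_i = alpha^i: [1, 2, 4, 8]
```

Adding units only widens the `acc` list; the per-symbol instruction count stays the same, which is the scaling property the text attributes to the SIMD approach.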
  • In another aspect, a SIMD data processing device has a control unit that decodes and executes fetched instructions, a plurality of data processing units that receive control information for arithmetic operations from the control unit in parallel and whose data transfers are controlled by the control unit, and storage means accessed by the control unit. Each data processing unit includes a first arithmetic circuit, buffer means connected to the first arithmetic circuit, and a plurality of pointer means that designate addresses of the buffer means in a changeable manner. The buffer means of each data processing unit is connected to the storage means via a data bus.
  • The buffer means provided in each data processing unit is used as a work area for temporarily holding data during operations. Compared with also using the storage means as the work area, the buffer means operating in parallel reduce the overhead caused by data transfer and improve parallel processing performance.
  • Using the pointer means, the number of bits in the operand specification field of instructions that the control unit uses to control access to the buffer means can be reduced. For example, when one instruction specifies an addition that assigns A + B to C, the source addresses of A and B and the destination address of C must be described in the operand specification field. If the identifiers of the pointer means are described in the operand specification field instead, the field needs fewer bits than when the buffer addresses are written directly in the instruction. This helps shorten the instruction word when a high-function operation is defined as a single instruction.
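The saving can be estimated with simple arithmetic. For a three-operand instruction such as C = A + B, the sketch compares direct buffer addressing with pointer identifiers. The 64-word buffer is a hypothetical size within the "several tens of words" given for the embodiment, and the four pointers match the embodiment's 2-bit case.

```python
from math import ceil, log2


def field_bits(n_choices: int) -> int:
    """Bits needed to select one of n_choices items."""
    return max(1, ceil(log2(n_choices)))


def operand_field_bits(n_operands: int, n_choices: int) -> int:
    """Width of an operand specification field naming n_operands items."""
    return n_operands * field_bits(n_choices)


if __name__ == "__main__":
    words, pointers = 64, 4  # hypothetical buffer size; pointer count as in the embodiment
    print(operand_field_bits(3, words))     # 18 (direct buffer addresses)
    print(operand_field_bits(3, pointers))  # 6 (pointer identifiers)
```

A buffer of 32 to 128 words needs 5 to 7 bits per directly addressed operand, while four pointers need only 2.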
  • The address information of the pointer means may be set by the control unit executing an instruction for that purpose. Each data processing unit may further include a second arithmetic unit for updating the address information that the control unit has set in the pointer means. This reduces the number of data transfers from the control unit to the data processing units.
  • The control unit includes instruction execution means that executes operation instructions specifying parallel operations in the data processing units and data transfer instructions specifying data transfers to the data processing units.
  • The instruction execution means can execute an operation instruction and a data transfer instruction in parallel; that is, the data processing device supports compound instructions that combine an operation instruction with a data transfer instruction. In SIMD parallel processing, data transfer capability can fall short of arithmetic capability, and compound instructions address this.
  • As an operation instruction, the instruction execution means can execute a single instruction that operates on data obtained from the buffer means designated by a pointer means, stores the result in the buffer means designated by another pointer means, and updates the contents of the pointer means as part of the same operation.
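A sketch of such an instruction's semantics, assuming a per-unit buffer (work area) and a small set of pointer registers: the instruction names three pointers, reads the two source operands from the buffer locations they designate, writes the result through the third, and post-increments the pointers in the same step. The function name, the pointer-update encoding, and the use of XOR (Galois field addition) as the operation are hypothetical, and the three pointers are assumed to be distinct.

```python
from typing import List, Sequence


def pointer_op(buffer: List[int], ptrs: List[int], src_a: int, src_b: int, dst: int,
               steps: Sequence[int] = (1, 1, 1)) -> None:
    """buffer[ptrs[dst]] = buffer[ptrs[src_a]] ^ buffer[ptrs[src_b]], then update pointers.

    `steps` gives the post-increment (negative for decrement) applied to the src_a,
    src_b and dst pointers, mimicking an incrementer/decrementer per pointer.
    Pointer values wrap around the buffer size.
    """
    buffer[ptrs[dst]] = buffer[ptrs[src_a]] ^ buffer[ptrs[src_b]]
    for p, step in zip((src_a, src_b, dst), steps):
        ptrs[p] = (ptrs[p] + step) % len(buffer)


if __name__ == "__main__":
    buf = [1, 2, 3, 4, 16, 32, 48, 64] + [0] * 8  # A at 0..3, B at 4..7, C at 8..11
    ptrs = [0, 4, 8, 0]                            # four pointers, identified by 0..3
    for _ in range(4):
        pointer_op(buf, ptrs, 0, 1, 2)             # C[i] = A[i] + B[i] in GF(2^m)
    print(buf[8:12], ptrs)  # [17, 34, 51, 68] [4, 8, 12, 0]
```

Because the pointers advance as a side effect, the loop body needs no separate address-update instructions.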
  • The instruction execution means can further execute instructions that operate on data inside the control unit and branch instructions that change the flow of instructions fetched by the control unit.
  • Another aspect of the SIMD data processing device concerns the instructions the control unit executes for the standby state control described above. That is, the data processing device includes the control unit and a plurality of data processing units, each including the standby control unit; the control unit has means for referring to whether each data processing unit is in the standby state, and returns data processing units from the standby state to the active state according to the result.
  • The control unit can execute an instruction that sets, in the data processing units, a condition for entering the standby state and, at the time of setting, places each data processing unit that satisfies the condition in the standby state.
  • The control unit can also execute an instruction that sets such a condition in the data processing units and places each data processing unit that satisfies it in the standby state in an instruction execution cycle after the condition is set. Further, the control unit may execute an instruction that individually places the data processing units in the standby state or returns them from the standby state to the active state.
  • A further aspect of the SIMD data processing device aims to make repetitive processing by parallel arithmetic more efficient. The data processing device has the control unit, a plurality of data processing units, and storage means accessed by the control unit, and each data processing unit includes a standby control unit that places it in the standby state according to the result of an arithmetic operation performed under the control information.
  • The standby control unit returns the data processing unit from the standby state to the active state according to an instruction from the control unit. For this, the control unit has detection means for detecting whether each data processing unit is in the standby state,
  • and logic means for returning data processing units from the standby state to the active state according to the detection result.
  • By executing a repeat instruction, the control unit causes the data processing units to operate in parallel according to the instructions from the start address to the end address, up to the set number of repetitions.
  • The repeat instruction may specify the start address of the repeat loop, the end address of the repeat loop, the number of repetitions, and a condition for forcibly terminating the loop.
  • When executing such a repeat instruction, the control unit repeatedly operates the data processing units in parallel according to the instructions from the start address to the end address unless the forced termination condition is satisfied.
  • The forced termination condition can be that all data processing units are in the standby state.
  • Alternatively, the forced termination condition can be that at least one data processing unit is in the standby state.
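The repeat instruction's control flow can be sketched as follows. A hypothetical `repeat` helper takes the loop body (standing for the instructions from the start address to the end address), a repeat count, and one of the two forced-termination conditions: all units in standby, or at least one unit in standby. Standby states are modelled as a list of booleans that the body updates; `make_countdown` is a hypothetical body in which each unit has a fixed amount of work.

```python
from typing import Callable, List


def repeat(body: Callable[[List[bool]], None], standby: List[bool], count: int, stop_when: str) -> int:
    """Run body up to `count` times; return the number of iterations executed.

    stop_when = "all": terminate once every unit is in standby.
    stop_when = "any": terminate once at least one unit is in standby.
    """
    check = all if stop_when == "all" else any
    for i in range(count):
        if check(standby):
            return i  # forced termination
        body(standby)
    return count


def make_countdown(work: List[int]) -> Callable[[List[bool]], None]:
    """Hypothetical body: each active unit does one unit of work and enters standby when done."""
    remaining = list(work)

    def step(standby: List[bool]) -> None:
        for k, left in enumerate(remaining):
            if not standby[k]:
                remaining[k] = left - 1
                standby[k] = remaining[k] == 0

    return step


if __name__ == "__main__":
    print(repeat(make_countdown([3, 1, 2, 5]), [False] * 4, count=10, stop_when="all"))  # 5
    print(repeat(make_countdown([3, 1, 2, 5]), [False] * 4, count=10, stop_when="any"))  # 1
```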
  • The data processing unit can include a Galois field multiplication circuit and an addition circuit,
  • and the control unit can execute, as operation instructions for controlling the Galois field multiplication circuit and the addition circuit, at least a Galois field multiplication instruction, a Galois field addition instruction, and a Galois field product-sum operation instruction.
  • The control unit can be configured to fetch instructions from the program memory and perform error correction processing using the data processing units.
  • In the error correction processing, the following may be repeated a plurality of times: syndrome calculation for decoding a code defined over a Galois field, determination of whether an error is present using the resulting syndrome, and
  • storage in the storage means of each syndrome for which an error has been determined. Thereafter, the stored syndromes are read from the storage means and error correction operations are performed on them.
  • If syndrome calculation, error determination, and error correction were performed as a single repeat loop in parallel, any data processing unit whose data contained no error would have to remain in the standby state until the loop iteration completed, and useless cycles would be expected to occur frequently.
  • In the processing procedure of the present invention, by contrast, the syndrome is stored when an error has occurred, and once a certain number of stored syndromes have accumulated, correction operations are performed on them collectively. This shortens the standby time of the data processing units overall and reduces the unnecessary cycles caused by the standby state.
  • However, when errors are frequent, the number of operations that store error syndromes grows, and the total processing may instead increase. Whether a syndrome is stored depends on whether an error occurred, as described above, so more errors mean more syndrome storage operations and potentially more processing.
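The trade-off can be illustrated with a rough cycle model comparing a fused loop (syndrome, check, and correction per round of `p` codewords) with the batched procedure (syndrome, check, and conditional storage per round, then correction of all stored syndromes in groups of `p`). All cycle counts, the round structure, and the function names are hypothetical. The fused model assumes a round skips correction only when no unit in it found an error.

```python
from math import ceil
from typing import List


def fused_cycles(errors: List[bool], p: int, s: int, d: int, c: int) -> int:
    """One loop: a round pays c correction cycles if any unit in it found an error,
    while error-free units wait in standby."""
    total = 0
    for start in range(0, len(errors), p):
        group = errors[start:start + p]
        total += s + d + (c if any(group) else 0)
    return total


def batched_cycles(errors: List[bool], p: int, s: int, d: int, c: int, w: int) -> int:
    """Store error syndromes (w cycles in a round that has any), then correct them p at a time."""
    total = 0
    for start in range(0, len(errors), p):
        group = errors[start:start + p]
        total += s + d + (w if any(group) else 0)
    return total + ceil(sum(errors) / p) * c


if __name__ == "__main__":
    costs = dict(p=4, s=10, d=2, c=50)        # hypothetical cycle counts
    sparse = [i % 5 == 0 for i in range(16)]  # one error per round of four
    dense = [True] * 16                       # every codeword has an error
    print(fused_cycles(sparse, **costs), batched_cycles(sparse, w=1, **costs))  # 248 102
    print(fused_cycles(dense, **costs), batched_cycles(dense, w=1, **costs))    # 248 252
```

With sparse errors the batched procedure wins clearly; when every codeword is in error, the extra storage cycles make it slightly slower, matching the caveat above.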
  • A data processing system using the data processing device includes, for example, input means for code data defined over a Galois field, the data processing device, and data output means.
  • The data processing device corrects errors in the code data received from the input means based on a program stored in the program memory. From the standpoint of error correction processing, this makes it possible to support faster data reading in storage systems and faster data transmission in communication systems.
  • The control unit of the data processing device can execute, on a time-sharing basis, the input/output control of the input means and the output means and the error correction of the code data.
  • FIG. 1 is a block diagram of a SIMD parallel processor according to a first embodiment of the present invention.
  • FIG. 2 is a block diagram of a SIMD parallel processor according to a second embodiment of the present invention.
  • FIG. 3 is a block diagram of a SIMD parallel processor according to a third embodiment of the present invention.
  • FIG. 4 is a block diagram of a SIMD parallel processor according to a fourth embodiment of the present invention.
  • FIG. 5 is a flowchart showing a typical processing procedure for error correction of a Reed-Solomon code.
  • FIG. 6 is a flowchart showing an example of a processing procedure for code error correction by the SIMD parallel processor of the present invention.
  • FIG. 7 is a flowchart showing an example of a more efficient processing procedure for code error correction by the SIMD parallel processor of the present invention.
  • FIG. 8 is a logic circuit diagram showing an example of a condition determination unit.
  • FIG. 9 is a logic circuit diagram showing an example of a comparator with a mask.
  • FIG. 10 is a block diagram showing an example of a data processing unit.
  • FIG. 11 is a block diagram showing an example in which the standby state is realized by stopping the supply of the clock signal in the data processing unit of FIG. 10.
  • FIG. 12 is a block diagram showing a detailed example of a data processing unit suited to error correction processing.
  • FIG. 13 is an explanatory diagram of the special control instructions of the data processing device according to the present invention.
  • FIG. 14 is an explanatory diagram of the data transfer instructions of the data processing device according to the present invention.
  • FIG. 15 is an explanatory diagram of the SIMD instructions of the data processing device according to the present invention.
  • FIG. 16 is an explanatory diagram showing the configuration of all instruction codes, covering general RISC instructions and compound instructions that combine a data transfer instruction with a SIMD instruction.
  • FIG. 17 is an explanatory diagram showing an example of an instruction code of a transfer instruction.
  • FIG. 18 is an explanatory diagram showing an example of an instruction code of a SIMD instruction.
  • FIG. 19 is a block diagram of a SIMD parallel processor for explaining the configuration that places a data processing unit in the standby state using the data setting instruction.
  • FIG. 20 is an operation timing chart for the execution of "setPENOP2 1 if".
  • FIG. 22 is an explanatory diagram showing an example of a program created using the instruction set shown in FIGS. 13 to 15.
  • FIG. 23 is a block diagram of an example of a DVD/CD-ROM system to which a SIMD parallel processor is applied.
  • FIG. 24 is a flowchart showing general conditional branch processing.
  • FIG. 25 is a flowchart showing an example of conditional branch processing by a SIMD parallel processor examined earlier by the present inventors.
BEST MODE FOR CARRYING OUT THE INVENTION
  • FIG. 1 is a block diagram of a SIMD parallel processor according to a first embodiment of the present invention.
  • The SIMD parallel processor shown in FIG. 1 comprises first and second memories 801 and 802, a peripheral circuit 900 connected via a bus interface (bus I/F) 901, a control unit 200, and a plurality of data processing units 101, 102, ..., 10n operating in parallel.
  • CDB and CAB in the figure are a common data bus and a common address bus, respectively.
  • LDB and LAB are a local data bus and a local address bus, respectively.
  • The SIMD parallel processor shown in the figure can be formed on a single semiconductor substrate, either excluding the peripheral circuit or including all or part of it. Part or all of the memory, in particular the first memory, need not be formed on that substrate.
  • The first memory 801 stores programs and data that the control unit 200 fetches and executes.
  • The control unit 200 supplies addresses to the first memory 801 to read or write instructions or data.
  • The second memory 802 is used for storing data: it supplies operand data to the data processing units 101, 102, ..., 10n, whose arithmetic operations are controlled by the control unit 200,
  • and stores the results of their operations.
  • The second memory 802 is also accessible from the control unit 200.
  • The program may also be stored in the second memory 802 and transferred to the control unit 200 via the bus. In this and all other embodiments disclosed herein, it is preferable to allocate the first memory 801 and the second memory 802 to different areas of a common address space, because programs and data can then be accessed in the same way.
  • The control unit 200 includes a program counter (PC) 202, an instruction decoding unit 201, and a data operation unit 203, and sets a flag 204 according to the result of a data operation by the data operation unit 203. It outputs the value of the program counter 202 to the first memory 801 as an address to fetch an instruction. The fetched instruction is decoded by the instruction decoding unit 201, and control signals are issued to the data processing units 101, 102, ..., 10n according to the decoding result. In response to control signals from the instruction decoding unit 201, the data operation unit 203 also updates the program counter 202 as the instruction requires, performs memory accesses, and performs data operations. The addresses for the second memory 802 are the results of operations by the data operation unit 203 of the control unit 200.
  • The data operation unit 203 of the control unit 200 performs program loop control, operations unsuited to parallel execution, memory address calculation, and the like.
  • The operation result is fed back to the instruction decoding unit 201 via the flag 204 to control the program flow. The progress and results of operations in the data processing units 101, 102, ..., 10n can also be fed back to the instruction decoding unit 201.
  • The data processing units 101, 102, ..., 10n all have the same configuration, receive the same control signals from the control unit 200, and in principle perform the same operation. Data is mainly supplied from the second memory 802, but can also be transferred from the first memory 801, from the peripheral circuit 900 via the bus interface circuit 901, and from the data operation unit 203 of the control unit 200.
  • Data can also be transferred from the data processing units 101, 102, ..., 10n to the first memory 801 or the second memory 802, the peripheral circuit 900, and the data operation unit 203 of the control unit 200.
  • For example, the memory can read 32 bits of data at a time,
  • and the configuration allows this data to be supplied collectively to the data processing units 101, 102, ..., 10n in 8-bit units.
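A small sketch of the 32-bit to 8-bit distribution, assuming four data processing units and that unit 0 receives the least significant byte; the lane order and function names are hypothetical. The second function counts how many memory accesses are needed when the units together need more bits than the memory can transfer at once, the bandwidth situation discussed for the second embodiment.

```python
from math import ceil
from typing import List


def split_word(word: int, n_units: int = 4, lane_bits: int = 8) -> List[int]:
    """Distribute one memory word to the units, lane_bits per unit (unit 0 gets the LSBs)."""
    mask = (1 << lane_bits) - 1
    return [(word >> (lane_bits * i)) & mask for i in range(n_units)]


def accesses_needed(n_units: int, bits_per_unit: int, memory_width: int) -> int:
    """Memory accesses needed to move one data item for every unit."""
    return ceil(n_units * bits_per_unit / memory_width)


if __name__ == "__main__":
    print([hex(b) for b in split_word(0x44332211)])  # ['0x11', '0x22', '0x33', '0x44']
    print(accesses_needed(16, 8, 32))                # 4
```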
  • Each of the data processing units 101, 102, ..., 10n is provided with standby control means that places the data processing unit 101,
  • 102, ..., 10n in the standby state according to the results computed by the arithmetic units 161, 162, ..., 16n under control information (control signals/instructions) from the control unit 200. The standby control means comprises, for example, determination units 111, 112, ..., 11n, standby registers 121, 122, ..., 12n, and standby/active switching units (enb/dis) 131, 132, ..., 13n.
  • The standby control means returns the data processing units 101, 102, ..., 10n from the standby state to the active state according to an instruction from the control unit 200.
  • The standby control means includes: a determination unit 111 (112, ..., 11n) that determines whether the operation result of the data processing unit 101 (102, ..., 10n) has reached a specific state;
  • a standby register 121 (122, ..., 12n) that is set (first state) in synchronization with the detection of the specific state by the determination unit and reset (second state) by specific control information from the control unit 200;
  • and a standby/active switching unit 131 (132, ..., 13n) that stops the arithmetic operation of the data processing unit 101 (102, ..., 10n) in response to the set state of the standby register.
  • The standby registers 121, 122, ..., 12n can each be read, set, and reset independently by the control unit 200.
  • The control unit 200 can therefore monitor the states of the standby registers 121, 122, ..., 12n.
  • FIG. 2 shows an SIMD type parallel processor according to a second embodiment of the present invention. Circuit blocks having the same functions as those shown in FIG. 1 are denoted by the same reference numerals.
  • in a SIMD-type parallel processor, computing power is increased by operating the data processing units in parallel, but the data transfer speed often ends up determining the processing performance of the processor. For example, when the intermediate results of the data processing units 101, 102,... 10n are stored in the memory 802 each time, and the number of bits the memory 802 can input/output in parallel is smaller than the total number of bits that all the data processing units 101, 102,... 10n input/output in parallel, each memory access must be split into several accesses.
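  • The bottleneck can be made concrete with a minimal Python sketch. It assumes, for illustration only, the 32-bit memory width and 8-bit per-unit data mentioned above; the function name `memory_accesses` is hypothetical. It counts how many memory accesses one round of intermediate results would need.

```python
import math


def memory_accesses(num_units: int, bits_per_unit: int, memory_width_bits: int) -> int:
    """Accesses needed to store one intermediate result from every unit.

    All units together produce num_units * bits_per_unit bits, but the memory
    moves at most memory_width_bits bits per access, so the transfer has to be
    split into ceil(total / width) accesses.
    """
    total_bits = num_units * bits_per_unit
    return math.ceil(total_bits / memory_width_bits)


if __name__ == "__main__":
    # Assumed parameters: 8-bit data per unit, 32-bit wide memory.
    for n in (4, 8, 16):
        print(f"{n} units -> {memory_accesses(n, 8, 32)} access(es)")
```

  • With four units one access suffices, while sixteen units already need four accesses per round, so keeping intermediate results inside each unit avoids this cost.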
  • each of the data processing units 101, 102,... 10n is provided with storage means 141, 142,... 14n that store a plurality of data items, each with the bit width handled as one unit of data processing.
  • the storage means 141, 142,... 14n are used as so-called work areas of the individual data processing units and, in ordinary applications, need several tens of words; addressing them directly would require 5 to 7 bits of instruction code per operand.
  • a plurality of pointers 151, 152,... 15n for specifying storage areas of the storage means 141, 142,... are provided, and the data designated by the pointers 151, 152,... 15n is used as the operand of an instruction.
  • if the number of pointers is at most four, an operand can be specified with 2 bits of instruction code.
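  • The saving can be checked with a short Python calculation; the helper `field_bits` is hypothetical. Naming one of 32 to 128 words directly needs 5 to 7 bits, while choosing one of four pointers needs only 2.

```python
def field_bits(num_choices: int) -> int:
    """Smallest instruction-field width (in bits) that can name num_choices items."""
    if num_choices < 1:
        raise ValueError("num_choices must be positive")
    return (num_choices - 1).bit_length()


if __name__ == "__main__":
    # Direct addressing of a work area versus selecting one of a few pointers.
    for words in (32, 64, 128):
        print(f"{words}-word work area: {field_bits(words)} bits per operand")
    print(f"4 pointers: {field_bits(4)} bits per operand")
```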
  • the values of the pointers 151, 152,... 15n can be updated at the same time as an operation instruction, a data transfer instruction, or the like is executed. An incrementer/decrementer is provided for each of the pointers 151, 152,... 15n, and control fields instructing the update of the pointers 151, 152,... 15n are added to the instructions.
  • a mechanism is also provided for detecting when the value of a pointer 151, 152,... 15n reaches a predetermined value: registers holding the end values of the pointers 151, 152,... 15n are provided, and each time a pointer is updated, its value is compared with the end value and the comparison result is reflected in the flag 204.
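  • A minimal Python sketch of this pointer mechanism follows. The class `Pointer` and the function `sum_until_end` are hypothetical; the sketch assumes a +1 update on every step and models the end-value register as a plain field. Each step reads the operand through the pointer, updates the pointer in the same step, and returns the end flag from the comparison.

```python
from typing import List


class Pointer:
    """Illustrative model of one pointer with its end-value register."""

    def __init__(self, value: int, end: int) -> None:
        self.value = value
        self.end = end

    def update(self, delta: int) -> bool:
        """Apply the +1 / -1 / 0 update requested by an instruction.

        Returns the end flag: True when the new value equals the end value.
        """
        self.value += delta
        return self.value == self.end


def sum_until_end(buffer: List[int], start: int, end: int) -> int:
    """Accumulate buffer[start:end], reading and advancing the pointer together."""
    if start >= end:
        raise ValueError("this sketch only counts upward, so start must be < end")
    ptr = Pointer(start, end)
    acc = 0
    while True:
        acc += buffer[ptr.value]   # operand fetched through the pointer
        if ptr.update(+1):         # pointer updated in the same step; end flag checked
            return acc


if __name__ == "__main__":
    print(sum_until_end(list(range(100, 164)), 0, 4))   # prints 406
```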
  • FIG. 3 shows a SIMD type parallel processor according to a third embodiment of the present invention.
  • this embodiment is a combination of the first and second embodiments. Circuit blocks having the same functions as those shown in FIGS. 1 and 2 are denoted by the same reference numerals.
  • determination units 171, 172,... 17n are adopted that, in addition to the conditions used in the first embodiment, also take the pointer values into account as conditions for the transition to the standby state.
  • this configuration is suitable for processing the data groups held in the storage means 141, 142,... 14n until all of them are finished, even when the total number of data items stored differs between units or is unknown. That is, when the number of data items in the data set differs among the data processing units 101, 102,... 10n, the data processing units that have completed their processing enter the standby state one after another, so the processing continues correctly in the remaining units.
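  • The behaviour described above can be sketched in Python as a single instruction stream broadcast to several units whose data groups have different lengths. The function `simd_process` and the choice of summation as the per-unit operation are hypothetical illustrations; a unit whose pointer reaches the end of its group enters standby, and the loop ends when every unit is in standby.

```python
from typing import List, Tuple


def simd_process(groups: List[List[int]]) -> Tuple[List[int], int]:
    """Sum each unit's data group under one shared instruction stream.

    groups[i] is the data held in the storage means of unit i; the lengths may
    differ. A unit whose pointer reaches the end of its group enters standby
    and ignores later instructions. Returns (per-unit sums, steps issued).
    """
    n = len(groups)
    ptr = [0] * n
    acc = [0] * n
    standby = [len(g) == 0 for g in groups]
    steps = 0
    while not all(standby):          # control unit: stop when every unit waits
        steps += 1
        for i in range(n):           # the same instruction is broadcast to all
            if standby[i]:
                continue             # operation suppressed in the standby state
            acc[i] += groups[i][ptr[i]]
            ptr[i] += 1
            if ptr[i] == len(groups[i]):
                standby[i] = True    # end value reached: standby register set
    return acc, steps


if __name__ == "__main__":
    print(simd_process([[1, 2, 3], [4], [5, 6]]))   # ([6, 4, 11], 3)
```

  • The number of steps equals the length of the longest group, which is the price of sharing one instruction stream.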
  • FIG. 4 shows a SIMD parallel processor according to a fourth embodiment, which is a further preferred embodiment of the third embodiment.
  • in the control unit 200, a condition determination unit 206 for determining the standby state of each of the data processing units 101, 102,... 10n is shown explicitly, and the control unit 200 is provided with a repeat control unit 205 that uses the determination result of the condition determination unit 206.
  • the rest of the configuration is the same as the embodiment of FIG.
  • the standby state of each of the data processing units 101, 102,... 10n is monitored by the condition determination unit 206 through the signal PENOPEA.
  • when the standby states of the data processing units 101, 102,... 10n reach a predetermined state, information indicating this is reflected in the flag 204 and given to the repeat control unit 205.
  • the repeat control circuit 205 includes hardware for holding a program repetition start address, a program repetition end address, and a number of repetitions. When a repeat instruction that sets these values is issued, a repeat loop is executed without any overhead for the repetition.
  • the repeat control unit 205 includes a register for storing a start address of a repetitive loop, a register for storing an end address, and a register for storing the number of repetitions.
  • a counting means for counting the number of repetitions is provided, forming a repeat loop in which the instructions from the start address to the end address are executed through the program counter 202. The number of passes through the repeat loop is defined by the set number of repetitions.
  • the repeat loop is forcibly terminated when the condition is satisfied.
  • as the condition for forcibly terminating the repeat loop, the result of the condition determination unit 206, indicating whether the standby states of the data processing units 101, 102,... 10n satisfy a preset condition, can be used.
  • thus, when the standby states satisfy the condition, the control unit 200 detects it and can branch the program or forcibly terminate the repeat loop, so that no processing steps are wasted.
  • by providing Galois field arithmetic units in the data processing units 101, 102,..., the data processing apparatus configured as in the first to fourth embodiments becomes well suited to error correction processing of error correction codes, particularly Reed-Solomon codes.
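  • A rough Python model of the repeat control unit helps show how the start address, end address, and count registers interact with a forced-termination condition. Everything here is a hypothetical sketch: instructions are modelled as callables, and the termination condition is assumed to be sampled at the end of each pass, which the text does not specify.

```python
from typing import Callable, List, Optional


def repeat(program: List[Callable[[], None]], rs: int, re: int, rc: int,
           until: Optional[Callable[[], bool]] = None) -> int:
    """Execute program[rs..re] rc times, as a repeat control unit would.

    until is sampled at the end of each pass (an assumption); when it returns
    True the loop is forcibly terminated. Returns the number of passes run.
    """
    passes = 0
    while passes < rc:                 # repetition counter
        pc = rs                        # program counter reloaded with start address
        while pc <= re:                # run up to and including the end address
            program[pc]()
            pc += 1
        passes += 1
        if until is not None and until():
            break                      # forced termination
    return passes


def demo() -> int:
    need = [2, 5, 3]                   # repetitions each unit needs (illustrative)
    done = [0] * len(need)

    def body() -> None:
        for i, n in enumerate(need):
            if done[i] < n:            # units that are finished sit in standby
                done[i] += 1

    def all_standby() -> bool:         # stands in for the condition determination unit
        return all(d == n for d, n in zip(done, need))

    return repeat([body], 0, 0, rc=16, until=all_standby)
```

  • In `demo()` the loop is set up for 16 passes but stops after 5, the largest per-unit requirement, once every unit has entered standby.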
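  • As an illustration of what a Galois field arithmetic unit computes, the following Python sketch implements GF(2^8) addition and multiplication. The field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) is an assumption, a common choice for Reed-Solomon codes; the text does not state which polynomial is used.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^8) is bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= PRIM_POLY       # reduce modulo the field polynomial
    return result


if __name__ == "__main__":
    print(hex(gf_mul(0x02, 0x80)))   # x * x^7 = x^8, reduced to 0x1d
```

  • A hardware multiplier performs the same shift-and-reduce steps combinationally in one cycle; the loop form is only for readability.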
  • the error correction processing flow consists of data transfer 1001, syndrome calculation 1002, error determination 1003, Euclidean algorithm 1004, Chien search 1005, error numerical calculation 1006, and correction 1007.
  • the data transfer 1001 is a data transfer from the first memory 801 to the second memory 802.
  • the data is arranged in the second memory 802 in a format suitable for SIMD-type parallel processing.
  • the syndrome calculation 1002 takes a series of received codes (r0 to r255) 2001 as input and calculates the coefficients 2002 of the syndrome polynomial.
  • the series of received codes (r0 to r255) is, for example, 256 bytes of data, and includes encoded data and parity information corresponding thereto.
  • the syndrome calculation is performed for each series of received codes (r0 to r255). If the coefficients 2002 of the syndrome polynomial are all zero, the received code contains no error; in that case the subsequent processing is omitted and the flow ends. If an error is found, the correction processing starts.
  • the error locator polynomial 2003 and the error evaluator polynomial 2004 are calculated from the syndrome polynomial 2002 by the Euclidean algorithm 1004.
  • the error locations 2005 are obtained by finding the roots of the error locator polynomial 2003 with the Chien search 1005.
  • if an error location 2005 takes a value that cannot actually occur, it is concluded that errors exceeding the correction capability of the code have occurred.
  • in that case, the fact that correction is not possible is output, and the subsequent processing is skipped. If the error locations are obtained properly, the error values 2006 are calculated from them (error value calculation 1006), the correction 1007 is performed, and the processing ends.
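  • A minimal Python sketch of the syndrome calculation and the zero test follows. It assumes the field polynomial 0x11D, primitive element alpha = 2, and syndromes S_1 to S_16 defined as S_j = r(alpha^j) with r(x) = sum of r_i x^i; these conventions are assumptions, since the text does not fix them. The function names are hypothetical.

```python
from typing import List

PRIM_POLY = 0x11D  # assumed field polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiplication (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def syndromes(received: List[int], count: int = 16) -> List[int]:
    """S_j = r(alpha^j) for j = 1..count, with r(x) = sum_i received[i] * x^i."""
    out = []
    alpha_j = 1
    for _ in range(count):
        alpha_j = gf_mul(alpha_j, 2)          # alpha^j, alpha = 2
        s = 0
        for coeff in reversed(received):      # Horner's rule, highest degree first
            s = gf_mul(s, alpha_j) ^ coeff
        out.append(s)
    return out


def has_error(received: List[int], count: int = 16) -> bool:
    """True if any syndrome is nonzero, i.e. an error has been detected."""
    return any(syndromes(received, count))
```

  • Each unit can evaluate its own received code with the same instruction sequence, which is why this step maps naturally onto the SIMD data processing units.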
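  • The Chien search and the uncorrectable-error check can be sketched in Python as follows. The field conventions (polynomial 0x11D, alpha = 2, code length at most 255) are assumptions, and the helper names are hypothetical. Position i is reported as an error location when the locator polynomial vanishes at alpha^-i; if the roots found inside the codeword do not account for the locator's degree, some root lies at an impossible position and the function reports the error as uncorrectable.

```python
from typing import List, Optional

PRIM_POLY = 0x11D  # assumed field polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def poly_eval(coeffs: List[int], x: int) -> int:
    """Evaluate sum_k coeffs[k] * x^k by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def chien_search(locator: List[int], n: int) -> Optional[List[int]]:
    """Return error positions in 0..n-1, or None if the error is uncorrectable."""
    degree = len(locator) - 1
    while degree > 0 and locator[degree] == 0:
        degree -= 1
    positions = [i for i in range(n)
                 if poly_eval(locator, gf_pow(2, (255 - i) % 255)) == 0]  # x = alpha^-i
    return positions if len(positions) == degree else None
```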
  • the biggest advantage of a SIMD-type parallel processor configuration is that the processing performance can be changed without changing the program, simply by changing the number of data processing units.
  • as long as the code standard does not change, the program can stay the same, and higher throughput can be obtained by increasing the number of data processing units, which makes design changes extremely easy.
  • FIGS. 6 and 7 schematically show a processing flow when the four reception code sequences 2001 are processed in parallel.
  • GPE0 (101) to GPE3 (104) denote the four parallel data processing units, and the processing performed by each of the data processing units 101, 102, 103, and 104 is shown in the vertical direction.
  • in FIG. 6, if an error is detected in at least one of the four input received code sequences 2001, the data processing units in which no error was detected are put into the standby state 1099, and the processing is executed until the correction is completed.
  • FIG. 6 shows a case in which the received code sequence 2001 input to GPE0 (101) has errors within the correctable range, the received code sequence 2001 input to GPE1 (102) has errors beyond the correctable range, and no errors occur in the received code sequences 2001 input to GPE2 (103) and GPE3 (104). All the data processing units 101, 102, 103, and 104 operate unconditionally in parallel through the data transfer 1001, the syndrome calculation 1002, and the error determination 1003.
  • GPE0 (101) and GPE1 (102), in which errors were found, perform the subsequent error correction processing, while GPE2 (103) and GPE3 (104) detect that no errors were found.
  • GPE2 (103) and GPE3 (104) are in the standby state 1099 during that time.
  • GPE1 (102), without performing the subsequent error value calculation 1006 and correction 1007, waits in the standby state 1099 for GPE0 (101) to finish. In this case, if all the data processing units 101, 102, 103, and 104 detect that there is no error, the processing can proceed to the next series of received codewords 2001 without performing error correction processing.
  • wasteful processing steps can thus be avoided. However, if only one of the four received code sequences 2001 processed simultaneously by the four data processing units 101, 102, 103, and 104 has an error, the three data processing units other than the one actually performing error correction stay in the standby state 1099 throughout the processing after the Euclidean algorithm 1004 and wait for the error correction processing to be completed. In this case, the data processing units 101, 102, 103, and 104 provided in parallel cannot be said to be used effectively. As the number of data processing units increases, the probability that none of them detects an error decreases, and the processing efficiency decreases.
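  • The last point can be quantified with a small Python sketch. It assumes that codewords are independent and share a per-codeword error probability p; the value p = 0.25 below is an illustrative choice, not a figure from the text. The probability that a whole group of n units can skip correction falls geometrically with n.

```python
def prob_no_error_anywhere(p_codeword_error: float, n_units: int) -> float:
    """Probability that none of n independently processed codewords has an error."""
    if not 0.0 <= p_codeword_error <= 1.0:
        raise ValueError("probability out of range")
    return (1.0 - p_codeword_error) ** n_units


if __name__ == "__main__":
    # p = 0.25 is an illustrative per-codeword error probability.
    for n in (1, 2, 4, 8, 16):
        print(n, round(prob_no_error_anywhere(0.25, n), 4))
```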
  • the processing flow shown in FIG. 7 is effective.
  • the four received code sequences 2001 are read into the four data processing units 101, 102, 103, and 104 (1001), and the syndrome calculation 1002 and the error determination 1003 are performed.
  • as in the processing flow of FIG. 6, GPE0 (101) and GPE1 (102) are found to contain errors; here their calculated syndromes 2002 are stored in the memory 802, while GPE2 (103) and GPE3 (104), found to be error-free, enter the standby state 1099 and wait for this processing to end.
  • the processing flow then returns to the first data transfer, and the next four received code sequences 2001 are read into the four data processing units 101, 102, 103, and 104. The syndrome calculation 1002 and the error determination 1003 are performed again, and likewise only the syndromes 2002 of the received code sequences 2001 in which an error is detected are stored (1008) in the memory 802. After this has been done for a certain number of received code sequences 2001, the syndromes 2002 stored in the memory 802 are read out (1009) only for the received code sequences 2001 in which errors were detected, and the subsequent error correction processing is performed. Here too, if the number of errors exceeds the correction capability, the errors cannot be corrected.
  • in the example shown in FIG., the processes of temporarily storing (1008) the syndromes 2002 of the received code sequences 2001 in which errors were detected in the memory 802 and reading them out again (1009) are required in addition to the processing of the example shown in FIG.
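  • The difference between the two flows can be sketched in Python by counting correction passes. The function `correction_passes` is hypothetical, and it deliberately ignores the extra store and read-out steps (1008, 1009), which a real comparison would have to add to the batched flow.

```python
import math
from typing import List, Tuple


def correction_passes(error_flags: List[bool], n_units: int) -> Tuple[int, int]:
    """Count correction passes needed by the two flows.

    Per-group flow (as in FIG. 6): codewords are handled n_units at a time,
    and every group containing at least one error needs a full correction
    pass while the error-free units wait.
    Batched flow (as in FIG. 7): syndromes of erroneous codewords are stored,
    then read back n_units at a time, so ceil(errors / n_units) passes suffice.
    """
    groups = [error_flags[i:i + n_units] for i in range(0, len(error_flags), n_units)]
    per_group = sum(1 for g in groups if any(g))
    batched = math.ceil(sum(error_flags) / n_units)
    return per_group, batched


if __name__ == "__main__":
    # 16 codewords, one erroneous codeword in each group of four.
    flags = [i in (0, 5, 10, 15) for i in range(16)]
    print(correction_passes(flags, 4))   # (4, 1)
```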
  • the processing flow shown in FIG. 6 is efficient when errors occur frequently.
  • the processing flow shown in FIG. 7 is efficient when errors occur infrequently. Which processing flow is more efficient can be determined quantitatively from the error frequency, the number of processing steps required for error correction, and the number of processing steps required to temporarily store the syndromes in memory.
  • in practice, the average error frequency is far lower than the worst-case error frequency assumed in the design. If the received code is 256 bytes and up to eight errors are corrected, 16 syndromes are required. Since the probability of an error is typically about 1 in 1000, on average about one codeword in four contains an erroneous byte.
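  • The figures in the preceding paragraph can be reproduced with a back-of-the-envelope Python calculation. The function name is hypothetical, and it assumes byte errors occur independently at the stated rate of 1 in 1000.

```python
def error_statistics(codeword_bytes: int = 256, byte_error_rate: float = 1e-3,
                     t: int = 8):
    """Return (syndromes needed, expected byte errors, P(codeword has an error)).

    Assumes byte errors occur independently at byte_error_rate.
    """
    syndromes_needed = 2 * t                      # 2t syndromes to correct t errors
    expected_errors = codeword_bytes * byte_error_rate
    p_codeword_has_error = 1.0 - (1.0 - byte_error_rate) ** codeword_bytes
    return syndromes_needed, expected_errors, p_codeword_has_error


if __name__ == "__main__":
    s, e, p = error_statistics()
    print(s, e, round(p, 3))   # 16 syndromes, 0.256 expected errors, P about 0.226
```

  • An expected 0.256 erroneous bytes per codeword corresponds to roughly one erroneous byte per four codewords, matching the estimate in the text.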
  • the number of processing steps when error correction is performed according to the processing flow in FIG. 7 is estimated.
  • the result is that processing is completed in less than half the number of steps of the processing flow shown in FIG. If the error rate is lower, or if the parallelism is raised by increasing the number of data processing units, adopting the processing flow of FIG. 7 reduces the number of processing steps even more markedly.
  • a program for implementing the processing procedure of FIG. 7 is stored in the memory 801.
  • the control program for the SIMD processor can also be stored in an external memory.
  • the repetition-loop control described above can be applied to processing such as that shown in FIG. That is, the process from the data transfer (1001) to the storage of the syndromes (1008) is repeated a plurality of times, after which the processing branches to the syndrome read-out (1009).
  • Each processing in the error correction processing flow shown in FIG. 5 includes many repetitive loops as in general signal processing.
  • Repeat loops are classified into two types from the viewpoint of controlling hardware.
  • one is an iterative loop built in software, which uses a general-purpose register in the data operation unit of the control unit as a counter.
  • the other is a repetition loop in which the control unit is provided with a register that stores the start address, a register that stores the end address, and a counter that counts the number of repetitions.
  • This hardware is the repeat control unit 205.
  • an iterative loop built in software has no restriction on loop nesting depth and, in particular, needs no repeat control unit 205, which saves circuit area; however, the control instructions that must be issued increase the number of processing cycles.
  • repetition-loop control by the repeat control unit 205 requires no processing cycles for control, because the repetition count and the like are all handled in hardware, but it has restrictions such as a limited loop nesting depth. Generally, when loops are nested, the innermost loop is implemented as a repeat loop.
  • the repetition loop is classified into three types from the viewpoint of the processing flow to be realized.
  • the three types are: (1) loops whose number of repetitions is a fixed value; (2) loops whose number of repetitions is determined by the processing up to that point; and (3) loops that, when a certain condition is satisfied during the repetition, suspend the repetition, perform another process, and then resume.
  • the loop control methods described above can be applied to all three types when there is a single data path. However, they are not always easily applicable to general SIMD-type parallel processors.
  • the first iteration loop with a fixed number of iterations can be realized without any problem by a conventional SIMD-type parallel processor.
  • in the second type, each of the data processing units 101, 102,... 10n independently detects that it has completed its number of repetitions, determined partway through the processing, and enters the standby state. Since the data processing units 101, 102,... 10n enter the standby state one after another in ascending order of their required number of repetitions, the control unit 200 need only detect that all of them have entered the standby state and then terminate the repeat loop.
  • the third type, in which the repetition is temporarily suspended when a condition is satisfied, another process is performed, and the processing then resumes, is difficult to achieve with conventional SIMD-type parallel processors, because the timing at which the condition is satisfied may differ among the data processing units 101, 102,... 10n.
  • the data processing units 101, 102,..., 10n satisfying the conditions individually enter a standby state.
  • a repeat loop is often used as the innermost loop, so adding extra cycles can have a fatal effect on overall performance.
  • the control unit 200 detects that at least one of the data processing units is in the standby state, temporarily suspends the repeat loop, performs another process, and then resumes the loop. While the other process is performed, the data processing units that satisfied the condition are returned from the standby state, and those that did not are put into the standby state by a signal from the control unit 200.
  • the control unit 200 may also perform the other process on its own, using its internal data operation unit.
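  • A hypothetical Python sketch of the third loop type follows; the function `saturating_sums`, the threshold condition, and the "other process" (adding the limit instead of the element) are invented purely for illustration. Each iteration, units whose element exceeds the limit enter standby. If at least one unit is in standby, the loop is suspended, the other process runs with only those units active, and then all units resume.

```python
from typing import List, Tuple


def saturating_sums(streams: List[List[int]], limit: int) -> Tuple[List[int], int]:
    """Illustrative third-type loop: suspend, run another process, resume.

    Every unit adds one element per iteration. A unit whose element exceeds
    limit enters standby instead of adding it. If at least one unit is in
    standby, the control unit suspends the loop and runs a separate process
    (here: add limit) with only those units active, then resumes with all
    units active. Returns (per-unit sums, number of suspensions).
    """
    if len({len(s) for s in streams}) > 1:
        raise ValueError("all units must run the same number of iterations")
    n = len(streams)
    length = len(streams[0]) if streams else 0
    acc = [0] * n
    suspensions = 0
    for k in range(length):
        standby = [streams[i][k] > limit for i in range(n)]
        for i in range(n):
            if not standby[i]:
                acc[i] += streams[i][k]      # normal loop body
        if any(standby):                      # at least one unit in standby
            suspensions += 1
            for i in range(n):                # other process: only those units active
                if standby[i]:
                    acc[i] += limit
            # all units are returned to the active state; the loop resumes
    return acc, suspensions


if __name__ == "__main__":
    print(saturating_sums([[1, 9, 2], [3, 4, 10]], 5))   # ([8, 12], 2)
```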
  • when the above-described repetition loop is controlled in the circuit configurations of the embodiments shown in FIGS. 1 to 3, the control unit 200 has no repeat control means, so the data processing units 101, 102,...
  • the standby/active states are determined by transferring the contents of the standby registers 121, 122,... 12n provided in the individual data processing units 101, 102,...
  • the contents of the standby registers 121, 122,... 12n are transferred to the data operation unit 203 of the control unit 200, and the above-described repetition loop is controlled based on states such as all the data processing units 101, 102,... 10n being in the standby state, or at least one data processing unit being in the standby state.
  • the contents of the standby registers 121, 122,... 12n provided in the individual data processing units 101, 102,... 10n are transmitted to the control unit 200 via a dedicated signal line PENOPEA. If there are n data processing units 101, 102,... 10n, the signal line PENOPEA is n bits wide.
  • the control unit 200 includes a condition determination unit 206 that detects that the standby/active states of the data processing units 101, 102,... 10n satisfy a preset condition. The determination result is fed back to the flag 204 inside the control unit 200, or is sent to the repeat control unit 205 as a control signal that temporarily stops or forcibly terminates the repeat loop.
  • feedback to the flag 204 is effective when the repetition loop is controlled by software: by checking the state of the flag 204, the program can continue, suspend, or terminate the loop.
  • the repeat control circuit 205 is hardware originally provided to save the instruction execution cycles spent on loop control, so applications that spend many instruction execution cycles monitoring the standby/active states of the data processing units 101, 102,... 10n are considered rare. In many cases, a control signal is sent to the repeat control unit 205 when a preset condition is satisfied, and the repeat loop is forcibly terminated.
  • FIG. 8 shows an embodiment of the condition determining unit 206.
  • a signal PENOPEA[3:0] representing the standby/active state is input from the data processing units.
  • in the signal PENOPEA[3:0], each data processing unit outputs 1 bit, for a total of 4 bits. A bit is "1" when the corresponding data processing unit is in the standby state.
  • in this example the standby register is 1 bit, but a plurality of bits may be provided.
  • for example, a data processing unit handling a received codeword in which no error was detected sets the upper bit to 1 and enters the standby state.
  • using the lower bit to indicate a standby state that ends a repetition loop early allows more efficient control.
  • FIG. 10 shows an example of the data processing units 101, 102,... 10n.
  • the integer register 70, buffer 50, and register 60 are connected to the bus LD #. Data transfer between the memories 801 and 802, the control unit 200, and the peripheral circuit 900 is performed through this bus LD #.
  • the buffer 50 is an example of the storage means 141, 142,... 14n; it stores several tens of words of data, several words of which are designated in parallel by the pointer 20 and used for transferring operation data.
  • the pointer 20 is an example of the pointers 151, 152,... 15n.
  • An independent register 60 is also used for the transfer of the calculation data.
  • the buffer 50, the register 60, and the arithmetic circuit 10 are connected via an internal bus, and data transfer between the buffer 50 and the register 60 is also executed via this internal bus.
  • the arithmetic circuit 10 includes a plurality of arithmetic units, and reflects the arithmetic result on the flag register 40.
  • the value of the pointer 20 gives the address of the buffer 50, and the value can be updated in parallel with the operation and transfer processing by the operation instruction and the data transfer instruction. In the error correction, when the coefficients of the polynomial are stored in the buffer 50 and the operations of the polynomials are performed, it is efficient to perform the operations while sequentially updating the pointer 20.
  • the integer register 70 is connected to the integer arithmetic unit 80 via another internal bus and performs an address operation.
  • the integer register 70 is not limited to address operations for the buffer 50 and may also be used for entirely different integer operations.
  • for example, the error correction program uses it to count the number of syndromes whose value is zero, in order to detect that all the syndrome values are zero, that is, that the received codeword being processed contains no error.
  • the operation result of the integer operation unit 80 is also reflected in the flag register 40.
  • the integer operation unit 80 is also included in the operation units 161, 162,... 16n of FIG. The contents of the flag register 40 are compared with the standby condition supplied from the control unit 200, and when the condition is satisfied, the standby register 42 is set and the data processing unit enters the standby state.
  • the register 42 corresponds to the standby registers 121, 122,... 12n.
  • the circuit 43, which disables control signals in the standby state, forces the input control signals into an inactive state so that operations such as arithmetic are not performed.
  • the standby register 42 can be changed by a standby state change signal from the control unit 200.
  • the circuit 43 corresponds to the circuits 131, 132,... 13n in FIG. In FIG. 11, the flag register 40 and the comparator 41 are included in the determination units 171, 172,... 17n in FIG.
  • FIG. 11 shows a configuration for stopping the clock signal. In the standby state, a clock signal is supplied to the standby register 42 so that a reset operation can be performed, and the supply of the clock signal to other circuits is suppressed by the circuit 43.
  • the arithmetic circuit 10 is suitable for image processing and the like if it has a fixed-point arithmetic unit, and is suitable for computer graphics and the like if it has a floating-point arithmetic unit.
  • FIG. 12 shows a detailed example of a data processing unit suitable for error correction processing.
  • the data processing unit shown in the figure is composed of two Galois field multipliers 11, one Galois field adder 12, a 64-word Galois buffer 50, a 4-word Galois register 60, three pointers (PS1, PS2, PD) 21, 22, 23, an 8-word integer register 70, and other elements.
  • the Galois buffer 50, the Galois register 60, the two Galois field multipliers 11, and the Galois field adder 12 are connected by six internal buses; operations are performed on the data stored in the Galois buffer 50 and the Galois register 60, and the results are stored back into the Galois buffer 50 or the Galois register 60.
  • selectors 14 are provided at the inputs of the arithmetic units 11 and 12, so that the internal bus supplying the data used for each operation can be selected.
  • the outputs of the arithmetic units 11 and 12 are stored in the Galois buffer 50 or Galois register 60 via the internal bus.
  • the Galois buffer 50 simultaneously outputs the data at the addresses specified by the pointers (PS1) 21 and (PS2) 22 to the internal buses and, at the same time, takes in data from an internal bus at the address specified by the pointer (PD) 23. The values of the pointers 21, 22, and 23 can be incremented or decremented (+1/-1) in the same cycle as a Galois field operation.
  • the pointer values are written from, and read into, the integer register 70.
  • the integer register 70 stores the data needed to compute the values used to control the pointers 21, 22, and 23, and the computation is performed with the connected integer adder/subtractor 80.
  • a predetermined value is stored in advance in the register (PEND) 31, and the pointer to be monitored is designated by the control signal SELPEND. The comparator (CMP) 33 determines whether the value of that pointer matches the value preset in the register (PEND) 31. When they match, the RPTEND flag of the flag register 40 is set by the signal RPTEND, and the standby register 42 is set. When the pointers 21, 22, and 23 are not to be monitored, the control signal SELPEND is set so that their values are not compared with the value of the register (PEND) 31.
  • in addition, the flag register 40 in the data processing unit has a GZ flag indicating that the result of the Galois field adder 12 has become zero, an IZERO flag indicating that the result of the integer arithmetic unit 80 has become zero, and an INEG flag indicating that it has become negative.
  • the contents of the flag register 40 are compared with the signal NOPCNDX by the masked comparator 41, using the signal CNDXMASK as the mask, and if they match, the standby register 42 is set.
  • the standby register 42 is, for example, a 1-bit register. It can be set by operation results in the data processing unit through the two methods above, or reset by writing a value directly from outside with the signal PENOPIN. It can also be read directly from outside as the signal PENOPEA.
  • in the standby state, the circuit 43 invalidates all the control signals input to the data processing unit except those that control access to the standby register 42.
  • FIG. 9 shows an example of the masked comparator 41. The 4-bit flag is compared bit by bit with the preset 4-bit condition signal NOPCNDX, but bits for which "1" is specified in the signal CNDXMASK are excluded from the comparison.
  • the signals NOPCNDX and CNDXMASK are signals that are commonly supplied from the control unit 200 to each data processing unit.
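  • The masked comparator can be sketched in Python as a bitwise expression. The assignment of RPTEND, GZ, IZERO, and INEG to particular bit positions is a hypothetical choice made for this sketch; the text does not give the bit order.

```python
# Hypothetical bit assignment for the 4-bit flag register 40.
FLAG_RPTEND = 0b1000
FLAG_GZ = 0b0100
FLAG_IZERO = 0b0010
FLAG_INEG = 0b0001


def masked_match(flags: int, cond: int, mask: int, width: int = 4) -> bool:
    """Flags must equal cond on every bit whose mask bit is 0."""
    full = (1 << width) - 1
    return ((flags ^ cond) & ~mask & full) == 0


def next_standby(standby: bool, flags: int, nopcndx: int, cndxmask: int) -> bool:
    """Standby register 42 is set on a match and otherwise keeps its value."""
    return standby or masked_match(flags, nopcndx, cndxmask)


if __name__ == "__main__":
    # Enter standby whenever GZ is set, ignoring the other three flags.
    mask = FLAG_RPTEND | FLAG_IZERO | FLAG_INEG
    print(next_standby(False, FLAG_GZ | FLAG_INEG, FLAG_GZ, mask))   # True
```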
  • FIGS. 13, 14 and 15 show examples of instructions executed by the SIMD-type parallel processor of FIG. 4 having the data processing unit shown in FIG. 12.
  • reduced instruction set computer (RISC) instructions are preferable for shortening the instruction word length or increasing code efficiency.
  • the RISC architecture is adopted for the SIMD-type parallel processor of this embodiment.
  • the instruction shown in FIG. 13 is an example of an instruction which is particularly suitable for implementing the present invention, and is an instruction newly added to a general RISC instruction. In addition to general RISC instructions, it has instructions that can describe RISC data transfer instructions and SIMD instructions in parallel.
  • the data transfer instructions shown in FIG. 14 and the SIMD instructions shown in FIG. 15 can be combined and described in parallel without any restrictions, and can be executed in parallel.
  • new data setting instructions and repeat instructions are added to the RISC instructions.
  • the data setting instructions set condition data for monitoring the state of the flags (the flags in the flag register 40) and state data for changing the standby/active state of the data processing units.
  • the state data specifies, for each data processing unit, whether it is set to the standby state or to the active state. This is used, for example, when a certain process has completed with all the data processing units in the standby state and all of them are returned to the active state to proceed to the next process, or when a predetermined subset of the data processing units is returned to the active state.
  • the control unit 200 has two registers, (NOPCNDX) 207 and (CNDXMASK) 208, in which the signals NOPCNDX and CNDXMASK are set. These registers 207 and 208 can be written by instructions, and their outputs are supplied to the result determination circuits 171, 172,... 17n of the data processing units 101, 102,... In the result determination circuits 171, 172,... 17n, the operation result flags are compared with the signal NOPCNDX only for the bits not masked by the signal CNDXMASK, and if they match, the corresponding data processing unit is put into the standby state.
  • when this instruction is executed, the flag values are examined to determine whether they match the given condition; if they match, the standby register is written to stop the clock. Without this instruction, the flag values are not examined.
  • the repeat instructions shown in FIG. 13 include a normal repeat instruction and a repeat instruction with a forced termination condition.
  • the normal repeat instruction "REPEAT RS, RE, RC" repeats the instructions from address RS to address RE RC times, under hardware control by the repeat control circuit 205.
  • the repeat instruction with a forced termination condition, "REPEAT RS, RE, RC, until condition", likewise requires no overhead cycles and repeats the instructions from address RS to address RE RC times, but forcibly terminates the repeat loop if the condition is satisfied.
  • the condition for forced termination is either that all unmasked data processing units have entered the standby state (PENOPAND) or that at least one unmasked data processing unit has entered the standby state (PENOPOR). PENOPAND is effective as the forced termination condition when the number of repetitions differs for each data processing unit, and PENOPOR is effective when a search is repeated until certain data is found.
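  • The two termination conditions can be sketched in Python on the PENOPEA bit vector. The sketch assumes bit i corresponds to data processing unit i and that a 1 in the hypothetical `unit_mask` excludes that unit; the text does not specify how units are masked.

```python
def considered_units(unit_mask: int, n: int) -> int:
    """Bits of the units that take part in the test (mask bit 1 = excluded)."""
    return ~unit_mask & ((1 << n) - 1)


def all_standby(penopea: int, unit_mask: int, n: int) -> bool:
    """PENOPAND-style test: every unmasked unit is in standby."""
    c = considered_units(unit_mask, n)
    return (penopea & c) == c


def any_standby(penopea: int, unit_mask: int, n: int) -> bool:
    """PENOPOR-style test: at least one unmasked unit is in standby."""
    return (penopea & considered_units(unit_mask, n)) != 0


if __name__ == "__main__":
    # Units 0, 1 and 3 in standby; unit 2 is masked out of the test.
    print(all_standby(0b1011, 0b0100, 4), any_standby(0b1011, 0b0100, 4))   # True True
```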
  • FIG. 14 shows an example of a data transfer instruction.
  • the general-purpose register provided in the data calculation unit 203 of the control unit 200 is used as an address pointer and an address index unit.Three types of mouth instructions, three types of store instructions, and the operation NOP Can be specified.
  • the mouth instruction is a data transfer instruction from the second memory 802 to the data processing unit 101, 102,... 10 ⁇ register 60, while the store instruction is a data transfer instruction from the data processing unit register 60. This is an overnight transfer to the second memory 802.
  • the contents of the register used as the address pointer in the control unit 200 serve as the memory address. With @A, the address pointer is not updated. With @A+, the address pointer is incremented by 1 after the data transfer.
  • in the remaining mode, the address pointer is increased by the value of the address index register.
  • on the data processing unit side, either the Galois register 60 or the Galois buffer 50 can be selected. When the Galois buffer 50 is selected, the buffer pointers 21 to 23 are either held or incremented by 1. Because the memory address pointer on the control unit 200 side and the buffer pointers 21 to 23 on the data processing unit side are updated in step automatically, data can be transferred efficiently.
  • a feature of the embodiment disclosed here is that a data transfer instruction and a SIMD instruction are executed simultaneously in parallel, and the pointers 21 to 23 of the Galois buffer 50 can be updated at the same time. Even if data transfer and computation can proceed simultaneously, high processing performance cannot be expected unless the pointers are also updated at the same time.
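The following sketch models one data processing unit during a load. It shows how the control-unit address pointer and a Galois buffer pointer can advance together. The memory contents, the register values, and the spelling `@A+index` for the index mode are assumptions, because the source does not name that mode. `TransferUnit` and `load` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransferUnit:
    mem: List[int]        # stands in for the second memory 802
    a: int = 0            # address pointer (general-purpose register of 203)
    index: int = 1        # address index register
    gbuf: List[int] = field(default_factory=lambda: [0] * 8)  # Galois buffer 50
    pd: int = 0           # buffer pointer (one of pointers 21 to 23)


def load(u: TransferUnit, mode: str, bump_pointer: bool) -> None:
    """Memory-to-Galois-buffer transfer with post-update of both pointers."""
    u.gbuf[u.pd] = u.mem[u.a]
    if mode == "@A":            # address pointer held
        pass
    elif mode == "@A+":         # address pointer + 1
        u.a += 1
    elif mode == "@A+index":    # hypothetical spelling of the index mode
        u.a += u.index
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    if bump_pointer:            # buffer pointer held or +1, chosen per instruction
        u.pd += 1


u = TransferUnit(mem=[10, 20, 30, 40, 50, 60], index=2)
for _ in range(3):
    load(u, "@A+index", bump_pointer=True)
print(u.gbuf[:4], u.a, u.pd)  # [10, 30, 50, 0] 6 3
```

Because both pointer updates happen inside the same transfer, no separate instruction is spent advancing the buffer pointer.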
  • the SIMD instructions are broadly classified into data transfer instructions between registers, integer operation instructions that compute the values of the integer registers, and Galois field operation instructions that directly perform error correction processing.
  • the GICOPY instruction is an integer data transfer instruction that moves data between the integer registers P0, P1, ... P7 and the pointers PS1, PS2, PD, and PEND.
  • the GCOPY instruction is a transfer instruction for Galois field values and moves data between the Galois register 60 and the Galois buffer 50. When the Galois buffer 50 is an operand, the pointers 21 to 23 can be updated at the same time, which is useful when initializing the buffer 50.
  • the integer operation instruction performs addition, subtraction, increment, and decrement of the integer register in order to generate the values of the pointers 21 to 23.
  • the IZERO flag is set when the result of the addition is zero.
  • the INEG flag is set when the result of the addition is negative. If the integer registers 70 hold the degrees of polynomials during error correction processing, programs can be written efficiently; the flags are used, for example, when comparing the degrees of two polynomials or when setting the number of loop iterations.
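As a hedged illustration of how the IZERO and INEG flags might be used, the sketch below compares two polynomial degrees and counts loop passes by testing flags instead of values. It assumes the flags also reflect subtraction and decrement results. `isub`, `compare_degrees`, and `count_down` are hypothetical helpers.

```python
from typing import Tuple


def isub(a: int, b: int) -> Tuple[int, bool, bool]:
    """Integer subtraction returning (result, IZERO, INEG)."""
    r = a - b
    return r, r == 0, r < 0


def compare_degrees(deg_a: int, deg_b: int) -> str:
    # Subtract the degrees and branch on the flags rather than the value.
    _, izero, ineg = isub(deg_a, deg_b)
    if izero:
        return "equal"
    return "b is higher" if ineg else "a is higher"


def count_down(n: int) -> int:
    """Run a loop n times by decrementing a counter until IZERO is set."""
    passes = 0
    if n <= 0:
        return passes
    while True:
        passes += 1
        n, izero, _ = isub(n, 1)   # the decrement also updates the flags
        if izero:
            return passes
```

For example, `compare_degrees(5, 7)` reports that the second polynomial has the higher degree, and `count_down(4)` runs four passes.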
  • the Galois field operation instructions compute on elements of the Galois field used for error correction.
  • GADMS and GADMC are provided, each of which performs a multiplication and an addition simultaneously (multiply-accumulate).
  • D = Sy * D + Sz
  • the GADMS instruction is suitable for syndrome calculation.
  • the GADMC instruction is suitable for the Chien search.
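The form D = Sy * D + Sz is exactly one step of Horner's rule, which is why it suits syndrome calculation. With Sy = α^j and Sz taken from the received word, repeated steps evaluate S_j = r(α^j). The sketch assumes GF(2^8) with field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), primitive element α = 2, and syndrome roots α^0, α^1, and so on. This section does not give these parameters, and `gadms` here is only a software model, not the instruction itself.

```python
from typing import List

PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (shift-and-add, reduce on overflow)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return r


def gf_pow(a: int, n: int) -> int:
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r


def gadms(sy: int, d: int, sz: int) -> int:
    """Software model of D = Sy * D + Sz for one data processing unit."""
    return gf_mul(sy, d) ^ sz


def syndromes(received: List[int], nsyn: int) -> List[int]:
    """S_j = r(alpha^j) by Horner's rule; `received` is highest degree first."""
    out = []
    for j in range(nsyn):
        x = gf_pow(2, j)     # alpha = 2 is assumed primitive
        d = 0
        for c in received:
            d = gadms(x, d, c)
        out.append(d)
    return out
```

On the SIMD processor, each data processing unit would run the same GADMS sequence on its own codeword, so one instruction stream computes many syndromes at once.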
  • a compound multiply-accumulate instruction (D = Sw * Sx + Sy * Sz) is suitable for the Euclidean algorithm.
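One plausible use of D = Sw * Sx + Sy * Sz in the Euclidean algorithm is a division-free reduction step. The two polynomials are cross-multiplied by each other's leading coefficients, so the top terms cancel. The sketch below shows that formulation under the same GF(2^8) assumption (0x11D field polynomial). Whether the FIG. 22 program uses exactly this update is an assumption, and `gmac2` and `reduce_step` are hypothetical names.

```python
from typing import List

PRIM_POLY = 0x11D  # assumed field polynomial


def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiplication (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return r


def gmac2(sw: int, sx: int, sy: int, sz: int) -> int:
    """Software model of the compound instruction D = Sw * Sx + Sy * Sz."""
    return gf_mul(sw, sx) ^ gf_mul(sy, sz)


def reduce_step(a: List[int], b: List[int]) -> List[int]:
    """One division-free reduction: lead(b) * a + lead(a) * x^s * b.

    Coefficient lists are lowest degree first; deg a >= deg b and both
    leading coefficients are nonzero. The leading terms cancel because
    lead(b) * lead(a) + lead(a) * lead(b) = 0 in GF(2^m).
    """
    shift = len(a) - len(b)
    la, lb = a[-1], b[-1]
    out = []
    for i in range(len(a)):
        j = i - shift
        bj = b[j] if 0 <= j < len(b) else 0
        out.append(gmac2(lb, a[i], la, bj))   # one compound MAC per coefficient
    while out and out[-1] == 0:                # drop cancelled high-order terms
        out.pop()
    return out


print(reduce_step([1, 2, 3], [5, 1]))  # [1, 13]
```

Avoiding division keeps every coefficient update to a single compound instruction. This is also why a fused Sw * Sx + Sy * Sz form is attractive.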
  • the inverse of an element of the Galois field is obtained by executing the GINV instruction seven times. Division is performed by multiplying by the inverse.
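The count of seven follows from the field size if GF(2^8) is assumed. Because a^255 = 1 for nonzero a, the inverse is a^254 = a^2 · a^4 · ... · a^128, which takes seven square-and-multiply steps. The sketch below assumes that each GINV execution performs one such step. This section does not give the real instruction semantics, and `ginv_step` is a hypothetical name.

```python
PRIM_POLY = 0x11D  # assumed field polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiplication (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return r


def ginv_step(s: int, acc: int):
    """Hypothetical GINV step: square s, then multiply it into acc."""
    s = gf_mul(s, s)
    return s, gf_mul(acc, s)


def gf_inv(a: int) -> int:
    """a^-1 = a^254 = a^2 * a^4 * ... * a^128 (seven steps) in GF(2^8)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    s, acc = a, 1
    for _ in range(7):          # seven GINV executions
        s, acc = ginv_step(s, acc)
    return acc


def gf_div(a: int, b: int) -> int:
    return gf_mul(a, gf_inv(b))   # division = multiplication by the inverse
```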
  • GB(PS1[+/-]) in the contents column of Galois operation instructions such as GMULT Sx, Sy, D means that the pointer (PS1) of the Galois buffer (GB) can be incremented or decremented by 1, as selected, simultaneously with the operation specified by the instruction.
  • GB (PS2 [+/-]) has the same meaning for the pointer (PS2).
  • the +1 and -1 operations are performed by the incrementer and decrementer attached to each of PS1, PS2, and PD; the instruction specifies which is used.
  • FIG. 16 shows the structure of all instruction codes, covering both general RISC instructions and instructions that combine a data transfer instruction with a SIMD instruction.
  • each general RISC instruction is assigned an n-bit instruction code corresponding to its mnemonic.
  • the data transfer instructions shown in FIG. 17 can be freely combined with the SIMD instructions shown in FIG. 18 to form a parallel execution instruction (compound instruction).
  • an m-bit code is assigned to the compound instruction. It consists of a k-bit identification code that distinguishes it from general RISC instructions, the p-bit transfer instruction code shown in FIG. 17, and the q-bit SIMD instruction code shown in FIG. 18.
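The field layout of a compound instruction can be sketched as bit packing. The widths k = 2, p = 10, q = 20 (so m = 32) and the identification code 0b11 are hypothetical. The sketch also assumes that general RISC instructions never carry that code in their top k bits. The source gives only the symbolic widths.

```python
K, P, Q = 2, 10, 20          # hypothetical field widths (m = k + p + q = 32)
COMPOUND_ID = 0b11           # hypothetical k-bit identification code


def encode_compound(transfer: int, simd: int) -> int:
    """Pack identification code, p-bit transfer code and q-bit SIMD code."""
    if not (0 <= transfer < 1 << P and 0 <= simd < 1 << Q):
        raise ValueError("field out of range")
    return (COMPOUND_ID << (P + Q)) | (transfer << Q) | simd


def decode(word: int):
    """Split a 32-bit word into ('risc', word) or ('compound', transfer, simd)."""
    if word >> (P + Q) != COMPOUND_ID:
        return ("risc", word)
    transfer = (word >> Q) & ((1 << P) - 1)
    simd = word & ((1 << Q) - 1)
    return ("compound", transfer, simd)
```

The decoder needs only the top k bits to route a word. The transfer field then goes to the data transfer path and the SIMD field is broadcast to the data processing units in the same cycle.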
  • FIG. 22 shows an example of a program created by using the instruction set shown in FIGS. 13 to 15.
  • the program in FIG. 22 is part of the Euclidean algorithm 1004 performed in the Reed-Solomon code error correction processing shown in FIG.
  • the coefficients of the old and new error evaluator polynomials are stored in the Galois buffer 50. Because the degrees of the two polynomials differ among the data processing units 101, 102, ... 10n operating in parallel, the Galois buffer addresses at which their coefficients are stored are held in the integer registers 70.
  • the highest-order and lowest-order coefficients of the new error evaluator polynomial are stored at the addresses of the Galois buffer 50 indicated by P0 and P6, respectively.
  • the highest-order and lowest-order coefficients of the old error evaluator polynomial are stored at the addresses of the Galois buffer 50 indicated by P2 and P7, respectively.
  • the old error evaluator polynomial is updated by coefficient operations between the old and new polynomials.
  • the degree of the error evaluator polynomial is highest at the start of the Euclidean algorithm and decreases step by step as the update is repeated; finally, the error evaluator polynomial 2004 of the appropriate degree is obtained.
  • in FIG. 22, the old error evaluator polynomial is updated in two repeat loops, 3005 and 3009.
  • the values of the pointers (PS1) 21 and (PS2) 22 give the addresses where the coefficients of the new and old error evaluator polynomials being computed are stored, and they differ for each data processing unit 101, 102, ... 10n.
  • the values of the pointers 21 and 22 decrease by 1 each time the arithmetic instructions 3006 and 3010 are executed.
  • when the pointer value reaches PEND, the corresponding data processing unit enters the standby state.
  • the data processing units enter the standby state in turn, starting with those whose error evaluator polynomial has the lowest degree, and when all the data processing units 101, 102, ... 10n are in the standby state, the repeat loops 3005 and 3009 are forcibly terminated.
  • the 16 repetitions set as the repeat count of loops 3005 and 3009 are the theoretical maximum, so this program also works correctly on a SIMD parallel processor that has no repeat loop forced termination mechanism.
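A rough model of the FIG. 22 repeat loops follows. Each unit decrements its PS1 pointer once per pass and enters standby when the pointer reaches PEND. The loop is either forcibly terminated once every unit is in standby or run for the full 16 passes. The starting pointer values are invented for illustration, and the pointers are assumed to start at or above PEND. `run_update_loop` is a hypothetical name.

```python
from typing import List, Tuple


def run_update_loop(ps1: List[int], pend: List[int], rc: int = 16,
                    forced_termination: bool = True) -> Tuple[int, List[int]]:
    """Return (passes executed, final PS1 values) for all units."""
    ps1 = list(ps1)
    standby = [p == e for p, e in zip(ps1, pend)]
    passes = 0
    for _ in range(rc):
        passes += 1
        for i in range(len(ps1)):
            if standby[i]:
                continue            # standby units hold their state
            ps1[i] -= 1             # arithmetic instruction with PS1 - 1
            if ps1[i] == pend[i]:
                standby[i] = True   # pointer reached PEND
        if forced_termination and all(standby):
            break                   # PENOPAND-style forced termination
    return passes, ps1


print(run_update_loop([9, 7, 12, 5], [4, 4, 4, 4]))                            # (8, [4, 4, 4, 4])
print(run_update_loop([9, 7, 12, 5], [4, 4, 4, 4], forced_termination=False))  # (16, [4, 4, 4, 4])
```

Both runs produce the same final pointers, because standby units simply hold their results. Only the number of passes differs, which is where the forced termination mechanism saves steps.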
  • in that case, each data processing unit that finishes early waits, holding its result, until the full set of repetitions has ended.
  • the combined number of iterations of the first and second repeat loops 3005 and 3009 is about 12 on average.
  • with the forced termination mechanism, the loops end as soon as every unit is in standby, whereas an embodiment without the mechanism may perform 32 or more repetitions.
  • the SIMD parallel processor with the repeat-instruction forced termination mechanism shown in the fourth embodiment therefore greatly reduces the number of processing steps.
  • the repeat loop forced termination mechanism can also be used effectively in other error correction routines and in ordinary digital signal processing.
  • FIG. 23 shows a system block diagram in which the SIMD parallel processor described above is applied to a DVD/CD-ROM device.
  • the peripheral circuit 900 is connected via a peripheral bus by a bus interface circuit 901.
  • the peripheral circuit 900 includes, for example, an analog interface circuit (analog I/F) 905, a D/A converter 904 for controlling the pickup 913, a PWM (Pulse Width Modulation) circuit 903 for controlling the motors 911 and 912, and a D/A converter 902 for audio output.
  • the analog interface circuit 905 controls the pickup 913 via the analog signal processing circuit 909, reads in data, and captures the information needed for control.
  • the information required for control is information on lens focus, envelope, focus, and tracking.
  • based on this information, the control unit 200 of the processor performs data processing, adjusts the focus and tracking of the pickup 913, and drives the sled (thread) motor 912 and the spindle motor 911.
  • data read from the medium is taken into the memories 801 and 802 through the analog interface circuit 905, error-corrected by the data processing units 101, 102, ... 10n, and then output.
  • in the present invention, the control unit 200 is not used only to control the SIMD parallel processor. Because it also supports general RISC instructions, it performs servo control processing, tracking control processing, and the like, sharing time with error correction. Furthermore, in the preferred embodiment, a mechanism capable of executing general DSP instructions is added, so that all system control tasks, signal processing tasks, and special data processing tasks such as error correction can be written as a single program. The control microcomputer, servo/tracking control LSI, and DSP that were previously separate components can then be omitted, and the SIMD parallel processor according to an example of the present invention handles all of this processing, greatly reducing equipment cost. In addition, because all tasks can be developed together, coordinating them is very easy and the development period can be shortened significantly.
  • because the program for error correction and the programs for the pickup, motors, and audio output can be developed together, the cost of device development is greatly reduced.
  • conventionally, a separate microcomputer for controlling the entire device is required, but in the present embodiment the processor that performs error correction also controls the entire device, so the device cost itself can be reduced significantly.
  • the above embodiment applies the present invention to a DVD/CD-ROM device; broadcast media can easily be supported by replacing the pickup, motors, and the like with demodulation circuits, communication protocol control circuits, and the like. As described above, performing code error correction on a SIMD parallel processor makes it easy to handle higher data rates simply by adding data processing units to raise the degree of parallelism, without changing the basic architecture or the program and without raising the operating speed. Codes of different standards can be handled by changing the program, so systems that must correct errors under multiple standards are easily supported.
  • branch processing can be realized by placing individual data processing units in the standby state according to the results of their own calculations.
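Branch processing by standby can be sketched as follows. Every unit evaluates the condition, and the units for which it is false stand by while the "then" block is broadcast. The control unit then reactivates units so that the complementary set executes the "else" block. How the control unit chooses which units to reactivate is an assumption here, and `simd_if_else` is a hypothetical name.

```python
from typing import List


def simd_if_else(xs: List[int]) -> List[int]:
    """Lockstep version of: if x < 0: x = -x  else: x = 2 * x."""
    xs = list(xs)
    cond = [x < 0 for x in xs]          # every unit evaluates the condition
    # "then" block: units whose condition is false are in standby.
    standby = [not c for c in cond]
    for i, x in enumerate(xs):
        if not standby[i]:
            xs[i] = -x
    # The control unit returns units to the active state; here it is assumed
    # to put the complementary set in standby for the "else" block.
    standby = list(cond)
    for i, x in enumerate(xs):
        if not standby[i]:
            xs[i] = 2 * x
    # Finally all units are made active again before the next common code.
    return xs


print(simd_if_else([-3, 4, 0, -1]))  # [3, 8, 0, 1]
```

All units receive the same instruction stream; the standby flags alone decide which units' results change.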
  • the present invention is not limited thereto, and it is needless to say that various modifications can be made without departing from the gist of the invention.
  • the operation program of the data processing device can be stored in a built-in ROM or provided by an external ROM or the like.
  • the number of data processing units and the bus configuration for connecting the data processing unit and the control unit are not limited to the above-described embodiment, and can be changed as appropriate.
  • the method of notifying the control unit of the standby state of each data processing unit is not limited to a configuration in which the control unit is notified by a signal for each data processing unit.
  • alternatively, the control unit may access the standby register via the common bus and read it to determine whether each data processing unit is in the standby state.
  • the present invention can be widely applied to data processing devices such as SIMD parallel data processors for improving parallel processing performance or efficiency. It can also be applied to data processing systems that encode and decode data to correct code errors in storage or communication systems, for example systems that reproduce or record information on recording media such as CD-ROM, DVD, and MO (magneto-optical) disks, and satellite broadcast receiving systems.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

To improve parallel processing throughput, this data processing device, for example a SIMD parallel architecture machine, is provided with a control unit (200) that decodes and executes fetched instructions, and a plurality of data processing units (101) that receive, in parallel, operation control information from the control unit and transfer data processed by the control unit. Each data processing unit has standby control means that places the unit in the standby state according to the result of an operation based on the control information. This means consists of a judgment element (111) that determines whether the result of the operation executed by an arithmetic element satisfies a standby condition, a standby register (121) that is placed in a reset state according to the judgment result, and a circuit (131) that stops the data processing operation in response to the reset state of the standby register. The control unit performs the operation that returns each data processing unit from the standby state to the active state. Branch processing is realized by placing data processing units in the standby state according to their individual operation results and having the control unit return them from the standby state to the active state. Consequently, when branch processing is arranged so that standby states progressively decrease, the useless cycles caused by data processing units in standby are easily minimized, and the processing throughput of the SIMD parallel machine is improved.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP1997/003259 WO1999014685A1 (fr) 1997-09-16 1997-09-16 Machine de traitement des donnees et systeme de traitement des donnees

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP1997/003259 WO1999014685A1 (fr) 1997-09-16 1997-09-16 Machine de traitement des donnees et systeme de traitement des donnees

Publications (1)

Publication Number Publication Date
WO1999014685A1 true WO1999014685A1 (fr) 1999-03-25

Family

ID=14181117

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1997/003259 WO1999014685A1 (fr) 1997-09-16 1997-09-16 Machine de traitement des donnees et systeme de traitement des donnees

Country Status (1)

Country Link
WO (1) WO1999014685A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03156558A (ja) * 1989-11-14 1991-07-04 Nec Home Electron Ltd ホストcpuとコプロセッサとの間の通信方法
JPH04130910A (ja) * 1990-09-21 1992-05-01 Nec Corp 情報処理装置
JPH05189585A (ja) * 1992-01-14 1993-07-30 Nippon Telegr & Teleph Corp <Ntt> 並列処理における条件付き演算制御回路
JPH06244741A (ja) * 1993-02-18 1994-09-02 Nec Corp 誤り訂正方法
JPH0963208A (ja) * 1995-08-23 1997-03-07 Victor Co Of Japan Ltd エラー訂正装置

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6732253B1 (en) 2000-11-13 2004-05-04 Chipwrights Design, Inc. Loop handling for single instruction multiple datapath processor architectures
US6931518B1 (en) 2000-11-28 2005-08-16 Chipwrights Design, Inc. Branching around conditional processing if states of all single instruction multiple datapaths are disabled and the computer program is non-deterministic
US10101978B2 (en) 2002-07-09 2018-10-16 Iii Holdings 2, Llc Statically speculative compilation and execution
US9235393B2 (en) 2002-07-09 2016-01-12 Iii Holdings 2, Llc Statically speculative compilation and execution
US10248395B2 (en) 2003-10-29 2019-04-02 Iii Holdings 2, Llc Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US9569186B2 (en) 2003-10-29 2017-02-14 Iii Holdings 2, Llc Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US9582650B2 (en) 2003-11-17 2017-02-28 Bluerisc, Inc. Security of program executables and microprocessors based on compiler-architecture interaction
US9697000B2 (en) 2004-02-04 2017-07-04 Iii Holdings 2, Llc Energy-focused compiler-assisted branch prediction
US9244689B2 (en) 2004-02-04 2016-01-26 Iii Holdings 2, Llc Energy-focused compiler-assisted branch prediction
US10268480B2 (en) 2004-02-04 2019-04-23 Iii Holdings 2, Llc Energy-focused compiler-assisted branch prediction
US9940445B2 (en) 2006-11-03 2018-04-10 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
US9069938B2 (en) 2006-11-03 2015-06-30 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
US10430565B2 (en) 2006-11-03 2019-10-01 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
US11163857B2 (en) 2006-11-03 2021-11-02 Bluerisc, Inc. Securing microprocessors against information leakage and physical tampering
JP2013161271A (ja) * 2012-02-06 2013-08-19 Ricoh Co Ltd Simd型マイクロプロセッサ
CN116909626A (zh) * 2023-09-13 2023-10-20 腾讯科技(深圳)有限公司 数据处理方法、处理器及计算机设备
CN116909626B (zh) * 2023-09-13 2023-12-29 腾讯科技(深圳)有限公司 数据处理方法、处理器及计算机设备

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR SG US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: KR

122 Ep: pct application non-entry in european phase