CN109947391B - Data processing method and device

Info

Publication number: CN109947391B
Application number: CN201910182657.7A
Authority: CN
Inventors: 刘刚; 冯春阳; 张兴革; 王俊杰; 彭琅; 黄晶; 邹孝杰
Original assignee: Hexin Technology Suzhou Co ltd
Current assignee: Hexin Technology Suzhou Co ltd
Priority date: 2019-03-11
Filing date: 2019-03-11
Publication date: 2023-08-01
Anticipated expiration: 2039-03-11
Also published as: CN109947391A

Abstract

The invention discloses a data processing method and device. The method comprises: acquiring a control signal and first data to be processed; performing a permutation operation on the first data according to the control signal to obtain second data; and performing a data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, wherein the bit width of the second data is larger than that of the third data. The method can flexibly combine 32-bit, 64-bit and 128-bit source data and perform data permutation operations on it, so that after permutation the source data has the same bit width as the microprocessor and the front-end preprocessing of fixed/floating point data is completed. This ensures that the computing resources of each clock cycle are not wasted when the microprocessor computes data types such as 32/64/128-bit fixed/floating point data, thereby improving the hardware utilization of the microprocessor's floating point operations.

Description

Data processing method and device

Technical Field

The present invention relates to the field of microprocessor technologies, and in particular, to a data processing method and apparatus.

Background

With the development of semiconductor manufacturing processes and compute-intensive workloads, applications have become extremely complex and the computing power of microprocessors keeps increasing, most prominently in fixed/floating point vector computation (data-parallel execution). Current fixed/floating point computation is mainly divided into single-precision, double-precision and quadruple-precision operations, covering 32-bit, 64-bit and 128-bit data. To support these computation types, the microprocessor must add and configure corresponding hardware units such as adders, multipliers and dividers. In a microprocessor with a maximum computing bit width of 128 bits that operates on multiple data types, 70% or 50% of the computing resources are idle in every execution clock cycle when 32-bit or 64-bit fixed/floating point data is computed.

In view of the above problem, the simplest approach is to leave the existing hardware resources idle, which is inefficient, wastes hardware resources, and is not acceptable in a high-performance microprocessor design.

Disclosure of Invention

Accordingly, embodiments of the present invention provide a data processing method and apparatus to solve the problem that a high-performance microprocessor idles 70% or 50% of its computing resources in every execution clock cycle when computing 32/64-bit fixed/floating point data.

According to a first aspect, an embodiment of the present invention provides a data processing method, including: acquiring a control signal and first data to be processed; performing a permutation operation on the first data according to the control signal to obtain second data; and performing a data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, wherein the bit width of the second data is larger than that of the third data.

Optionally, the permutation operation includes at least one of a data saturation operation, a data expansion operation, a data merging operation, and a position permutation operation.

Optionally, performing a permutation operation on the first data according to the control signal to obtain second data includes: performing saturation operation on high bits of the first data according to the control signal; judging whether the operation result is within a preset range; and if the operation result is within the preset range, selecting the effective data of the first data to output to obtain the second data.

Optionally, if the operation result is not within the preset range, selecting preset data for outputting to obtain second data.

Optionally, performing a permutation operation on the first data according to the control signal to obtain second data includes: judging whether an instruction carrying data exists in the control signal or not; when an instruction carrying data exists in the control signal, expanding sign bits of the instruction carrying the data to obtain data with a first preset bit width; expanding sign bits of low-order data of the first data to obtain data with a second preset bit width; and taking the data with the first preset bit width as high-order data and taking the data with the second preset bit width as low-order data to form second data.

Optionally, performing permutation operation on the first data according to the control signal to obtain the second data further includes: when no instruction carrying data exists in the control signal, expanding sign bits of high-order data of the first data to obtain data with a third preset bit width; and taking the data with the third preset bit width as high-order data and taking the data with the second preset bit width as low-order data to form second data.

Optionally, performing a data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width includes: dividing the second data in units of bytes; and selecting and outputting the byte-divided second data according to the control signal and the crossbar switch matrix to obtain the third data with the preset bit width.

According to a second aspect, an embodiment of the present invention provides a data processing apparatus, including: an acquisition module, configured to acquire a control signal and first data to be processed; a permutation operation module, configured to perform a permutation operation on the first data according to the control signal to obtain second data; and an interleaving operation module, configured to perform a data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, wherein the bit width of the second data is larger than that of the third data.

According to a third aspect, an embodiment of the present invention provides a controller, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to cause the at least one processor to perform the data processing method of the first aspect or any embodiment of the first aspect.

According to a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, wherein the computer readable storage medium stores computer instructions for causing a computer to perform the data processing method of the first aspect or any implementation manner of the first aspect.

Embodiments of the invention provide a data processing method and device that can flexibly combine 32-bit, 64-bit and 128-bit source data and perform data permutation operations on it, for example vector operations such as compression, decompression, merging, replication, permutation, shifting and selection on 32/64/128-bit fixed/floating point data formats. After permutation, the source data has the same bit width as the microprocessor, which completes the front-end preprocessing of fixed/floating point data. This ensures that the computing resources of each clock cycle are not wasted when the microprocessor computes data types such as 32/64/128-bit fixed/floating point data, thereby improving the hardware utilization of the microprocessor's floating point operations.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 shows a flow chart of a data processing method of an embodiment of the present invention;

FIG. 2 shows a basic circuit structure diagram of a data saturation operation, a data expansion operation and a data merging operation according to an embodiment of the present invention;

FIG. 3 is a basic structure diagram of a position changing operation unit according to an embodiment of the present invention;

FIG. 4 shows a basic block diagram of a data interleaving unit according to an embodiment of the present invention;

FIG. 5 shows a basic block diagram of a data interleave selector in accordance with an embodiment of the present invention;

FIG. 6 illustrates a block diagram of an unsigned to unsigned saturation mode of operation according to an embodiment of the present invention;

FIG. 7 illustrates a block diagram of a signed to unsigned saturation mode of operation according to an embodiment of the present invention;

FIG. 8 illustrates a block diagram of a signed to signed saturation mode of operation in accordance with an embodiment of the present invention;

FIG. 9 is a basic structure diagram of a data expansion operation according to an embodiment of the present invention;

FIG. 10 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention;

FIG. 11 shows a schematic diagram of a controller structure according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

An embodiment of the present invention provides a data processing method, as shown in fig. 1, including:

S101, acquiring a control signal and first data to be processed.

In this embodiment, the first data to be processed is source data, which may be any combination of 32/64/128-bit data. The control signal is generated in real time by the signal control unit, which decodes each function code and the corresponding logic combination according to an externally input selection control signal and the immediate in the first data to be processed; one control signal is generated for each function. The selection control signals correspond one-to-one with the data to be processed, as do the control signals. The selection control signals may comprise N groups, denoted S0, S1, ..., SN-2 and SN-1. An immediate is data carried in the instruction itself; it is referred to below as an instruction carrying data. Specifically, the first data to be processed may be multiple sets of 32/64/128-bit data, multiple sets of 32/64-bit data, or 128-bit data formed by combining 32/64-bit data.

S102, performing a permutation operation on the first data according to the control signal to obtain second data.

In this embodiment, the permutation operation includes at least one of a data saturation operation (saturation), a data expansion operation (extension), a data merging operation, and a position permutation operation.

Specifically, the data saturation operation, the data expansion operation, and the data merging operation cover the functions shown in table 1.

TABLE 1

The basic circuit structures of the data saturation, data expansion and data merging operations are shown in FIG. 2. For example, the control signal may cause data a (i.e., vr[vra]) and data b (i.e., vr[vrb]) in the first data to undergo the data saturation operation, yielding 128-bit data (i.e., dataf_out[0:127]); or cause data b (i.e., vr[vrb]) and the immediate (i.e., SIM[0:4]) in the first data to undergo the data expansion operation, yielding 256-bit data (i.e., dataf_out[0:255] or ex_out[0:255]); or cause data a and data b in the first data to bypass the saturation and expansion operations and be merged directly, yielding 256-bit data (i.e., dataf_out[0:255]).
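The merging path alone can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 2: Python integers stand in for the registers, the function name `merge_256` is hypothetical, and placing data a in the high-order half is an assumption, since the text does not state the ordering.

```python
MASK_128 = (1 << 128) - 1


def merge_256(a: int, b: int) -> int:
    """Concatenate two 128-bit values into one 256-bit value.

    Assumption: `a` supplies the high-order 128 bits and `b` the
    low-order 128 bits of the result.
    """
    return ((a & MASK_128) << 128) | (b & MASK_128)


if __name__ == "__main__":
    a = 0x00112233445566778899AABBCCDDEEFF
    b = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
    # Print the 256-bit result as 64 hexadecimal digits.
    print(f"{merge_256(a, b):064x}")
```

The 256-bit result is wider than the 128-bit output of the later interleaving step, which is consistent with the requirement that the second data be wider than the third data.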

The position change operation unit mainly implements permutation operations on the bits of the first data. Because this unit operates on individual bits, and the various position change operations cannot share circuitry, the data paths within the unit are independent of one another; its basic structure is shown in FIG. 3. Specifically, the unit is internally divided into 6 functional circuit modules covering 9 kinds of position change operations; the modules are bitwise AND, bitwise OR, bitwise XOR, negation, left shift and right shift. Under the control of the signal control unit (i.e., the signal control module), the position change operation unit performs a position change operation on the data in the first data and finally outputs the calculation result (for example, bit_out[0:127]) through a multiplexer (i.e., the mux 6_1 module).
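A behavioral sketch of six independent data paths feeding a 6-to-1 multiplexer is given below. The function name, the string operation codes and the use of a logical (zero-filling) right shift are assumptions made for illustration; the text does not specify the control encoding or the shift type.

```python
MASK_128 = (1 << 128) - 1


def position_change(op: str, a: int, b: int = 0, shift: int = 0) -> int:
    """Model six independent 128-bit data paths and a 6-to-1 multiplexer."""
    a &= MASK_128
    b &= MASK_128
    # Every path is evaluated, mirroring independent hardware data paths.
    paths = {
        "and": a & b,
        "or": a | b,
        "xor": a ^ b,
        "not": ~a & MASK_128,
        "shl": (a << shift) & MASK_128,  # bits shifted past bit 127 are lost
        "shr": a >> shift,               # assumed logical right shift
    }
    # The multiplexer forwards exactly one path to the output.
    return paths[op]


if __name__ == "__main__":
    print(hex(position_change("xor", 0b1100, 0b1010)))
```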

S103, performing data interleaving operation on the second data according to the control signal to obtain third data with preset bit width, wherein the bit width of the second data is larger than that of the third data.

In this embodiment, the data interleaving unit performs the data interleaving operation in units of bytes. Performing the data interleaving operation on the second data according to the control signal to obtain the third data with the preset bit width may include: dividing the second data in units of bytes; and selecting and outputting the byte-divided second data according to the control signal and the crossbar switch matrix to obtain the third data with the preset bit width. For example, the input data interface of the data interleaving unit may be set to 256 bits and the output data interface to 128 bits; internally, the unit consists of 16 data interleaving selectors (also called crossbar_mux) with identical microarchitecture connected in parallel. The basic structure of the data interleaving unit is shown in FIG. 4. Each data interleaving selector is fully interconnected with the input data and selects data in the manner of a crossbar matrix; the basic structure of the data interleaving selector (crossbar_mux) is shown in FIG. 5.

Specifically, the control signal of each of the 16 identical data interleaving selectors consists of a 5-bit byte select signal (i.e., sel) and two 4-bit byte control words (i.e., Byte[i].Bit[0:3] and Byte[i].Bit[4:7]). When the control signal is the 5-bit byte select signal, sel selects one of the 32 bytes in dataf_out as the output. When the control signal is the two 4-bit byte control words, the control words together with the XOR gate circuit in FIG. 5 implement a permute-and-XOR operation on bytes: one byte is selected from the high 16 bytes of dataf_out and one from the low 16 bytes, the two are XORed, and the result is output. The outputs of the 16 identical data interleaving selectors together form the 128-bit output data.

In this embodiment, the input data interface of the data interleaving unit may also be set to other bit widths, such as 64, 128 or 512 bits, according to other calculation requirements; the output data interface may also be set to other bit widths, such as 64 or 256 bits; accordingly, the number of data interleaving selectors may be set to other values.
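The two selector modes described above can be modeled in a few lines. This is a sketch under stated assumptions: byte 0 is taken to be the most significant byte of dataf_out, the "high 16 bytes" are bytes 0 to 15, and the function names and the way the mode is chosen (passing `sel` or not) are hypothetical.

```python
from typing import List, Optional


def split_bytes(value: int, n_bytes: int) -> List[int]:
    """Divide an integer into n_bytes bytes, most significant byte first."""
    return list(value.to_bytes(n_bytes, "big"))


def crossbar_mux(src: List[int], sel: Optional[int] = None,
                 hi_idx: int = 0, lo_idx: int = 0) -> int:
    """One data interleaving selector over 32 source bytes.

    With a 5-bit `sel`, the selected byte is passed through. Otherwise one
    byte from the high 16 bytes and one from the low 16 bytes, each chosen
    by a 4-bit control word, are XORed.
    """
    if sel is not None:
        return src[sel & 0x1F]
    return src[hi_idx & 0xF] ^ src[16 + (lo_idx & 0xF)]


def interleave(data_256: int, sels: List[int]) -> int:
    """Run 16 selectors in parallel to turn 256 bits into 128 bits."""
    if len(sels) != 16:
        raise ValueError("expected one select value per output byte (16)")
    src = split_bytes(data_256, 32)
    out = bytes(crossbar_mux(src, sel=s) for s in sels)
    return int.from_bytes(out, "big")


if __name__ == "__main__":
    data = int.from_bytes(bytes(range(32)), "big")  # byte i holds value i
    print(f"{interleave(data, list(range(16, 32))):032x}")
```

Because every selector sees all 32 source bytes, any output byte can come from any input byte, which is what makes the structure a full crossbar.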

The embodiment of the invention provides a data processing method that can flexibly combine 32-bit, 64-bit and 128-bit source data and perform data permutation operations on it, for example vector operations such as compression, decompression, merging, copying, permutation, shifting and selection on 32/64/128-bit fixed/floating point data formats. After permutation, the source data has the same bit width as the microprocessor, which completes the front-end preprocessing of fixed/floating point data. This ensures that the computing resources of each clock cycle are not wasted when the microprocessor computes data types such as 32/64/128-bit fixed/floating point data, thereby improving the hardware utilization of the microprocessor's floating point operations.

In an alternative embodiment, performing a permutation operation on the first data according to the control signal to obtain the second data includes: performing a saturation operation on the high-order bits of the first data according to the control signal; determining whether the operation result is within a preset range; if the operation result is within the preset range, selecting the valid data of the first data for output to obtain the second data; and if the operation result is not within the preset range, selecting preset data for output to obtain the second data. In this embodiment, the data saturation operation mainly includes three operation modes: an unsigned-unsigned (uu) mode, a signed-unsigned (su) mode, and a signed-signed (ss) mode.

Specifically, in the unsigned-unsigned (uu) mode, i.e., the unsigned-to-unsigned saturation operation mode shown in FIG. 6, saturation is performed on the high-order data of half words, words and double words respectively according to the control signal. For example, the high 8 bits of each half word undergo the saturation operation to obtain a calculation result; if the calculation result is within the range 0 to 2⁸, the low 8 bits of each half word are output as the second data; otherwise, preset 8-bit data is output as the second data, where the preset 8-bit data is an upper limit value or a lower limit value.
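For the half word case, the uu mode can be sketched as below. The sketch assumes that "within range" means the high 8 bits are all zero (the value fits in 8 bits) and that the preset output on overflow is the upper limit 0xFF; the function names are hypothetical.

```python
from typing import Iterable


def sat_uu_halfword_to_byte(halfword: int) -> int:
    """Unsigned-to-unsigned saturation of a 16-bit half word to a byte."""
    hw = halfword & 0xFFFF
    if hw >> 8 == 0:
        # High 8 bits are zero: the low 8 bits carry the whole value.
        return hw & 0xFF
    # Out of range: output the preset upper limit (assumed to be 0xFF).
    return 0xFF


def pack_uu(halfwords: Iterable[int]) -> bytes:
    """Saturate a sequence of half words into a sequence of bytes."""
    return bytes(sat_uu_halfword_to_byte(h) for h in halfwords)


if __name__ == "__main__":
    print(pack_uu([0x0001, 0x00FF, 0x0100, 0x1234]).hex())
```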

In the signed-unsigned (su) mode, i.e., the signed-to-unsigned saturation operation mode shown in FIG. 7, saturation is performed on the high-order data of half words, words and double words respectively according to the control signal. For example, each half word is examined: if the sign bit is equal to 1, the result 8'h00 is output; if the sign bit is equal to 0, it is determined whether the high-order data of the half word is equal to 7'h7f; if it is equal to 7'h7f, 8'hff is output, and if it is not equal to 7'h7f, 8'hff is output.
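The following sketch shows conventional signed-to-unsigned saturation of a 16-bit half word to a byte, using the two limit values 8'h00 and 8'hFF named above. Passing the low byte through when the value is in range is an assumption based on the general saturation rule of this embodiment, and the exact comparison performed by the circuit of FIG. 7 may differ; the function name is hypothetical.

```python
def sat_su_halfword_to_byte(halfword: int) -> int:
    """Signed-to-unsigned saturation of a 16-bit half word to a byte."""
    hw = halfword & 0xFFFF
    sign = hw >> 15
    if sign == 1:
        # Negative input: clamp to the lower limit 8'h00.
        return 0x00
    if hw > 0xFF:
        # Positive input too large for 8 bits: clamp to 8'hFF.
        return 0xFF
    # Assumed in-range behavior: output the low byte unchanged.
    return hw & 0xFF


if __name__ == "__main__":
    for value in (0x8001, 0x0100, 0x007F):
        print(f"{value:04x} -> {sat_su_halfword_to_byte(value):02x}")
```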

In the signed-signed (ss) mode, i.e., the signed-to-signed saturation operation mode shown in FIG. 8, saturation is performed on the high-order data of half words, words and double words respectively according to the control signal. For example, each half word is examined. If the sign bit is equal to 0, it is determined whether the high-order data (vr[8:15]) of the half word is equal to 8'h00: if so, {vr[0], vr[8:15]} is output (bit 0 and bits 8-15 of vr are spliced into one byte); if not, {vr[0], 7'h7F} is output. If the sign bit is equal to 1, it is determined whether the high-order data (vr[8:15]) of the half word is equal to 8'hFF: if so, {vr[0], vr[8:15]} is output (bit 0 and bits 8-15 of vr are spliced into one byte); if not, {vr[0], 7'h00} is output.
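Arithmetically, the saturated outputs {vr[0], 7'h7F} and {vr[0], 7'h00} correspond to clamping a signed half word to the signed byte limits 0x7F (+127) and 0x80 (-128). The sketch below models that clamping on two's-complement values; it is an arithmetic illustration with a hypothetical function name, not a bit-for-bit model of the circuit in FIG. 8.

```python
def sat_ss_halfword_to_byte(halfword: int) -> int:
    """Signed-to-signed saturation of a 16-bit half word to a byte."""
    hw = halfword & 0xFFFF
    # Interpret the 16-bit pattern as a two's-complement signed value.
    value = hw - 0x10000 if hw & 0x8000 else hw
    if value > 127:
        return 0x7F  # sign bit 0 followed by 7'h7F
    if value < -128:
        return 0x80  # sign bit 1 followed by 7'h00
    # In range: the low byte already encodes the value.
    return value & 0xFF


if __name__ == "__main__":
    for value in (0x0005, 0x0100, 0xFFFF, 0xFF00):
        print(f"{value:04x} -> {sat_ss_halfword_to_byte(value):02x}")
```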

In an alternative embodiment, as shown in FIG. 9, performing a permutation operation on the first data according to the control signal to obtain second data includes: determining whether an instruction carrying data exists in the control signal, where in this embodiment the immediate is the instruction carrying data; and, when an instruction carrying data exists in the control signal, extending the sign bit of the instruction carrying data to obtain data with a first preset bit width. In this embodiment, the sign bit is extended by the SIM[0:4] and sign-extended[0:31] modules: byte mode takes 16 groups of the [24:31] data, half word mode takes 8 groups of the [16:31] data, and word mode takes 4 groups of the [0:31] data, and the output is 128-bit data, e.g., ex_out[0:127]. The sign bits of the low-order data of the first data are then extended; for example, vrb[64:127] is extended by the b_s_ex, h_s_ex or w_s_ex module to obtain data with a second preset bit width, e.g., ex_out[128:255]. The b_s_ex module sign-extends bytes to half words (8 groups of half word outputs computed in parallel); the h_s_ex module sign-extends half words to words (4 groups of word outputs computed in parallel); the w_s_ex module sign-extends words to double words (2 groups of double word outputs computed in parallel). The data with the first preset bit width is taken as high-order data and the data with the second preset bit width as low-order data to form the second data, e.g., ex_out[0:255].

When no instruction carrying data exists in the control signal, the sign bits of the high-order data of the first data are extended to obtain data with a third preset bit width; for example, vrb[0:63] is extended by the b_s_ex, h_s_ex or w_s_ex module to obtain 128-bit data ex_out[0:127]. The data with the third preset bit width is taken as high-order data and the data with the second preset bit width as low-order data to form the second data.
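The two kinds of sign extension in this embodiment can be sketched as follows: replicating a sign-extended 5-bit immediate across 128 bits, and widening each element of a 64-bit input to twice its width. The function names are hypothetical, Python integers stand in for the registers, and the most significant element is assumed to come first.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide value to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def splat_immediate(sim5: int, elem_bits: int) -> int:
    """Extend SIM[0:4] to 32 bits, keep the low elem_bits, fill 128 bits.

    elem_bits = 8 gives 16 groups, 16 gives 8 groups, 32 gives 4 groups.
    """
    ext32 = sign_extend(sim5, 5, 32)
    elem = ext32 & ((1 << elem_bits) - 1)
    out = 0
    for _ in range(128 // elem_bits):
        out = (out << elem_bits) | elem
    return out


def widen_elements(src64: int, elem_bits: int) -> int:
    """Sign-extend each element of a 64-bit input to twice its width.

    elem_bits = 8, 16, 32 model b_s_ex, h_s_ex, w_s_ex respectively.
    """
    mask = (1 << elem_bits) - 1
    out = 0
    for i in range(64 // elem_bits - 1, -1, -1):  # most significant first
        elem = (src64 >> (i * elem_bits)) & mask
        out = (out << (2 * elem_bits)) | sign_extend(elem, elem_bits, 2 * elem_bits)
    return out


if __name__ == "__main__":
    print(f"{splat_immediate(0b11110, 16):032x}")
    print(f"{widen_elements(0x80FF00017F000000, 8):032x}")
```

Concatenating a 128-bit result of `splat_immediate` or `widen_elements` as the high half with a 128-bit result of `widen_elements` as the low half would give a 256-bit value of the kind named ex_out[0:255] in the text.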

An embodiment of the present invention provides a data processing apparatus, as shown in fig. 10, including: an acquisition module 10, configured to acquire a control signal and first data to be processed; the permutation operation module 20 is configured to permute the first data according to the control signal to obtain second data; the interleaving operation module 30 is configured to perform data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, where the bit width of the second data is greater than the bit width of the third data.
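How the three modules chain together can be sketched as below. Only a merge is modeled in the permutation module and only byte pass-through in the interleaving module; the dictionary used as the control signal, the function names and the choice of data a as the high-order half are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

MASK_128 = (1 << 128) - 1


def acquisition_module(a: int, b: int, sels: List[int]) -> Tuple[Dict, Tuple[int, int]]:
    """Return the control signal and the first data to be processed."""
    control = {"op": "merge", "sels": sels}  # hypothetical control encoding
    return control, (a & MASK_128, b & MASK_128)


def permutation_module(control: Dict, first: Tuple[int, int]) -> int:
    """Produce 256-bit second data; only the merge operation is modeled."""
    if control["op"] != "merge":
        raise NotImplementedError(control["op"])
    a, b = first
    return (a << 128) | b


def interleaving_module(control: Dict, second: int) -> int:
    """Select 16 of the 32 bytes of the second data as 128-bit third data."""
    src = second.to_bytes(32, "big")
    return int.from_bytes(bytes(src[s & 0x1F] for s in control["sels"]), "big")


def process(a: int, b: int, sels: List[int]) -> int:
    control, first = acquisition_module(a, b, sels)
    second = permutation_module(control, first)
    return interleaving_module(control, second)


if __name__ == "__main__":
    a = int.from_bytes(bytes(range(16)), "big")
    b = int.from_bytes(bytes(range(16, 32)), "big")
    # Alternate the first eight bytes of a with the first eight bytes of b.
    sels = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]
    print(f"{process(a, b, sels):032x}")
```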

The data processing device provided by the embodiment of the invention can be used as an acceleration hard core to be embedded in various computing units or can be integrated in a microprocessor in the form of an instruction set. When the device is integrated in a microprocessor as an instruction set, vector memory/fetch operation instructions with random arrangement and recombination of data with different bit widths can be processed, so that the execution efficiency of other instructions in a register in the microprocessor is improved; when the device is used as a front-end computing unit for preprocessing fixed/floating point data, the microprocessor can calculate 32/64/128 bit fixed/floating point data and other data types, and the computing resources of each clock are not wasted, so that the hardware use efficiency of the floating point operation of the microprocessor is improved.

The embodiment of the invention provides a controller, which comprises: at least one processor 71; and a memory 72 communicatively coupled to the at least one processor; one processor 71 is illustrated in fig. 11.

The controller may further include: an input device 73 and an output device 74.

The processor 71, memory 72, input device 73 and output device 74 may be connected by a bus or in another manner; connection by a bus is taken as an example in FIG. 11.

The processor 71 may be a central processing unit (Central Processing Unit, CPU). The processor 71 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 72 serves as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the data processing methods in the embodiments of the present application. The processor 71 executes various functional applications of the server and data processing, i.e., implements the data processing method of the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 72.

Memory 72 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to the use of the processing device of the user terminal, and the like. In addition, memory 72 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 72 may optionally include memory located remotely from processor 71, and such remote memory may be connected to the data processing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input means 73 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing means of the user terminal. The output device 74 may include a display device such as a display screen.

One or more modules are stored in the memory 72 that, when executed by the one or more processors 71, perform the method shown in fig. 1.

Those skilled in the art will appreciate that all or part of the flows of the method embodiments described above may be implemented by a computer program instructing related hardware. The program may be stored in a computer readable storage medium and, when executed, may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims

1. A method of data processing, comprising:

acquiring a control signal and first data to be processed;

performing a permutation operation on the first data according to the control signal to obtain second data; the permutation operation comprises at least one of a data saturation operation, a data expansion operation, a data merging operation and a position permutation operation;

performing data interleaving operation on the second data according to the control signal to obtain third data with preset bit width, wherein the bit width of the second data is larger than that of the third data;

the step of performing data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width includes:

dividing the second data in units of bytes;

and selecting and outputting the second data after byte division according to the control signal and the cross switch matrix to obtain third data with preset bit width.

2. The method according to claim 1, wherein the performing a permutation operation on the first data according to the control signal to obtain second data includes:

performing saturation operation on high bits of the first data according to the control signal;

judging whether the operation result is within a preset range;

and if the operation result is within the preset range, selecting the effective data of the first data to output to obtain second data.

3. The data processing method according to claim 2, wherein if the operation result is not within the preset range, selecting preset data for output to obtain second data.

4. The data processing method according to claim 1 or 2, wherein the performing a permutation operation on the first data according to the control signal to obtain second data includes:

judging whether an instruction carrying data exists in the control signal or not;

when an instruction carrying data exists in the control signal, expanding sign bits of the instruction carrying data to obtain data with a first preset bit width;

expanding sign bits of low-order data of the first data to obtain data with a second preset bit width;

and taking the data with the first preset bit width as high-order data and taking the data with the second preset bit width as low-order data to form the second data.

5. The method of claim 4, wherein the permuting the first data according to the control signal to obtain second data further comprises:

when the control signal does not have an instruction carrying data, expanding sign bits of high-order data of the first data to obtain data with a third preset bit width;

and taking the data with the third preset bit width as high-order data and taking the data with the second preset bit width as low-order data to form the second data.

6. A data processing apparatus, comprising:

the acquisition module is used for acquiring the control signal and the first data to be processed;

the permutation operation module is used for performing a permutation operation on the first data according to the control signal to obtain second data; the permutation operation comprises at least one of a data saturation operation, a data expansion operation, a data merging operation and a position permutation operation;

the interleaving operation module is used for carrying out data interleaving operation on the second data according to the control signal to obtain third data with preset bit width, and the bit width of the second data is larger than that of the third data;

the interleaving operation module is further configured to:

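dividing the second data in units of bytes;

and selecting and outputting the second data after byte division according to the control signal and the crossbar switch matrix to obtain third data with preset bit width.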

7. A controller, comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the data processing method of any of claims 1-5.

8. A computer readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any one of claims 1 to 5.