CN109947391B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN109947391B
CN109947391B CN201910182657.7A CN201910182657A CN109947391B CN 109947391 B CN109947391 B CN 109947391B CN 201910182657 A CN201910182657 A CN 201910182657A CN 109947391 B CN109947391 B CN 109947391B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910182657.7A
Other languages
Chinese (zh)
Other versions
CN109947391A (en
Inventor
刘刚
冯春阳
张兴革
王俊杰
彭琅
黄晶
邹孝杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hexin Technology Suzhou Co ltd
Original Assignee
Hexin Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hexin Technology Suzhou Co ltd
Priority to CN201910182657.7A
Publication of CN109947391A
Application granted
Publication of CN109947391B


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The invention discloses a data processing method and device. The data processing method comprises: acquiring a control signal and first data to be processed; performing a permutation operation on the first data according to the control signal to obtain second data; and performing a data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, wherein the bit width of the second data is larger than that of the third data. The method can flexibly combine 32-bit, 64-bit and 128-bit source data and perform data permutation operations on them, so that after permutation the source data has the same bit width as the microprocessor. It thereby completes the front-end preprocessing of fixed/floating point data, ensures that the computing resources of each clock cycle are not wasted when the microprocessor processes data types such as 32/64/128-bit fixed/floating point data, and improves the hardware utilization of the microprocessor's floating point operations.

Description

Data processing method and device
Technical Field
The present invention relates to the field of microprocessor technologies, and in particular, to a data processing method and apparatus.
Background
With the development of semiconductor manufacturing processes and of compute-intensive workloads, applications have become extremely complex and the computing power of microprocessors keeps increasing, most prominently in fixed/floating point vector computation (data-parallel execution). Current fixed/floating point computation is mainly divided into single-precision, double-precision and quadruple-precision operations, covering data widths such as 32/64/128 bits. To support these computation types, the microprocessor must be provided with corresponding hardware units such as adders, multipliers and dividers. In a microprocessor with a maximum computing bit width of 128 bits that operates on multiple data types, 70% or 50% of the computing resources are idle in every execution clock cycle when computing 32-bit or 64-bit fixed/floating point data, respectively.
The simplest response to this problem is to leave the existing hardware resources idle, which is inefficient, wastes hardware resources, and is clearly undesirable in a high-performance microprocessor design.
Disclosure of Invention
Accordingly, embodiments of the present invention provide a data processing method and apparatus to solve the problem that a high-performance microprocessor leaves 70% or 50% of its computing resources idle in every execution clock cycle when computing 32/64-bit fixed/floating point data.
According to a first aspect, an embodiment of the present invention provides a data processing method, including: acquiring a control signal and first data to be processed; performing a permutation operation on the first data according to the control signal to obtain second data; and performing a data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, wherein the bit width of the second data is larger than that of the third data.
Optionally, the permutation operation includes at least one of a data saturation operation, a data expansion operation, a data merging operation, and a position change operation.
Optionally, performing a permutation operation on the first data according to the control signal to obtain second data includes: performing saturation operation on high bits of the first data according to the control signal; judging whether the operation result is within a preset range; and if the operation result is within the preset range, selecting the effective data of the first data to output to obtain the second data.
Optionally, if the operation result is not within the preset range, selecting preset data for outputting to obtain second data.
Optionally, performing a permutation operation on the first data according to the control signal to obtain second data includes: judging whether an instruction carrying data exists in the control signal or not; when an instruction carrying data exists in the control signal, expanding sign bits of the instruction carrying the data to obtain data with a first preset bit width; expanding sign bits of low-order data of the first data to obtain data with a second preset bit width; and taking the data with the first preset bit width as high-order data and taking the data with the second preset bit width as low-order data to form second data.
Optionally, performing permutation operation on the first data according to the control signal to obtain the second data further includes: when no instruction carrying data exists in the control signal, expanding sign bits of high-order data of the first data to obtain data with a third preset bit width; and taking the data with the third preset bit width as high-order data and taking the data with the second preset bit width as low-order data to form second data.
Optionally, performing data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width includes: dividing the second data in units of bytes; and selecting and outputting the second data after byte division according to the control signal and the cross switch matrix to obtain third data with preset bit width.
According to a second aspect, an embodiment of the present invention provides a data processing apparatus, including: the acquisition module is used for acquiring the control signal and the first data to be processed; the replacement operation module is used for replacing the first data according to the control signal to obtain second data; and the interleaving operation module is used for carrying out data interleaving operation on the second data according to the control signal to obtain third data with preset bit width, wherein the bit width of the second data is larger than that of the third data.
According to a third aspect, an embodiment of the present invention provides a controller, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to cause the at least one processor to perform the data processing method of the first aspect or any embodiment of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, wherein the computer readable storage medium stores computer instructions for causing a computer to perform the data processing method of the first aspect or any implementation manner of the first aspect.
The embodiments of the invention provide a data processing method and a data processing device that can flexibly combine 32-bit, 64-bit and 128-bit source data and perform data permutation operations on them, for example vector operations such as compression, decompression, merging, replication, permutation, shifting and selection on 32/64/128-bit fixed/floating point data formats. After permutation, the source data has the same bit width as the microprocessor, which completes the front-end preprocessing of fixed/floating point data. This ensures that the computing resources of each clock cycle are not wasted when the microprocessor processes data types such as 32/64/128-bit fixed/floating point data, and improves the hardware utilization of the microprocessor's floating point operations.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a flow chart of a data processing method of an embodiment of the present invention;
FIG. 2 shows a basic circuit structure diagram of a data saturation operation, a data expansion operation and a data merging operation according to an embodiment of the present invention;
FIG. 3 is a basic structure diagram of a position changing operation unit according to an embodiment of the present invention;
FIG. 4 shows a basic block diagram of a data interleaving unit according to an embodiment of the present invention;
FIG. 5 shows a basic block diagram of a data interleave selector in accordance with an embodiment of the present invention;
FIG. 6 illustrates a block diagram of an unsigned to unsigned saturation mode of operation according to an embodiment of the present invention;
FIG. 7 illustrates a block diagram of a signed to unsigned saturation mode of operation according to an embodiment of the present invention;
FIG. 8 illustrates a block diagram of a signed to signed saturation mode of operation in accordance with an embodiment of the present invention;
FIG. 9 is a basic structure diagram of a data expansion operation according to an embodiment of the present invention;
FIG. 10 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention;
FIG. 11 shows a schematic diagram of a controller structure according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
An embodiment of the present invention provides a data processing method, as shown in FIG. 1, including:
S101, acquiring a control signal and first data to be processed.
In this embodiment, the first data to be processed is the source data, which may be any combination of 32/64/128-bit data. The control signal is generated in real time by the signal control unit, which decodes each function code and applies the corresponding logic combination according to the externally input selection control signal and the immediate in the first data to be processed; one control signal is produced for each function. The selection control signals and the control signals each correspond one-to-one with the data to be processed. There may be N groups of selection control signals, denoted S0, S1, ..., SN-2 and SN-1. The immediate is the instruction carrying data. Specifically, the first data to be processed may be multiple sets of 32/64/128-bit data, multiple sets of 32/64-bit data, or 128-bit data formed by combining 32/64-bit data.
S102, performing a permutation operation on the first data according to the control signal to obtain second data.
In this embodiment, the permutation operation includes at least one of a data saturation operation (saturation), a data expansion operation (extension), a data merging operation, and a position change operation.
Specifically, the data saturation operation, the data expansion operation, and the data merging operation cover the functions shown in table 1.
TABLE 1
The basic circuit structure of the data saturation, data expansion and data merging operations is shown in FIG. 2. For example, the control signal may cause data a (i.e., vr[vra]) and data b (i.e., vr[vrb]) in the first data to undergo the data saturation operation, yielding 128-bit data (i.e., dataf_out[0:127]). Alternatively, the control signal may cause data b (i.e., vr[vrb]) and the immediate (i.e., SIM[0:4]) in the first data to undergo the data expansion operation, yielding 256-bit data (i.e., dataf_out[0:255] or ex_out[0:255]). Alternatively, the control signal may cause data a and data b in the first data to be merged directly, without the saturation or expansion operation, yielding a single 256-bit data value (i.e., dataf_out[0:255]).
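The direct merging path can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `merge_256` and `to_bytes_be` are hypothetical, and placing data a in the high 128 bits and data b in the low 128 bits is an assumption, since the text does not state the ordering.

```python
MASK128 = (1 << 128) - 1


def merge_256(a: int, b: int) -> int:
    """Concatenate two 128-bit values into one 256-bit value.

    Assumption: a occupies the high half and b the low half.
    """
    if not (0 <= a <= MASK128 and 0 <= b <= MASK128):
        raise ValueError("operands must fit in 128 bits")
    return (a << 128) | b


def to_bytes_be(value: int, width_bits: int) -> bytes:
    """Split a value into bytes, most significant byte first."""
    return value.to_bytes(width_bits // 8, "big")
```

Splitting the merged value with `to_bytes_be(merged, 256)` gives the 32 bytes that a later byte-wise selection stage would choose from.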
The position change operation unit mainly implements bit-level permutation of the first data. Because the unit operates on individual bits as its basic unit and the various position change operations cannot share circuitry, the data paths inside the unit are independent of one another; the basic structure of the unit is shown in FIG. 3. Specifically, the unit is divided internally into 6 functional circuit modules covering 9 kinds of position change operations. The modules are: bitwise AND, bitwise OR, bitwise XOR, negation, left shift and right shift. Under the control of the signal control unit (i.e., the signal control module), the position change operation unit performs the position change operation on the data in the first data and outputs the result (for example, bit_out[0:127]) through a multiplexer (i.e., the mux6_1 module).
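The structure of six independent data paths feeding a 6:1 multiplexer can be modelled in a few lines of Python. This is a sketch under assumptions: the function name `position_change`, the numeric encoding of `op`, the `shamt` shift-amount parameter, and the reading of "negation" as a bitwise NOT of the first operand are all hypothetical and do not come from the text.

```python
MASK128 = (1 << 128) - 1


def position_change(op: int, a: int, b: int, shamt: int = 0) -> int:
    """Model six independent 128-bit data paths followed by a 6:1 mux."""
    a &= MASK128
    b &= MASK128
    # Every path is computed independently, mirroring the unshared circuits.
    paths = (
        a & b,                   # op 0: bitwise AND
        a | b,                   # op 1: bitwise OR
        a ^ b,                   # op 2: bitwise XOR
        ~a & MASK128,            # op 3: negation (assumed bitwise NOT of a)
        (a << shamt) & MASK128,  # op 4: left shift, bits shifted out are lost
        a >> shamt,              # op 5: logical right shift
    )
    # The multiplexer selects exactly one path as the 128-bit result.
    return paths[op]
```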
S103, performing data interleaving operation on the second data according to the control signal to obtain third data with preset bit width, wherein the bit width of the second data is larger than that of the third data.
In this embodiment, the data interleaving unit performs the data interleaving operation in units of bytes. Performing the data interleaving operation on the second data according to the control signal to obtain the third data with the preset bit width may include: dividing the second data in units of bytes; and selecting and outputting the byte-divided second data according to the control signal and the crossbar switch matrix to obtain the third data with the preset bit width. For example, the input data interface of the data interleaving unit may be set to 256 bits and the output data interface to 128 bits. Internally, the unit consists of 16 data interleave selectors (also called crossbar_mux) with identical microstructure circuits connected in parallel; the basic structure of the data interleaving unit is shown in FIG. 4. Each data interleave selector is fully interconnected with the input data and selects data in the manner of a crossbar matrix; the basic structure of the data interleave selector (crossbar_mux) is shown in FIG. 5.
Specifically, the control signal of each of the 16 identical data interleave selectors consists of a 5-bit byte select signal (i.e., sel) and two 4-bit byte control words (i.e., Byte[i].Bit[0:3] and Byte[i].Bit[4:7]). When the control signal is the 5-bit byte select signal, sel selects one of the 32 bytes in dataf_out as the output. When the control signal is the two 4-bit byte control words, the control words together with the XOR gate circuit in FIG. 5 implement a permute-XOR operation: one byte is selected from the high 16 bytes and one from the low 16 bytes of dataf_out, the two bytes are XORed, and the result is output. The outputs of the 16 identical data interleave selectors together form the 128-bit output data.
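The two selector modes described above can be sketched in Python as follows. This is a minimal software model, not the circuit of FIG. 5: the function names `crossbar_select` and `crossbar_xor` are hypothetical, byte 0 is assumed to be the most significant byte of dataf_out, and Byte[i].Bit[0:3] is assumed to index the high 16 bytes while Byte[i].Bit[4:7] indexes the low 16 bytes.

```python
from typing import Sequence


def _check(data: bytes, ctrl: Sequence[int]) -> None:
    if len(data) != 32:
        raise ValueError("input must be 32 bytes (256 bits)")
    if len(ctrl) != 16:
        raise ValueError("one control value per output byte (16 in total)")


def crossbar_select(data: bytes, sels: Sequence[int]) -> bytes:
    """Byte select mode: each 5-bit sel picks one of the 32 input bytes."""
    _check(data, sels)
    return bytes(data[s & 0x1F] for s in sels)


def crossbar_xor(data: bytes, ctrl: Sequence[int]) -> bytes:
    """Permute-XOR mode: XOR one byte of the high half with one of the low half.

    Assumption: the upper 4 bits of each control byte index the high 16
    bytes and the lower 4 bits index the low 16 bytes.
    """
    _check(data, ctrl)
    high, low = data[:16], data[16:]
    return bytes(high[(c >> 4) & 0xF] ^ low[c & 0xF] for c in ctrl)
```

Each of the 16 list entries plays the role of one data interleave selector, so the 16-byte result corresponds to the 128-bit output interface.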
In this embodiment, the data interleaving unit input data interface may also be set to other bit widths, such as 64 bits, 128 bits, 512 bits, etc., according to other calculation requirements; the output data interface may also be set to other bit widths, such as 64 bits, 256 bits, etc.; accordingly, the number of data interleave selectors may be set to other numbers.
The embodiment of the invention provides a data processing method that can flexibly combine 32-bit, 64-bit and 128-bit source data and perform data permutation operations on them, for example vector operations such as compression, decompression, merging, copying, permutation, shifting and selection on 32/64/128-bit fixed/floating point data formats. After permutation, the source data has the same bit width as the microprocessor, so the method can serve as a front-end computing unit for preprocessing fixed/floating point data. This ensures that the computing resources of each clock cycle are not wasted when the microprocessor processes data types such as 32/64/128-bit fixed/floating point data, and improves the hardware utilization of the microprocessor's floating point operations.
In an alternative embodiment, performing a permutation operation on the first data according to the control signal to obtain the second data includes: performing a saturation operation on the high-order bits of the first data according to the control signal; judging whether the operation result is within a preset range; and, if the operation result is within the preset range, selecting the valid data of the first data for output to obtain the second data. If the operation result is not within the preset range, preset data is selected for output to obtain the second data. In this embodiment, the data saturation operation has 3 main operation modes: an unsigned-unsigned (uu) mode, a signed-unsigned (su) mode, and a signed-signed (ss) mode.
Specifically, the unsigned-unsigned (uu) mode is the unsigned-to-unsigned saturation operation mode shown in FIG. 6, in which the high-order data of halfwords, words and doublewords is saturated according to the control signal. For example, the high 8 bits of each halfword are subjected to the saturation operation to obtain a calculation result; if the calculation result is within the range 0 to 2^8, the low 8 bits of each halfword are output as the second data; otherwise, preset 8-bit data is output as the second data, where the preset 8-bit data is an upper limit value or a lower limit value.
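As an illustration of the uu mode for the halfword case, the following Python sketch applies the conventional unsigned saturation rule. The function names are hypothetical, and the use of 0xFF as the preset upper limit is an assumption based on the usual definition of unsigned 8-bit saturation rather than a value stated in the text.

```python
from typing import Iterable, List


def sat_uu_half_to_byte(halfword: int) -> int:
    """Unsigned 16-bit to unsigned 8-bit saturation."""
    halfword &= 0xFFFF
    if (halfword >> 8) == 0:   # high 8 bits are zero: value fits in a byte
        return halfword & 0xFF
    return 0xFF                # out of range: output the assumed upper limit


def pack_uu(halfwords: Iterable[int]) -> List[int]:
    """Saturate a vector of halfwords element by element."""
    return [sat_uu_half_to_byte(h) for h in halfwords]
```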
The signed-unsigned (su) mode is the signed-to-unsigned saturation operation mode shown in FIG. 7, in which the high-order data of halfwords, words and doublewords is saturated according to the control signal. For example, each halfword is examined: if the sign bit is equal to 1, the result 8'h00 is output; if the sign bit is equal to 0, the high-order data of the halfword is compared with 7'h7f; if equal to 7'h7f, 8'hff is output, and if not equal to 7'h7f, 8'hff is output.
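For comparison, conventional signed-to-unsigned saturation of a halfword can be sketched in Python as below. This is the textbook definition and is offered as an assumption: the function name is hypothetical, and the in-range branch, which passes the low byte through unchanged, follows the usual rule and may not match the comparison constants quoted for FIG. 7.

```python
def sat_su_half_to_byte(halfword: int) -> int:
    """Signed 16-bit to unsigned 8-bit saturation (conventional rule)."""
    halfword &= 0xFFFF
    if halfword & 0x8000:      # sign bit set: negative values clamp to 0x00
        return 0x00
    if halfword >> 8:          # positive but above 255: clamp to 0xFF
        return 0xFF
    return halfword & 0xFF     # in range: pass the low byte through
```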
The signed-signed (ss) mode is the signed-to-signed saturation operation mode shown in FIG. 8, in which the high-order data of halfwords, words and doublewords is saturated according to the control signal. For example, each halfword is examined. If the sign bit is equal to 0, the high-order data (vr[8:15]) of the halfword is compared with 8'h00: if equal, {vr[0], vr[8:15]} is output (bit 0 and bits 8-15 of vr are spliced into one byte); if not equal, {vr[0], 7'h7F} is output. If the sign bit is equal to 1, the high-order data (vr[8:15]) of the halfword is compared with 8'hFF: if equal, {vr[0], vr[8:15]} is output (bit 0 and bits 8-15 of vr are spliced into one byte); if not equal, {vr[0], 7'h00} is output.
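The clamped outputs named above, {sign, 7'h7F} for positive overflow and {sign, 7'h00} for negative overflow, correspond to the bytes 0x7F and 0x80. A minimal Python sketch of conventional signed-to-signed halfword saturation that produces these values is shown below; the function name is hypothetical, and the sketch works on numeric values rather than on the bit indices used in the text.

```python
def sat_ss_half_to_byte(halfword: int) -> int:
    """Signed 16-bit to signed 8-bit saturation, returned as a raw byte."""
    halfword &= 0xFFFF
    # Interpret the 16-bit pattern as a two's complement value.
    value = halfword - 0x10000 if halfword & 0x8000 else halfword
    if value > 127:
        return 0x7F            # positive overflow: sign 0 followed by 7'h7F
    if value < -128:
        return 0x80            # negative overflow: sign 1 followed by 7'h00
    return value & 0xFF        # in range: sign bit plus the low 7 bits
```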
In an alternative embodiment, as shown in FIG. 9, performing a permutation operation on the first data according to the control signal to obtain the second data includes the following. First, judge whether an instruction carrying data exists in the control signal; in this embodiment, the immediate is the instruction carrying data. When an instruction carrying data exists in the control signal, its sign bit is extended to obtain data with a first preset bit width. In this embodiment, the module that extends the sign bit of the instruction carrying data sign-extends SIM[0:4] to [0:31]; byte mode takes 16 groups of [24:31], halfword mode takes 8 groups of [16:31], and word mode takes 4 groups of [0:31], so that the output is 128-bit data, e.g., ex_out[0:127]. Next, the sign bits of the low-order data of the first data are extended: for example, vrb[64:127] is expanded by b_s_ex, h_s_ex or w_s_ex to obtain data with a second preset bit width, e.g., ex_out[128:255]. Here the b_s_ex module extends byte sign bits to halfword_8 (8 groups of halfword outputs computed in parallel); the h_s_ex module extends halfword sign bits to word_4 (4 groups of word outputs computed in parallel); and the w_s_ex module extends word sign bits to doubleword_2 (2 groups of doubleword outputs computed in parallel). Finally, the data with the first preset bit width is taken as the high-order data and the data with the second preset bit width as the low-order data to form the second data, e.g., ex_out[0:255].
When no instruction carrying data exists in the control signal, the sign bits of the high-order data of the first data are extended to obtain data with a third preset bit width; for example, vrb[0:63] is expanded by b_s_ex, h_s_ex or w_s_ex to obtain the 128-bit data ex_out[0:127]. The data with the third preset bit width is then taken as the high-order data and the data with the second preset bit width as the low-order data to form the second data.
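The two expansion cases can be sketched in Python as follows. This is an illustrative model under assumptions: all function names are hypothetical, bit 0 is taken to be the most significant bit (so vrb[0:63] is the high half of the 128-bit value), and a single `elem_bits` parameter (8, 16 or 32) is assumed to select both the immediate replication mode and the b_s_ex, h_s_ex or w_s_ex widening.

```python
from typing import Optional


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide pattern to a to_bits-wide pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def splat_immediate(sim: int, elem_bits: int) -> int:
    """Sign-extend the 5-bit immediate to 32 bits, keep its low elem_bits,
    and replicate that element across 128 bits."""
    elem = sign_extend(sim, 5, 32) & ((1 << elem_bits) - 1)
    out = 0
    for _ in range(128 // elem_bits):
        out = (out << elem_bits) | elem
    return out


def widen_half(src64: int, elem_bits: int) -> int:
    """Sign-extend each element of a 64-bit value to twice its width."""
    out = 0
    for i in range(64 // elem_bits):          # i = 0 is the top element
        shift = 64 - (i + 1) * elem_bits
        elem = (src64 >> shift) & ((1 << elem_bits) - 1)
        out = (out << (2 * elem_bits)) | sign_extend(elem, elem_bits, 2 * elem_bits)
    return out


def expand(vrb: int, elem_bits: int, sim: Optional[int] = None) -> int:
    """Build the 256-bit result: high 128 bits from the immediate if one is
    present, otherwise from the high half of vrb; low 128 bits from the low
    half of vrb."""
    low = widen_half(vrb & ((1 << 64) - 1), elem_bits)
    if sim is not None:
        high = splat_immediate(sim, elem_bits)
    else:
        high = widen_half((vrb >> 64) & ((1 << 64) - 1), elem_bits)
    return (high << 128) | low
```

For instance, `expand(vrb, 8, sim=0x1F)` fills the high 128 bits with 0xFF bytes, because the 5-bit immediate 0x1F sign-extends to -1.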
An embodiment of the present invention provides a data processing apparatus, as shown in fig. 10, including: an acquisition module 10, configured to acquire a control signal and first data to be processed; the permutation operation module 20 is configured to permute the first data according to the control signal to obtain second data; the interleaving operation module 30 is configured to perform data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, where the bit width of the second data is greater than the bit width of the third data.
The data processing device provided by the embodiment of the invention can be embedded in various computing units as a hard acceleration core, or integrated in a microprocessor in the form of an instruction set. When integrated in a microprocessor as an instruction set, the device can process vector store/load operation instructions that arbitrarily rearrange and recombine data of different bit widths, which improves the execution efficiency of other instructions operating on registers in the microprocessor. When used as a front-end computing unit for preprocessing fixed/floating point data, the device ensures that the computing resources of each clock cycle are not wasted when the microprocessor processes data types such as 32/64/128-bit fixed/floating point data, which improves the hardware utilization of the microprocessor's floating point operations.
The embodiment of the invention provides a controller, which comprises: at least one processor 71; and a memory 72 communicatively coupled to the at least one processor. FIG. 11 takes one processor 71 as an example.
The controller may further include: an input device 73 and an output device 74.
The processor 71, memory 72, input device 73 and output device 74 may be connected by a bus or in other ways; connection by a bus is taken as the example in FIG. 11.
The processor 71 may be a central processing unit (Central Processing Unit, CPU). The processor 71 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 72 serves as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the data processing methods in the embodiments of the present application. The processor 71 executes various functional applications of the server and data processing, i.e., implements the data processing method of the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 72.
Memory 72 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created according to the use of a processing device operated by the user terminal, and the like. In addition, memory 72 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 72 may optionally include memory located remotely from processor 71, which may be connected to the data processing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 73 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing means of the user terminal. The output device 74 may include a display device such as a display screen.
One or more modules are stored in the memory 72 that, when executed by the one or more processors 71, perform the method shown in fig. 1.
Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of memories of the above kinds.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (8)

1. A method of data processing, comprising:
acquiring a control signal and first data to be processed;
performing replacement operation on the first data according to the control signal to obtain second data; the replacement operation comprises at least one of a data saturation operation, a data expansion operation, a data merging operation and a position replacement operation;
performing data interleaving operation on the second data according to the control signal to obtain third data with preset bit width, wherein the bit width of the second data is larger than that of the third data;
the step of performing data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width includes:
dividing the second data in units of bytes;
and selecting and outputting the second data after byte division according to the control signal and the cross switch matrix to obtain third data with preset bit width.
2. The method according to claim 1, wherein the performing a permutation operation on the first data according to the control signal to obtain second data includes:
performing saturation operation on high bits of the first data according to the control signal;
judging whether the operation result is within a preset range;
and if the operation result is within the preset range, selecting the effective data of the first data to output to obtain second data.
3. The data processing method according to claim 2, wherein if the operation result is not within the preset range, selecting preset data for output to obtain second data.
4. The data processing method according to claim 1 or 2, wherein performing the replacement operation on the first data according to the control signal to obtain the second data comprises:
determining whether an instruction carrying data exists in the control signal;
when an instruction carrying data exists in the control signal, extending the sign bit of the data carried by the instruction to obtain data with a first preset bit width;
extending the sign bit of the low-order data of the first data to obtain data with a second preset bit width; and
taking the data with the first preset bit width as high-order data and the data with the second preset bit width as low-order data to form the second data.
5. The data processing method according to claim 4, wherein performing the replacement operation on the first data according to the control signal to obtain the second data further comprises:
when no instruction carrying data exists in the control signal, extending the sign bit of the high-order data of the first data to obtain data with a third preset bit width; and
taking the data with the third preset bit width as high-order data and the data with the second preset bit width as low-order data to form the second data.
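The expansion and merging steps of claims 4 and 5 amount to sign extension followed by concatenation, which can be sketched as below. The claims leave the preset bit widths open, so this sketch assumes, purely for illustration, that the first, second and third preset bit widths are all equal (`ext_bits`) and that the first data splits into two halves of `half_bits` each. The names `sign_extend` and `build_second_data`, and modelling the data carried by the instruction as an optional `imm` argument, are hypothetical.

```python
from typing import Optional


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits bits of value to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        # Sign bit set: fill bits from_bits..to_bits-1 with ones.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def build_second_data(first_data: int, half_bits: int, ext_bits: int,
                      imm: Optional[int] = None, imm_bits: int = 0) -> int:
    """Form the second data from sign-extended high-order and low-order parts."""
    # Low-order part: always the sign-extended low half of the first data.
    low = sign_extend(first_data, half_bits, ext_bits)
    if imm is not None:
        # Claim 4: the control signal has an instruction carrying data.
        high = sign_extend(imm, imm_bits, ext_bits)
    else:
        # Claim 5: no carried data, so use the high half of the first data.
        high = sign_extend(first_data >> half_bits, half_bits, ext_bits)
    return (high << ext_bits) | low
```

For a 16-bit first data `0x807F` with 8-bit halves extended to 16 bits, the claim 5 path gives `0xFF80007F`; supplying a 3-bit carried value `0b101` instead gives `0xFFFD007F`.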
6. A data processing apparatus, comprising:
an acquisition module, configured to acquire a control signal and first data to be processed;
a replacement operation module, configured to perform a replacement operation on the first data according to the control signal to obtain second data, wherein the replacement operation comprises at least one of a data saturation operation, a data expansion operation, a data merging operation and a position replacement operation;
an interleaving operation module, configured to perform a data interleaving operation on the second data according to the control signal to obtain third data with a preset bit width, wherein the bit width of the second data is larger than that of the third data;
wherein the interleaving operation module is further configured to:
divide the second data in units of bytes; and
select and output the byte-divided second data according to the control signal and a crossbar switch matrix to obtain the third data with the preset bit width.
7. A controller, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the data processing method of any one of claims 1 to 5.
8. A computer readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any one of claims 1 to 5.
CN201910182657.7A 2019-03-11 2019-03-11 Data processing method and device Active CN109947391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910182657.7A CN109947391B (en) 2019-03-11 2019-03-11 Data processing method and device

Publications (2)

Publication Number Publication Date
CN109947391A CN109947391A (en) 2019-06-28
CN109947391B CN109947391B (en) 2023-08-01

Family

ID=67008726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910182657.7A Active CN109947391B (en) 2019-03-11 2019-03-11 Data processing method and device

Country Status (1)

Country Link
CN (1) CN109947391B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110784226B (en) * 2019-10-08 2022-12-02 中国科学院微电子研究所 Data processing method and data processing device based on PCM compression coding

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5530662A (en) * 1993-07-27 1996-06-25 Nec Corporation Fixed point signal processor having block floating processing circuitry
CN106605206A (en) * 2014-09-25 2017-04-26 英特尔公司 Bit group interleave processors, methods, systems, and instructions
CN106990937A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of floating number processing unit

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303438B2 (en) * 2017-01-16 2019-05-28 International Business Machines Corporation Fused-multiply-add floating-point operations on 128 bit wide operands

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
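Design of an FPGA embedded DSP IP supporting efficient addition (一种支持高效加法的FPGA嵌入式DSP IP设计); Wang Nan (王楠) et al.; Journal of Terahertz Science and Electronic Information Technology (《太赫兹科学与电子信息学报》); 2017-10-25 (No. 05); full text *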

Also Published As

Publication number Publication date
CN109947391A (en) 2019-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 215163 No. 9 Xuesen Road, Science and Technology City, Suzhou High-tech Zone, Jiangsu Province
Applicant after: Hexin Technology (Suzhou) Co.,Ltd.
Address before: 215163 Building 3, No. 9 Xuesen Road, Science and Technology City, High-tech Zone, Suzhou City, Jiangsu Province
Applicant before: SUZHOU POWERCORE TECHNOLOGY Co.,Ltd.
GR01 Patent grant