CN111384960B - Decoding method, processor, decoding device and storage medium - Google Patents


Info

Publication number
CN111384960B
Authority
CN
China
Prior art keywords
preset
data
character
run
character code
Legal status
Active
Application number
CN201811623531.0A
Other languages
Chinese (zh)
Other versions
CN111384960A (en)
Inventor
Inventor not disclosed
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201811623531.0A
Priority to PCT/CN2019/121056 (WO2020114283A1)
Publication of CN111384960A
Application granted
Publication of CN111384960B
Status: Active


Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60: General implementation details not specific to a particular type of compression
    • H03M7/6017: Methods or arrangements to increase the throughput
    • H03M7/6029: Pipelining

Abstract

The present application relates to a decoding method, a processor, a decoding apparatus, and a storage medium. The method comprises decoding the run-length encoded data in model data. The method ensures that operations remain correct while retaining the benefits of data compression.

Description

Decoding method, processor, decoding device and storage medium
Technical Field
The present application relates to the field of statistical coding technologies, and in particular, to a decoding method, a processor, a decoding apparatus, and a storage medium.
Background
With the continuous development of digital electronic technology and the rapid growth of Artificial Intelligence (AI) chips, the demands placed on neural network processors keep increasing. Neural network algorithms are among the algorithms most widely used by intelligent chips, and they run on neural network processors.
However, because fixed-point and sparsification methods are widely used, model data usually contains long runs of consecutive zeros, which occupy a large amount of bandwidth and reduce the overall processing efficiency of the neural network processor. Processors usually compress such redundant data with an encoding technique, but the compressed data cannot participate directly in operations.
Disclosure of Invention
In view of the above, it is desirable to provide a decoding method, a processor, a decoding device, and a storage medium that can decode run-length encoded data, so that operations remain correct while the benefits of data compression are retained.
A method of decoding, the method comprising:
acquiring coded data;
according to the data bit width and the run bit width, identifying the coded data to obtain character codes and runs;
and expanding the runs according to the data bit width to obtain first preset numbers.
In one embodiment, the identifying the encoded data according to the data bit width and the run bit width to obtain the character encoding and the run includes:
acquiring the data length of a character string in the coded data;
if the data length of the character string is equal to the data bit width, identifying the character string as a character code; and if the data length of the character string is equal to the run bit width, identifying the character string as a run.
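The length-based identification in this method can be sketched in Python. This is a minimal illustration, not the patented hardware: it assumes the encoded stream has already been split into tokens (bit strings), that the first preset number is 0, and that the data bit width (8) and run bit width (2) are hypothetical values chosen only so that the two lengths differ.

```python
from typing import List

DATA_BIT_WIDTH = 8       # hypothetical data bit width
RUN_BIT_WIDTH = 2        # hypothetical run bit width (the description's 2-bit example)
FIRST_PRESET_NUMBER = 0  # assumed: the value being run-length coded


def decode_by_length(tokens: List[str]) -> List[int]:
    """Classify each bit-string token by its length, then expand runs."""
    out: List[int] = []
    for tok in tokens:
        if len(tok) == DATA_BIT_WIDTH:
            # Length equals the data bit width: a character code (literal value).
            out.append(int(tok, 2))
        elif len(tok) == RUN_BIT_WIDTH:
            # Length equals the run bit width: a run, expanded into that many
            # first preset numbers.
            out.extend([FIRST_PRESET_NUMBER] * int(tok, 2))
        else:
            raise ValueError(f"token {tok!r} matches neither bit width")
    return out


# 5, then a character-coded 0 followed by a run of two more 0s, then 7
print(decode_by_length(["00000101", "00000000", "10", "00000111"]))  # [5, 0, 0, 0, 7]
```

Classifying purely by length only works when the two bit widths differ; the second method handles the case where preset characters share the data bit width.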
A method of decoding, the method comprising:
acquiring coded data;
identifying the coded data to obtain a character code and a preset character code, wherein the preset character code comprises a first preset character code and a second preset character code;
expanding, according to the data bit width, each preset character code into the character code of a first preset number followed by a run threshold;
and expanding the run threshold according to the data bit width to obtain consecutively arranged first preset numbers, whose count equals the number of first preset numbers represented by the run threshold.
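The two expansion steps of this method map naturally onto two pipeline stages. The sketch below is illustrative only: tokens are modeled as hypothetical `(kind, value)` tuples, the first preset number is assumed to be 0, and the run threshold is assumed to be 3 (the value the description derives for a 2-bit run bit width).

```python
from typing import List, Tuple

Token = Tuple[str, int]  # hypothetical: ("char", value), ("run", count) or ("preset", code)

FIRST_PRESET_NUMBER = 0  # assumed value compressed by run-length coding
RUN_THRESHOLD = 3        # assumed maximum run for a 2-bit run bit width


def stage1_expand_presets(tokens: List[Token]) -> List[Token]:
    """Stage 1: a preset character code becomes a character-coded first preset
    number followed by a run equal to the run threshold."""
    out: List[Token] = []
    for kind, value in tokens:
        if kind == "preset":
            out.append(("char", FIRST_PRESET_NUMBER))
            out.append(("run", RUN_THRESHOLD))
        else:
            out.append((kind, value))
    return out


def stage2_expand_runs(tokens: List[Token]) -> List[int]:
    """Stage 2: each run becomes that many consecutive first preset numbers."""
    out: List[int] = []
    for kind, value in tokens:
        if kind == "run":
            out.extend([FIRST_PRESET_NUMBER] * value)
        else:
            out.append(value)
    return out


encoded = [("char", 9), ("preset", 0xFF), ("char", 4)]
print(stage2_expand_runs(stage1_expand_presets(encoded)))  # [9, 0, 0, 0, 0, 4]
```

One preset character thus restores 1 + RUN_THRESHOLD first preset numbers: the character-coded one plus the full-threshold run behind it.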
In one embodiment, if the encoded data includes a plurality of character codes with the same numerical value, identifying the preset character code by determining whether the character code includes an additional character check bit includes:
acquiring the data length of the character code;
comparing the data length of the character code with the data bit width;
if the data length of the character code is equal to the data bit width, judging that the additional character check bit is not set in the character code;
and identifying the character code without the additional character check bit as the preset character code.
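The check-bit test can be illustrated as follows. The preset character values (0xFF and 0xFE), the 8-bit data width, and the convention that the check bit is a single trailing bit whose value is ignored are all assumptions for illustration; the description only states a preference for placing one check bit at the last bit of the character string.

```python
from typing import Tuple

DATA_BIT_WIDTH = 8                      # hypothetical data bit width
PRESET_CHARACTER_VALUES = {0xFF, 0xFE}  # hypothetical first/second preset characters


def identify(token: str) -> Tuple[str, int]:
    """Return ("preset", value) or ("char", value) for one bit-string token."""
    value = int(token[:DATA_BIT_WIDTH], 2)
    if len(token) == DATA_BIT_WIDTH:
        # No additional check bit: a code equal to a preset character value
        # is the preset character code itself.
        kind = "preset" if value in PRESET_CHARACTER_VALUES else "char"
        return (kind, value)
    if len(token) == DATA_BIT_WIDTH + 1 and value in PRESET_CHARACTER_VALUES:
        # One trailing check bit: a literal character code that merely shares
        # its value with a preset character; the check bit is dropped.
        return ("char", value)
    raise ValueError(f"unexpected token {token!r}")


print(identify("11111111"))   # ('preset', 255)
print(identify("111111110"))  # ('char', 255)
print(identify("00000011"))   # ('char', 3)
```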
In one embodiment, the method further comprises:
forwarding the decoded data to an operation unit and performing operations on it, the operations comprising multiplication, accumulation and activation operations.
A processor, comprising an arithmetic unit, a storage unit and a controller unit, wherein the storage unit is arranged adjacent to the arithmetic unit;
the arithmetic unit comprises a decoding module, which is used for identifying encoded data to obtain character codes and runs, and for expanding the runs according to the data bit width to obtain first preset numbers;
the storage unit is used for storing original data and coded data and carrying out data transmission with the controller unit and the arithmetic unit;
the controller unit is used for acquiring input data and a calculation instruction, parsing the calculation instruction into a plurality of operation instructions, and sending the operation instructions and the input data to the arithmetic unit.
In one embodiment, the decoding module comprises a control signal interface, a buffer, a plurality of registers and an output module;
the control signal interface is used for connecting the decoding module to the controller unit and transferring data between them;
the buffer is connected to the adjacent register and is used for storing the encoded data;
the plurality of registers are used for storing the execution results of the multi-stage pipeline;
the output module is used for storing and outputting the decoded data.
In one embodiment, the arithmetic unit comprises a master processing circuit and at least one slave processing circuit, each of the at least one slave processing circuit being connected to the master processing circuit;
the decoding module is arranged in the master processing circuit and in each slave processing circuit.
An apparatus for decoding, the apparatus comprising:
the coded data acquisition module is used for acquiring coded data;
the coded data identification module is used for identifying the coded data according to the data bit width and the run bit width to obtain character codes and runs;
and the first preset number acquisition module is used for expanding the runs according to the data bit width to obtain first preset numbers.
An apparatus for decoding, the apparatus comprising:
the coded data acquisition module is used for acquiring coded data;
the coded data identification module is used for identifying the coded data to obtain a character code and a preset character code, wherein the preset character code comprises a first preset character code and a second preset character code;
the preset character code expansion module is used for expanding the preset character code according to the data bit width to obtain the character code of a first preset number followed by a run threshold;
and the run threshold expansion module is used for expanding the run threshold according to the data bit width to obtain consecutively arranged first preset numbers, whose count equals the number of first preset numbers represented by the run threshold.
A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the above decoding method.
According to the decoding method, the processor, the decoding device and the storage medium, the decoding module is arranged in the arithmetic unit. In the first pipeline stage, the preset character codes are expanded according to the data bit width into the character code of a first preset number followed by the run threshold; in the second pipeline stage, the run threshold is expanded according to the data bit width into consecutively arranged first preset numbers. Operations therefore remain correct while the benefits of data compression are retained, and executing the decoding in a two-stage pipeline further improves decompression efficiency.
Drawings
FIG. 1 is a block diagram of a processor 1000 in one embodiment;
FIG. 2 is a block diagram of a processor 2000 in one embodiment;
FIG. 3 is a diagram illustrating a structure of a buffer in an encoding apparatus according to an embodiment;
FIG. 4 is a block diagram of a processor 3000 according to one embodiment;
FIG. 5 is a schematic diagram of a processor 4000 in one embodiment;
FIG. 6 is a schematic diagram of a processor according to another embodiment;
FIG. 7 is a diagram showing the structure of a processor according to another embodiment;
FIG. 8 is a diagram showing the structure of a processor according to another embodiment;
FIG. 9 is a schematic diagram of the main processing circuit in another embodiment;
FIG. 10 is a flow diagram illustrating an encoding method according to one embodiment;
FIG. 11 is a flow diagram illustrating a decoding method in one embodiment;
FIG. 12 is a flow chart illustrating a decoding method in another embodiment;
FIG. 13 is a block diagram showing the structure of an encoding apparatus according to an embodiment;
FIG. 14 is a block diagram showing the structure of a decoding apparatus according to an embodiment;
FIG. 15 is a block diagram showing the structure of a decoding apparatus in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The terms "first," "second," and "third," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The encoding method provided by the present application can be applied to the processor 1000 shown in fig. 1. The processor 1000 includes an arithmetic unit 12, a storage unit 10 disposed adjacent to the arithmetic unit 12, and a controller unit 11 connected between the arithmetic unit 12 and the storage unit 10. The arithmetic unit 12 includes an encoding module 1001 configured to encode first preset numbers in the input data according to a run bit width to obtain a run, where the run indicates how many consecutive first preset numbers it replaces.
Specifically, the encoding module 1001 may set the run bit width according to how often the first preset number occurs in the input data, and may replace consecutive first preset numbers in the input data with runs. Note that a single run cannot represent more consecutive first preset numbers than the run threshold.
The storage unit 10 is used for storing original data and encoded data, and performs data transmission with the controller unit 11 and the arithmetic unit 12.
Specifically, the storage unit 10 may be a buffer and/or a register inside the processor 1000, and may be a nonvolatile or volatile memory; this is not particularly limited herein. Data transferred among the storage unit 10, the controller unit 11 and the arithmetic unit 12 may be raw data or encoded data.
The controller unit 11 is configured to obtain input data and a calculation instruction, parse the calculation instruction into a plurality of operation instructions, and send the operation instructions and the input data to the arithmetic unit 12.
Specifically, the input data and the calculation instruction may be obtained through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.
The calculation instructions include, but are not limited to, a convolution operation instruction, a forward training instruction, or another neural network operation instruction; the present application does not limit the specific form of the calculation instruction.
Specifically, the controller unit 11 parses the acquired calculation instruction to obtain a plurality of operation instructions, and then transmits these operation instructions and the acquired input data to the arithmetic unit 12.
In this processor, the encoding module is placed in the arithmetic unit to run-length encode the first preset numbers in the input data, which compresses the input data and saves bandwidth.
In one embodiment, when the number of consecutive first preset numbers exceeds the run threshold, the encoding module 1001 is further configured to character-encode, according to the data bit width of the first preset number, the first preset number that comes first after the portion covered by the run threshold; and to encode the first preset numbers following it according to the run bit width to obtain a run, and write the run into the target code.
The encoding module may derive the run threshold from the configured run bit width. For example, with a run bit width of 2 bits, a run can represent at most three consecutive first preset numbers, so the run threshold is 3.
Specifically, the encoding module 1001 counts the consecutive first preset numbers and compares the count with the run threshold. If the count exceeds the run threshold, the encoding module 1001 stops run encoding, treats the first preset number that follows the portion covered by the run threshold as a second preset number, and character-encodes it according to the preset data bit width of the first preset number. The encoding module 1001 then replaces the first preset numbers after this character-encoded number with a run.
In one embodiment, if the first value in the input data is a first preset number, the encoding module 1001 is further configured to character-encode it according to the data bit width of the first preset number, encode the first preset numbers following it according to the run bit width to obtain a run, and write the run into the target code.
Specifically, if the first value in the input data is a first preset number, the encoding module 1001 treats it as a second preset number and character-encodes it according to the preset data bit width of the first preset number. If further first preset numbers follow this leading one, the encoding module 1001 replaces them with a run.
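These encoding rules (a leading first preset number is character-encoded, the first preset numbers after it become a run capped at the run threshold, and any excess starts a new character-encoded first preset number) can be sketched as below. The threshold formula `2 ** RUN_BIT_WIDTH - 1` is an assumption that reproduces the 2-bit, threshold-3 example; the first preset number 0 and the `(kind, value)` token format are likewise hypothetical.

```python
from typing import List, Tuple

FIRST_PRESET_NUMBER = 0                 # assumed value to compress
RUN_BIT_WIDTH = 2                       # the description's example run bit width
RUN_THRESHOLD = 2 ** RUN_BIT_WIDTH - 1  # assumed formula: 2 bits -> threshold 3


def encode_runs(data: List[int]) -> List[Tuple[str, int]]:
    """First-stage run-length encoding into ("char", value) / ("run", count) tokens."""
    out: List[Tuple[str, int]] = []
    i = 0
    while i < len(data):
        # Every value that opens a token group is character-encoded, including
        # a leading first preset number.
        head = data[i]
        out.append(("char", head))
        i += 1
        if head != FIRST_PRESET_NUMBER:
            continue
        # Count the first preset numbers after the head, up to the run threshold.
        run = 0
        while i < len(data) and data[i] == FIRST_PRESET_NUMBER and run < RUN_THRESHOLD:
            run += 1
            i += 1
        if run:
            out.append(("run", run))
        # First preset numbers beyond the threshold stay in `data` and are
        # handled by the next loop iteration as a new character-encoded head.
    return out


print(encode_runs([5, 0, 0, 0, 0, 0, 0, 7]))
# [('char', 5), ('char', 0), ('run', 3), ('char', 0), ('run', 1), ('char', 7)]
```

In the example, six consecutive zeros exceed the threshold, so they are split into a character-coded zero with a full run of 3, then a second character-coded zero with a run of 1.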
In this processor, first preset numbers arranged in different patterns are run-length encoded according to the run bit width, so data can be compressed under various conditions and the encoding remains flexible and compatible.
In one embodiment, the encoding module 1001 is further configured to replace a character-encoded first preset number, together with the run that follows it, with a first preset character.
If the run reaches the run threshold, the encoding module 1001 selects a value that occurs infrequently as the first preset character. Specifically, when the number of consecutive first preset numbers exceeds the run threshold and the first preset numbers following the leading character-encoded one reach the run threshold, the second pipeline stage replaces the leading character-encoded first preset number and its run with the first preset character, compressing the data further.
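The second-stage substitution can be sketched as a pass over first-stage tokens. The first preset character value 0xFF, the run threshold of 3, the first preset number 0 and the `(kind, value)` token format are hypothetical choices for illustration, not values given in the description.

```python
from typing import List, Tuple

FIRST_PRESET_NUMBER = 0        # assumed value compressed by run-length coding
RUN_THRESHOLD = 3              # assumed threshold for a 2-bit run bit width
FIRST_PRESET_CHARACTER = 0xFF  # hypothetical low-frequency value


def substitute_preset(tokens: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Second stage: replace a character-coded first preset number followed by
    a full-threshold run with the first preset character."""
    out: List[Tuple[str, int]] = []
    i = 0
    while i < len(tokens):
        if (i + 1 < len(tokens)
                and tokens[i] == ("char", FIRST_PRESET_NUMBER)
                and tokens[i + 1] == ("run", RUN_THRESHOLD)):
            out.append(("preset", FIRST_PRESET_CHARACTER))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


stage1 = [("char", 5), ("char", 0), ("run", 3), ("char", 0), ("run", 1), ("char", 7)]
print(substitute_preset(stage1))
# [('char', 5), ('preset', 255), ('char', 0), ('run', 1), ('char', 7)]
```

Only full-threshold runs are replaced; a shorter run (here the run of 1) keeps its explicit character code and run.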
In this processor, encoding is performed by a two-stage pipeline, which compresses the data further and improves the efficiency of run-length encoding.
In one embodiment, if the target code contains a character code with the same value as the first preset character, the encoding module 1001 is further configured to add an additional character check bit to that character code.
Specifically, one or more additional character check bits may be added to a character code with the same value as the first preset character; this is not specifically limited herein. To save bandwidth, a single additional character check bit placed at the last bit of the character string is preferred.
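The preferred single trailing check bit can be illustrated when tokens are serialized to bit strings. The 8-bit data width, the preset character value 0xFF and the check bit value "0" are assumptions for illustration; the description does not fix the value of the check bit.

```python
from typing import List, Tuple

DATA_BIT_WIDTH = 8             # hypothetical data bit width
RUN_BIT_WIDTH = 2              # the description's example run bit width
FIRST_PRESET_CHARACTER = 0xFF  # hypothetical preset character value
CHECK_BIT = "0"                # assumed value of the single additional check bit


def serialize(tokens: List[Tuple[str, int]]) -> List[str]:
    """Turn tokens into bit strings, appending a check bit to any literal
    character code whose value collides with the first preset character."""
    out: List[str] = []
    for kind, value in tokens:
        if kind == "run":
            out.append(format(value, f"0{RUN_BIT_WIDTH}b"))
        elif kind == "preset":
            out.append(format(value, f"0{DATA_BIT_WIDTH}b"))
        else:
            bits = format(value, f"0{DATA_BIT_WIDTH}b")
            if value == FIRST_PRESET_CHARACTER:
                bits += CHECK_BIT  # check bit placed at the last bit of the string
            out.append(bits)
    return out


print(serialize([("char", 255), ("preset", 255), ("run", 3), ("char", 5)]))
# ['111111110', '11111111', '11', '00000101']
```

Only colliding literals pay the one-bit overhead, which is why a rarely occurring value is preferred as the preset character.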
In this processor, adding the additional character check bit to character codes with the same value as the first preset character lets the first preset character be distinguished from them, which resolves the definition conflict of the first preset character.
In one embodiment, the encoding module 1001 is further configured to replace the first preset character with a second preset character and, if the target code contains a character code with the same value as the second preset character, to add an additional character check bit to that character code.
Specifically, a value with a lower frequency of occurrence is selected as the second preset character, and the encoding module replaces the first preset character with it. One or more additional character check bits may be added to a character code with the same value as the second preset character; this is not limited herein. To save bandwidth, a single check bit placed at the last bit of the character string is preferred.
In this processor, replacing the first preset character with the second preset character and adding a check bit to character codes with the same value as the second preset character lets the second preset character be distinguished from those character codes, which resolves its definition conflict; choosing a less frequent value as the second preset character further reduces the extra burden on the processor.
In one embodiment, as shown in fig. 2, a processor 2000 is provided, the processor 2000 includes an arithmetic unit 12, a storage unit 10 disposed adjacent to the arithmetic unit 12, and a controller unit 11, the controller unit 11 is connected between the arithmetic unit 12 and the storage unit 10. The arithmetic unit 12 includes an encoding module 2001, and the encoding module 2001 includes a control signal interface 2002, a buffer 2003, a plurality of registers 2004, a configuration module 2005, and an output module 2006.
The control signal interface 2002 serves as the external input hardware interface that connects the encoding module 2001 to the controller unit 11 and carries data between them.
The buffer 2003 is connected to the adjacent registers 2004, configuration module 2005 and output module 2006, and is configured to store the input data, the first preset character, the second preset character and the first preset character code.
As an alternative embodiment, as shown in fig. 3, the buffer 2003 includes an input data buffer 20031 and a preset character buffer 20032, and the preset character buffer 20032 includes a first preset character buffer 20033, a second preset character buffer 20034 and a first preset character code buffer 20035.
The input data buffer 20031 stores the data to be encoded that is input into the encoding module 2001; the first preset character buffer 20033 stores the first preset character; the second preset character buffer 20034 stores the second preset character; and the first preset character code buffer 20035 stores the first preset character code obtained by encoding the first preset character.
The plurality of registers 2004 store the execution results of the multi-stage pipeline.
Specifically, each stage of pipeline corresponds to one register 2004, and the register 2004 is used for storing an intermediate encoding result obtained after data encoding is performed on the stage of pipeline corresponding to the register 2004.
The configuration module 2005 is configured to encode the first preset character to obtain the first preset character code, and to store the first preset character code in the buffer 2003.
The output module 2006 is used for storing and outputting the encoded data.
Specifically, the output module 2006 may store the portion of the current code stream that has already been encoded and output the encoded data to the arithmetic unit 12 for forwarding and computation.
In this processor, the improved encoding module provides an inter-stage register for each pipeline stage, so the intermediate encoding result of each stage can be stored. The first preset character is encoded in advance by the configuration module and stored in the buffer; when a character definition conflict requires a replacement, the stored first preset character code is used directly, avoiding repeated encoding.
The decoding method provided by the present application can be applied to the processor 3000 shown in fig. 4. The processor 3000 includes an arithmetic unit 12, a storage unit 10 disposed adjacent to the arithmetic unit 12, and a controller unit 11 connected between the arithmetic unit 12 and the storage unit 10. The arithmetic unit 12 includes a decoding module 3001 configured to convert each run in the encoded data, according to a data bit width, into first preset numbers represented as character codes.
Specifically, the decoding module 3001 may convert each run in the encoded data, according to a configured data bit width, into one or more consecutive first preset numbers. The number of first preset numbers produced by the decoding module 3001 equals the number of first preset numbers the run represents.
The storage unit 10 is used for storing original data and encoded data, and performs data transmission with the controller unit 11 and the arithmetic unit 12.
Specifically, the storage unit 10 may be a buffer and/or a register inside the processor 3000, and may be a nonvolatile or volatile memory; this is not particularly limited herein. Data transferred among the storage unit 10, the controller unit 11 and the arithmetic unit 12 may be raw data or encoded data.
The controller unit 11 is configured to obtain input data and a calculation instruction, parse the calculation instruction into a plurality of operation instructions, and send the operation instructions and the input data to the arithmetic unit 12.
Specifically, the input data and the calculation instruction may be obtained through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.
The calculation instructions include, but are not limited to, a convolution operation instruction, a forward training instruction, or another neural network operation instruction; the present application does not limit the specific form of the calculation instruction.
Specifically, the controller unit 11 parses the acquired calculation instruction to obtain a plurality of operation instructions, and then transmits these operation instructions and the acquired input data to the arithmetic unit 12.
In this processor, the decoding module is placed in the arithmetic unit to restore run-length encoded data to the original data before it participates in operations, so that operations remain correct while the benefits of data compression are retained.
In one embodiment, the decoding module 3001 is further configured to identify the encoded data to obtain character codes and preset character codes; to expand each preset character code according to the data bit width into the character code of a first preset number followed by the run threshold; and to expand the run threshold according to the data bit width into consecutively arranged first preset numbers.
Specifically, in the first pipeline stage, the decoding module 3001 replaces each first preset character and/or second preset character in the encoded data with the character code of a first preset number followed by the run threshold. In the second pipeline stage, the decoding module 3001 replaces the run threshold that follows this character code with the corresponding number of consecutive first preset numbers.
In this processor, decoding is performed by a two-stage pipeline, which improves decoding efficiency.
In one embodiment, the decoding module 3001 is further configured to identify the preset character code by determining whether the character code includes an additional character check bit if the encoded data includes a plurality of character codes with the same numerical value.
Specifically, the decoding module 3001 compares the data length of the character code with the set data bit width, and if the data length of the character code is equal to the set data bit width, it determines that no additional character check bit is set in the character code, and identifies the character code without the additional character check bit as the preset character code.
In one embodiment, as shown in fig. 5, a processor 4000 is provided, where the processor 4000 includes an arithmetic unit 12, a storage unit 10 disposed adjacent to the arithmetic unit 12, and a controller unit 11, and the controller unit 11 is connected between the arithmetic unit 12 and the storage unit 10. The arithmetic unit 12 includes a decoding module 4001, and the decoding module 4001 includes a control signal interface 4002, a buffer 4003, a plurality of registers 4004, and an output module 4005.
The control signal interface 4002 serves as the external input hardware interface that connects the decoding module 4001 to the controller unit 11 and carries data between them.
The buffer 4003 is connected to the adjacent registers 4004 and output module 4005, and is used for storing the encoded data.
The registers 4004 are used for storing the execution result of the multistage pipeline.
Specifically, each pipeline stage corresponds to one register 4004, which stores the intermediate decoding result produced by that stage.
The output module 4005 is configured to store and output the decoded data.
Specifically, the output module 4005 may store the portion of the current code stream that has already been decoded and output the decoded data to the arithmetic unit 12 for forwarding and computation.
In this processor, the improved decoding module provides an inter-stage register for each pipeline stage, so the intermediate decoding result of each stage can be stored; the multi-stage pipeline executes decoding operations in parallel, further improving decoding efficiency.
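How an inter-stage register lets the two decoding stages overlap can be shown with a small cycle-level software model. This is a simplification, not a description of the actual circuit: the per-stage registers are reduced to a single register between stage 1 and stage 2, the output list stands in for the output module 4005, and the token format, run threshold 3 and first preset number 0 are hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, int]  # hypothetical: ("char", value), ("run", count), ("preset", code)


def pipelined_decode(tokens: List[Token], threshold: int = 3,
                     zero: int = 0) -> Tuple[List[int], int]:
    """Cycle-level model of a two-stage decoding pipeline with one
    inter-stage register; returns (decoded values, cycles used)."""
    reg: Optional[List[Token]] = None  # register between stage 1 and stage 2
    out: List[int] = []
    cycles = 0
    # One extra empty slot at the end drains the last result out of the register.
    for tok in [*tokens, None]:
        # Stage 2 consumes what stage 1 wrote into the register last cycle.
        if reg is not None:
            for kind, value in reg:
                out.extend([zero] * value if kind == "run" else [value])
        # Stage 1 handles the next token in the same cycle, then overwrites the register.
        if tok is None:
            reg = None
        elif tok[0] == "preset":
            reg = [("char", zero), ("run", threshold)]
        else:
            reg = [tok]
        cycles += 1
    return out, cycles


print(pipelined_decode([("char", 9), ("preset", 0xFF), ("char", 4)]))
# ([9, 0, 0, 0, 0, 4], 4)
```

Because both stages work in every cycle, n tokens finish in n + 1 cycles in this model rather than 2n for strictly sequential stages.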
In one embodiment, referring to fig. 6 to 9, the arithmetic unit 12 includes a master processing circuit 101 and at least one slave processing circuit 102, wherein the at least one slave processing circuit 102 is connected to the master processing circuit 101, the master processing circuit 101 is connected to the branch processing circuit(s) 103, and the branch processing circuit 103 is connected to the one or more slave processing circuits 102.
The branch processing circuit 103 forwards data or instructions between the master processing circuit 101 and the slave processing circuits 102. The master processing circuit 101 exchanges raw data and encoded data with the storage unit 10. The master processing circuit 101 includes an encoding module 1001, which run-length encodes the raw data using a two-stage pipeline and broadcasts the encoded data through the data I/O unit 203 to the slave processing circuit(s) 102 for matrix multiplication. The slave processing circuit 102 also includes an encoding module 1001, which run-length encodes the result of the multiplication and sends the encoded data to the master processing circuit 101 for accumulation and activation.
In one embodiment, with continued reference to figs. 6 to 9, the slave processing circuit 102 further includes a decoding module 3001, which decodes the encoded data received by the slave processing circuit 102 using a two-stage pipeline and passes the decoded data to the slave processing circuit 102 for the matrix multiplication. The master processing circuit 101 also includes a decoding module 3001, which decodes the encoded operation results received by the master processing circuit 101 using a two-stage pipeline and passes the decoded data to the master processing circuit 101 for the accumulation and activation operations.
In one embodiment, the processor may further include a controller unit 11 comprising an instruction storage unit 110, an instruction processing unit 111, and a store queue unit 113.
The instruction storage unit 110 stores calculation instructions associated with artificial neural network operations.
The instruction processing unit 111 parses a calculation instruction to obtain a plurality of operation instructions.
The store queue unit 113 stores an instruction queue comprising a plurality of operation instructions and/or calculation instructions to be executed in queue order.
As an optional implementation, the master processing circuit 101 may further include one or any combination of a conversion processing circuit 1110, an activation processing circuit 1111, and an addition processing circuit 1112:
the conversion processing circuit 1110 converts a data block or intermediate result received by the master processing circuit between a first data structure and a second data structure (e.g., between continuous and discrete data), or between a first data type and a second data type (e.g., from fixed point to floating point);
the activation processing circuit 1111 performs activation operations on data in the master processing circuit;
the addition processing circuit 1112 performs addition or accumulation.
The master processing circuit is configured to treat the input neurons as broadcast data and the weights as distribution data, divide the distribution data into a plurality of data blocks, and send at least one of the data blocks and at least one of the operation instructions to the slave processing circuits;
the plurality of slave processing circuits execute operations on the received data blocks according to the operation instructions to obtain intermediate results, and transmit the intermediate results to the master processing circuit;
the master processing circuit processes the intermediate results sent by the slave processing circuits to obtain the result of the calculation instruction, and sends that result to the controller unit.
The slave processing circuit includes:
a multiplication processing circuit, configured to multiply the received data block to obtain a product result;
a forwarding processing circuit (optional), configured to forward the received data block or the product result; and
an accumulation processing circuit, configured to accumulate the product results to obtain the intermediate result.
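The broadcast/distribution data flow between the master and slave processing circuits can be illustrated with a small software model. This is a sketch under stated assumptions, not the circuit itself: the weight matrix is assumed to be split into contiguous row blocks, each slave multiplies its rows by the broadcast input neurons and accumulates the products, and the master concatenates the intermediate results and applies ReLU as a stand-in activation. The function names `slave_compute` and `master_run` are hypothetical.

```python
from typing import List


def slave_compute(weight_block: List[List[float]], x: List[float]) -> List[float]:
    """One slave: multiply each weight row by the broadcast input, then accumulate."""
    results = []
    for row in weight_block:
        products = [w * xi for w, xi in zip(row, x)]  # multiplication processing circuit
        results.append(sum(products))                 # accumulation processing circuit
    return results


def master_run(weights: List[List[float]], x: List[float], num_slaves: int) -> List[float]:
    """Master: broadcast x, distribute weight row blocks, gather results, activate."""
    if num_slaves <= 0:
        raise ValueError("need at least one slave processing circuit")
    # Distribution data: split the weight rows into contiguous blocks (assumed policy).
    block = -(-len(weights) // num_slaves)  # ceiling division
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]
    # Broadcast data: every slave receives the same input neurons x.
    partials = [slave_compute(b, x) for b in blocks]
    # Gather the intermediate results in order, then apply ReLU (assumed activation).
    y = [v for part in partials for v in part]
    return [max(0, v) for v in y]
```

For example, `master_run([[1, 2], [3, 4], [-5, -6]], [1, 1], 2)` sends two rows to the first slave and one to the second; the third row's negative sum is clipped to 0 by the assumed activation.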
In another embodiment, the operation instruction is a matrix-multiply-matrix instruction, an accumulation instruction, an activation instruction, or the like.
In one embodiment, after receiving an encoding instruction, the processor 1000 or the processor 2000 may execute the encoding method shown in fig. 10, which includes the following steps:
Step 202: obtain input data.
The input data is the raw data, i.e., the data to be encoded. Specifically, the processor fetches the data to be encoded from the storage unit and sends it to the encoding module of the arithmetic unit.
Step 204: encode the first preset numbers in the input data according to the run bit width to obtain a run, and write the run into the target code, where a run represents the number of consecutively arranged first preset numbers.
The run bit width is the data length that a run occupies on the bus. The first preset number usually refers to a number that occurs frequently in the input data. For example, in a sparse neural network the sparse data model contains many consecutive zero values, so zero is used as the first preset number for encoding.
Specifically, the encoding module in the processor may replace the first preset numbers in the input data with runs. For example, in a sparse neural network, let the input data be {1,0,0,0,2,0,3,0,0}, the run bit width be 2 bits, and the first preset number be 0; the input data is then encoded with a data bit width of 8 bits.
[Figure: the input data {1,0,0,0,2,0,3,0,0} and its encoded form]
Here, the three consecutive 0s in the input data are represented by run 11, the single 0 by run 01, and the two consecutive 0s by run 10.
In this coding method, the first preset numbers in the data to be encoded are run-length encoded, with each run giving the number of consecutive first preset numbers. Long stretches of consecutive first preset numbers are thereby compressed, which saves bandwidth.
As an optional implementation, the encoding method shown in fig. 10 further includes the following steps:
Step 302: perform statistics on the input data to obtain the frequency of occurrence of the first preset number.
For example, in a sparse neural network, statistics show that 80% of the 0s occur in runs of three or fewer consecutive 0s; runs of more than three consecutive 0s are rare.
Step 304: set the run bit width according to the frequency of occurrence of the first preset number.
Specifically, the run bit width is set according to the frequency of occurrence of the first preset number 0 obtained in step 302. Preferably, the run bit width may be set to 2 bits.
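Steps 302 and 304 do not state how the statistics are turned into a bit width. The sketch below assumes one plausible rule: collect the lengths of all maximal runs of zeros and pick the smallest width `w` whose run threshold `2**w - 1` covers a chosen fraction of those runs (80% by default, echoing the statistic above). Note that it measures the fraction of runs rather than of individual zeros; the function names and the `coverage` parameter are hypothetical.

```python
from typing import List, Sequence


def zero_run_lengths(data: Sequence[int], zero: int = 0) -> List[int]:
    """Step 302 (sketch): lengths of every maximal run of consecutive `zero` values."""
    runs: List[int] = []
    current = 0
    for v in data:
        if v == zero:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def choose_run_bit_width(data: Sequence[int], coverage: float = 0.8, zero: int = 0) -> int:
    """Step 304 (sketch): smallest width w whose threshold 2**w - 1 covers `coverage` of the runs."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    runs = zero_run_lengths(data, zero)
    if not runs:
        return 1  # no zeros at all: any width works; 1 bit is an arbitrary minimum
    width = 1
    while True:
        threshold = (1 << width) - 1
        covered = sum(1 for r in runs if r <= threshold) / len(runs)
        if covered >= coverage:
            return width
        width += 1
```

On the example input `{1,0,0,0,2,0,3,0,0}` the zero runs are 3, 1 and 2, so a 1-bit run (threshold 1) covers only one of them, while a 2-bit run (threshold 3) covers all three and is chosen.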
In one embodiment, step 204 specifically includes the following steps:
Step 2042: split the input data into second preset numbers and first preset numbers.
The second preset numbers are all numbers other than the first preset number, i.e., the numbers that occur less frequently in the input data. For example, in a sparse neural network, zero values in the sparse data model are usually taken as the first preset number and non-zero values as second preset numbers.
Step 2044: obtain a run threshold according to the run bit width.
The run threshold is the maximum number of consecutive first preset numbers that one run can represent. Specifically, the encoding module obtains the run threshold from the configured run bit width; for example, with a run bit width of 2 bits, a run can represent at most three consecutive first preset numbers, so the run threshold is 3.
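In the 2-bit example, the run codes 01, 10 and 11 stand for one, two and three consecutive zeros. Assuming, as that example suggests, that a run stores the zero count directly as an unsigned binary number and never uses the value 0, the run threshold $T$ for a run bit width $w_r$ is:

$$T = 2^{w_r} - 1, \qquad w_r = 2 \;\Rightarrow\; T = 2^{2} - 1 = 3$$

Under the same assumption, a 3-bit run would give $T = 2^{3} - 1 = 7$.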
Step 2046: if the number of first preset numbers arranged consecutively after a second preset number is less than or equal to the run threshold, encode these consecutive first preset numbers to obtain the run.
Specifically, the encoding module counts the first preset numbers arranged consecutively after a second preset number and compares the count with the run threshold. If the count is less than or equal to the threshold, those first preset numbers are replaced by a run, which compresses them. In the example above, the encoding module first finds three first preset numbers 0 arranged consecutively after the second preset number 1; since a 2-bit run can represent at most three consecutive first preset numbers (run threshold 3), the run 11 replaces these three 0s.
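Steps 2042 to 2046 can be sketched in Python for the case covered so far, where every group of zeros follows a second preset number and fits within the run threshold. The sketch assumes the encoded output is a list of separate bit strings (character codes `data_bits` wide, runs `run_bits` wide) and that the threshold is `2**run_bits - 1`; the function names are hypothetical, and inputs that need the other rules (leading zeros, over-long runs) are rejected rather than handled.

```python
from typing import List, Sequence, Tuple


def split_groups(data: Sequence[int], zero: int = 0) -> List[Tuple[int, int]]:
    """Step 2042 (sketch): pair each second preset number with the count of zeros after it."""
    groups: List[Tuple[int, int]] = []
    for v in data:
        if v == zero:
            if not groups:
                raise ValueError("leading zeros need the character-encoding rule")
            value, count = groups[-1]
            groups[-1] = (value, count + 1)
        else:
            groups.append((v, 0))
    return groups


def encode_short_runs(data: Sequence[int], data_bits: int = 8, run_bits: int = 2) -> List[str]:
    threshold = (1 << run_bits) - 1  # step 2044: run threshold (3 for 2-bit runs)
    tokens: List[str] = []
    for value, zeros in split_groups(data):
        if not 0 <= value < (1 << data_bits):
            raise ValueError(f"{value} does not fit in {data_bits} bits")
        tokens.append(format(value, f"0{data_bits}b"))  # character code
        if zeros > threshold:
            raise ValueError("zero group exceeds the run threshold (steps 402/404)")
        if zeros:
            tokens.append(format(zeros, f"0{run_bits}b"))  # step 2046: run replaces the zeros
    return tokens
```

For the example input `{1,0,0,0,2,0,3,0,0}`, this yields `00000001 11 00000010 01 00000011 10`, matching the runs 11, 01 and 10 described above.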
In one embodiment, another encoding method is provided, comprising the steps of:
Step 402: if the number of consecutively arranged first preset numbers is greater than the run threshold, character-encode the leading first preset number of that sequence according to the data bit width of the first preset number.
The data bit width of the first preset number can be set in advance as required by the processor, for example to 8 bits or 16 bits; this is not specifically limited herein.
Specifically, the encoding module counts the consecutively arranged first preset numbers and compares the count with the run threshold. If the count exceeds the run threshold, run encoding is not applied to the leading first preset number; instead, that number is treated as a second preset number and character-encoded according to the preset data bit width of the first preset number.
Step 404: according to the run bit width, encode the remaining first preset numbers that follow the leading, character-encoded first preset number to obtain the run.
Specifically, these remaining first preset numbers are replaced by a run, which compresses them.
For example, in a sparse neural network, let the input data be {1,0,0,0,0,2,0,3,0,0}, the run bit width be 2 bits, the first preset number be 0, the non-zero values 1, 2 and 3 be second preset numbers, and the data bit width of both the first and second preset numbers be 8 bits. The input data is then encoded as follows.
[Figure: the input data {1,0,0,0,0,2,0,3,0,0} and its encoded form]
The encoding module first finds four consecutive first preset numbers 0 after the 1, which exceeds the run threshold of 3 (a 2-bit run can represent at most three consecutive first preset numbers). It therefore character-encodes the leading 0 with the 8-bit data bit width and encodes the three 0s that follow it as the run 11.
In one embodiment, another encoding method is provided: if the first number of the input data is the first preset number, that first preset number is character-encoded according to its data bit width.
Specifically, if the input data begins with a first preset number, that number cannot be replaced by a run under the "number + run" coding rule, because no number precedes it. The leading first preset number is therefore treated as a second preset number and character-encoded according to the preset data bit width of the first preset number.
For example, in a sparse neural network, let the input data be {0,1,0,0,2,0,3,0,0}, the run bit width be 2 bits, the first preset number be 0, the non-zero values 1, 2 and 3 be second preset numbers, and the data bit width of both the first and second preset numbers be 8 bits. The input data is then encoded as follows.
[Figure: the input data {0,1,0,0,2,0,3,0,0} and its encoded form]
The encoding module character-encodes the leading first preset number 0 with the 8-bit data bit width and encodes the remaining first and second preset numbers according to the encoding method shown in fig. 10.
As an optional implementation, if further first preset numbers follow the leading first preset number, the first preset numbers after the leading one are encoded according to the run bit width to obtain the run.
Specifically, a run replaces the first preset numbers that follow the leading one, which compresses them.
For example, in a sparse neural network, let the input data be {0,0,0,0,2,0,3,0,0}, the run bit width be 2 bits, the first preset number be 0, the non-zero values 2 and 3 be second preset numbers, and the data bit width of both the first and second preset numbers be 8 bits. The input data is then encoded as follows.
[Figure: the input data {0,0,0,0,2,0,3,0,0} and its encoded form]
The encoding module character-encodes the leading first preset number 0 with the 8-bit data bit width, encodes the three 0s that follow it as the run 11, and encodes the remaining first and second preset numbers according to the encoding method shown in fig. 8.
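The three cases described above (zeros after a number within the threshold, zeros exceeding the threshold, and zeros at the start of the data) can be combined into one encoder sketch. It assumes the output is a list of separate bit strings, a run threshold of `2**run_bits - 1`, and that a run may only follow a character code. The text does not say what happens when a zero group is longer than the threshold plus one; this sketch assumes the "character-encoded zero + run" pattern simply repeats. The function name `encode_zero_runs` is hypothetical.

```python
from typing import List, Sequence


def encode_zero_runs(data: Sequence[int], data_bits: int = 8, run_bits: int = 2,
                     zero: int = 0) -> List[str]:
    """Sketch: character codes are data_bits-wide strings, runs are run_bits-wide strings."""
    threshold = (1 << run_bits) - 1  # run threshold, e.g. 3 for 2-bit runs
    tokens: List[str] = []
    i = 0
    while i < len(data):
        value = data[i]
        if value != zero:
            if not 0 <= value < (1 << data_bits):
                raise ValueError(f"{value} does not fit in {data_bits} bits")
            tokens.append(format(value, f"0{data_bits}b"))  # character code
            i += 1
            continue
        # Count the maximal group of consecutive first preset numbers.
        j = i
        while j < len(data) and data[j] == zero:
            j += 1
        remaining = j - i
        if i > 0 and remaining <= threshold:
            # Step 2046: zeros after a second preset number fit in a single run.
            tokens.append(format(remaining, f"0{run_bits}b"))
        else:
            # Steps 402/404 and the leading-zero rule: character-encode the leading
            # zero, then cover up to `threshold` following zeros with a run.
            # Repeating this for very long groups is an assumption of this sketch.
            while remaining > 0:
                tokens.append(format(zero, f"0{data_bits}b"))
                remaining -= 1
                covered = min(remaining, threshold)
                if covered:
                    tokens.append(format(covered, f"0{run_bits}b"))
                    remaining -= covered
        i = j
    return tokens
```

With the three example inputs, the sketch reproduces the behavior the text describes: `{1,0,0,0,0,2,0,3,0,0}` becomes `00000001 00000000 11 00000010 01 00000011 10`, `{0,1,0,0,2,0,3,0,0}` starts with the literal `00000000`, and `{0,0,0,0,2,0,3,0,0}` starts with `00000000 11`.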
In this coding method, first preset numbers arranged in different patterns are run-length encoded according to the run bit width, so data can be compressed in a variety of situations, which makes the data encoding more versatile and compatible.
In one embodiment, another encoding method is provided: a first preset character replaces a character-encoded first preset number together with the run that follows it.
A value that occurs rarely in the data is chosen and character-encoded according to the data bit width of the first preset character, and the resulting code is used as the first preset character. For example, the rarely occurring value 64 is character-encoded with the 8-bit data bit width of the first preset character to obtain 01000000, which is used as the first preset character.
As an optional implementation, when the number of consecutive first preset numbers exceeds the run threshold and the number of first preset numbers following the leading one reaches the run threshold, the second-stage pipeline replaces both the character-encoded leading first preset number and the run that stands for the consecutive first preset numbers after it with the first preset character, which compresses the data further.
In one embodiment, the encoding method further includes the following steps:
Step 502: obtain a first preset character code, which is obtained by encoding the first preset character in a configuration module.
Specifically, a rarely occurring value is selected as the first preset character (zero literal), and a configuration module in the encoding module encodes it in advance to obtain the first preset character code (zero code).
Step 504: use the first preset character code to replace the character-encoded first preset number and the run that follows it.
Specifically, the first preset character code (zero code) obtained in step 502 replaces the character-encoded first preset number and the run that follows it.
For example:
[Figure: encoded data before and after substitution of the first preset character code]
In the second-stage pipeline, the rarely occurring value 64 is selected as the first preset character (zero literal) and encoded in advance to obtain the first preset character code 01000000. This code replaces both the character-encoded leading first preset number 00000000 and the run 11 that follows it, which represents the next three consecutive 0s.
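The second-stage substitution can be sketched as a pass over the token list produced by the first stage. The sketch assumes the first stage emits separate bit strings, that the run equal to the threshold is the all-ones run (`11` for 2-bit runs), and that the zero code is `01000000` as in the example; the function name `substitute_zero_code` is hypothetical. On its own, this pass would make a genuine data value of 64 indistinguishable from the zero code, which is the conflict the additional character check bit described below resolves.

```python
from typing import List


def substitute_zero_code(tokens: List[str], data_bits: int = 8, run_bits: int = 2,
                         zero_code: str = "01000000") -> List[str]:
    """Replace each 'character-encoded 0 followed by a full run' pair with the zero code."""
    zero_literal = "0" * data_bits  # character code of the first preset number 0
    full_run = "1" * run_bits       # run equal to the run threshold (11 for 2 bits)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == zero_literal and i + 1 < len(tokens) and tokens[i + 1] == full_run:
            out.append(zero_code)  # one 8-bit code now stands for four consecutive 0s
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Applied to the first-stage output of `{1,0,0,0,0,2,0,3,0,0}`, the pair `00000000 11` collapses into `01000000`, saving two bits relative to the first stage.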
In this coding method, the two-stage pipeline compresses the data further and improves the efficiency of run-length encoding; in addition, because the first preset character is encoded in advance, it does not need to be encoded repeatedly.
In one embodiment, another encoding method is provided: an additional character check bit is set for any character code whose value equals that of the first preset character.
The encoding module may append one or more additional character check bits to such a character code; this is not specifically limited herein. To save bandwidth, a single check bit is preferably appended as the last bit of the character string. The check bit may be set to 0 or 1, and the character code is output to the arithmetic unit together with the appended bit to take part in the operation.
For example, the rarely occurring value 64 is selected as the first preset character (zero literal), and the encoding module encodes it in advance to obtain the first preset character code (zero code) 01000000. If a character code with the value 64 appears in the data, the encoding module appends one check bit, set to 0, to the code 01000000 to distinguish it from the first preset character, so the output is 010000000, 9 bits long.
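The check-bit rule can be sketched as a pass over the first-stage token list. For the decoder to tell the two apart, this pass has to run before genuine character codes and the zero code share the stream, i.e. before any zero-code substitution. The sketch assumes one check bit appended at the end, set to `0` as in the example, and the function name `mark_zero_code_collisions` is hypothetical.

```python
from typing import List


def mark_zero_code_collisions(tokens: List[str], zero_code: str = "01000000",
                              check_bit: str = "0") -> List[str]:
    """Append a check bit to genuine character codes equal to the zero code (8 -> 9 bits)."""
    return [t + check_bit if t == zero_code else t for t in tokens]
```

A genuine value 64 therefore leaves the encoder as the 9-bit string `010000000`, while an 8-bit `01000000` can only mean the zero code.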
By setting an additional check bit for character codes with the same value as the first preset character, this coding method distinguishes those codes from the first preset character and resolves the definition conflict of the first preset character.
In one embodiment, another encoding method is provided: the first preset character is replaced by a second preset character, and if the target code contains a character code with the same value as the second preset character, an additional character check bit is set for that character code.
Specifically, a value that occurs less frequently is selected as the second preset character, and the encoding module replaces the first preset character with it. One or more additional check bits may be appended to a character code whose value equals that of the second preset character; this is not specifically limited herein. To save bandwidth, a single check bit is preferably appended as the last bit of the character string.
For example, if a character code with the value 64 occurs, the less frequently occurring value 128 is selected as the second preset character (zero extra) to distinguish that code from the first preset character, and the encoding module replaces the character code with the value 64 by the second preset character 128. If a character code with the value 128 then occurs, the encoding module appends one check bit, set to 1, to the code 10000000 to distinguish it from the second preset character, so the output is 100000001, 9 bits long.
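One possible reading of this example, sketched in Python: a genuine value 64 is written as the 8-bit second preset character `10000000`, and a genuine value 128 is escaped with a trailing check bit `1`, so the 8-bit zero code `01000000` stays unambiguous. The constants and the function name `remap_preset_collisions` are hypothetical, and the sketch assumes it runs on first-stage tokens, before zero-code substitution.

```python
from typing import List

FIRST_PRESET = "01000000"   # value 64, the first preset character in the example
SECOND_PRESET = "10000000"  # value 128, the second preset character in the example


def remap_preset_collisions(tokens: List[str]) -> List[str]:
    out: List[str] = []
    for t in tokens:
        if t == SECOND_PRESET:
            out.append(t + "1")        # genuine 128 -> 100000001 (9 bits, check bit 1)
        elif t == FIRST_PRESET:
            out.append(SECOND_PRESET)  # genuine 64 -> written as the second preset character
        else:
            out.append(t)
    return out
```

The genuine-128 branch is tested first so that a remapped 64 is never mistaken for a genuine 128; the output is built in a new list, so no token is processed twice.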
By replacing the first preset character with a second preset character and setting an additional check bit for character codes with the same value as the second preset character, this coding method distinguishes the second preset character from such codes and resolves its definition conflict. Choosing a less frequently occurring value as the second preset character further reduces the extra burden on the processor.
In one embodiment, after receiving a decoding instruction, the processor 3000 or the processor 4000 may execute the decoding method shown in fig. 11, which includes the following steps:
Step 602: obtain encoded data.
Specifically, the processor obtains the data that has been run-length encoded by the encoding module.
Step 604: identify the encoded data according to the data bit width and the run bit width to obtain character codes and runs.
The run bit width is the data length that a run occupies on the bus. The data bit width can be set in advance by the processor as required, for example to 8 bits or 16 bits; this is not specifically limited herein. A character code is data represented in another form; for example, decimal data may be encoded into a binary character code. A run indicates the number of first preset numbers in the encoded data.
Specifically, the processor obtains the data length of the encoded data and compares it with the configured data bit width and run bit width to tell character codes and runs apart. For example, with a data bit width of 8 bits and a run bit width of 2 bits, encoded data 8 bits long is identified as a character code and encoded data 2 bits long is identified as a run.
Step 606: character-encode the run according to the data bit width to obtain first preset numbers.
The first preset number usually refers to a number that occurs frequently in the input data; for example, in a sparse neural network the sparse data model contains many consecutive zero values, so zero is used as the first preset number.
Specifically, the decoding module in the processor character-encodes each run in the encoded data according to the configured data bit width, yielding one or more consecutive first preset numbers.
In this decoding method, run-length encoded data is character-encoded and restored to the original data before taking part in the operation, which keeps the operation correct while the data is transmitted in compressed form.
In one embodiment, step 604 specifically includes the following steps:
Step 6042: obtain the data length of each character string in the encoded data.
The decoding device obtains the data length of each character string in the encoded data and identifies the string by that length.
Step 6044: if the data length of a character string equals the data bit width, identify the string as a character code; if it equals the run bit width, identify the string as a run.
Specifically, using the data bit width and run bit width configured in step 604, the decoding device checks whether the data length of the character string equals the data bit width and, if so, identifies the string as a character code. Otherwise, it checks whether the length equals the run bit width and, if so, identifies the string as a run.
In one embodiment, step 606 specifically includes:
character-encoding the run according to the data bit width to obtain one first preset number;
or character-encoding the run according to the data bit width to obtain a plurality of consecutively arranged first preset numbers,
where the number of consecutive first preset numbers equals the number of first preset numbers represented by the run.
For example:
[Figure: encoded data and the decoded first preset numbers]
With a data bit width of 8 bits and the first preset number 0, the runs identified in step 6044 are expanded as follows: run 11 becomes three consecutive first preset numbers 0, run 01 becomes a single first preset number 0, and run 10 becomes two consecutive first preset numbers 0.
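Steps 6042, 6044 and 606 can be sketched together in Python. Identifying each character string by its data length implies that string boundaries are known, so the sketch assumes the encoded data arrives as a list of separate bit strings; the function name `decode_runs` is hypothetical.

```python
from typing import List, Sequence


def decode_runs(tokens: Sequence[str], data_bits: int = 8, run_bits: int = 2,
                zero: int = 0) -> List[int]:
    out: List[int] = []
    for t in tokens:                      # step 6042: inspect each string's length
        if len(t) == data_bits:           # step 6044: character code
            out.append(int(t, 2))
        elif len(t) == run_bits:          # step 6044: run
            out.extend([zero] * int(t, 2))  # step 606: expand into first preset numbers
        else:
            raise ValueError(f"unexpected token length {len(t)}: {t!r}")
    return out
```

Decoding `00000001 11 00000010 01 00000011 10` gives back `[1, 0, 0, 0, 2, 0, 3, 0, 0]`: the runs 11, 01 and 10 expand into three, one and two zeros.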
In one embodiment, after receiving a decoding instruction, the processor 3000 or the processor 4000 may execute the decoding method shown in fig. 12, which includes the following steps:
Step 702: obtain encoded data.
Specifically, the processor obtains the data that has been run-length encoded by the encoding module.
Step 704: identify the encoded data to obtain character codes and preset character codes, where the preset character codes include a first preset character code and a second preset character code.
A character code, including a preset character code, is data represented in another form; for example, decimal data may be encoded into a binary character code. The first and second preset characters are values that occur rarely; for example, the value 64 may be used as the first preset character and the value 128 as the second preset character.
As an optional implementation, if the encoded data contains several character codes with the same value, the preset character code is identified by checking whether each character code carries an additional character check bit.
Step 706: expand the preset character code according to the data bit width into a character code of the first preset number followed by a run threshold.
The processor can set the data bit width in advance as required, for example to 8 bits or 16 bits; this is not specifically limited herein. The first preset number usually refers to a number that occurs frequently in the input data; for example, in a sparse neural network the sparse data model contains many consecutive zero values, so zero is used as the first preset number. The run threshold is the maximum number of consecutive first preset numbers that a run can represent.
Specifically, in the first-stage pipeline, the decoding device in the processor replaces each first and/or second preset character in the encoded data with a character code of the first preset number followed by the run threshold.
Step 708: expand the run threshold according to the data bit width into a plurality of consecutive first preset numbers, whose count equals the number of first preset numbers represented by the run threshold.
The processor can set the data bit width in advance as required, for example to 8 bits or 16 bits; this is not specifically limited herein. A run indicates the number of first preset numbers in the encoded data.
Specifically, in the second-stage pipeline, the decoding device replaces the run threshold following the first preset number with a plurality of consecutive first preset numbers.
For example:
[Figure: two-stage decoding of encoded data containing the first preset character code]
With a data bit width of 8 bits and the first preset number 0, the first-stage pipeline character-encodes the first preset character 01000000 in the encoded data into a first preset number 0 followed by the run threshold 11. The second-stage pipeline then expands the run threshold 11 into three consecutive first preset numbers 0, the run 01 in the encoded data into one first preset number 0, and the run 10 into two consecutive first preset numbers 0.
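The two pipeline stages can be modelled in software as chained Python generators, so that stage 2 consumes stage 1's output token by token, loosely mirroring how the hardware stages overlap. This is a sketch under stated assumptions: the encoded data is a list of separate bit strings, the zero code is `01000000` with 8-bit character codes and 2-bit runs, and 9-bit check-bit codes are not handled here. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List

ZERO_CODE = "01000000"  # first preset character code from the example (value 64)


def stage1_expand_preset(tokens: Iterable[str], data_bits: int = 8,
                         run_bits: int = 2) -> Iterator[str]:
    """First stage: zero code -> zero character code followed by the run threshold."""
    for t in tokens:
        if t == ZERO_CODE:
            yield "0" * data_bits  # character code of the first preset number 0
            yield "1" * run_bits   # run threshold (11 for 2-bit runs)
        else:
            yield t


def stage2_expand_runs(tokens: Iterable[str], data_bits: int = 8,
                       run_bits: int = 2) -> Iterator[int]:
    """Second stage: character codes -> values, runs -> that many zeros."""
    for t in tokens:
        if len(t) == data_bits:
            yield int(t, 2)
        elif len(t) == run_bits:
            yield from [0] * int(t, 2)
        else:
            raise ValueError(f"unexpected token {t!r}")


def decode_two_stage(tokens: Iterable[str]) -> List[int]:
    # Chaining the generators lets stage 2 pull each token as soon as stage 1 emits it.
    return list(stage2_expand_runs(stage1_expand_preset(tokens)))
```

Decoding `00000001 01000000 00000010 01 00000011 10` yields `[1, 0, 0, 0, 0, 2, 0, 3, 0, 0]`. The zero code expands to four zeros: the leading character-encoded 0 plus the three represented by the run threshold.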
In this decoding method, the two-stage pipeline improves decoding efficiency.
In one embodiment, step 704 specifically includes the following steps:
step 7042, obtain the value of the character string in the encoded data.
The decoding device acquires the data value of each character string in the coded data, and identifies the character string according to the data value of each character string.
Step 7044, if the value of the character string is different from the value of a preset character used in encoding, identifying the character string as a character code; and if the value of the character string is the same as the value of a preset character used in encoding, identifying the character string as a preset character code.
Specifically, based on the value of the preset character set in the encoding method shown in fig. 10, the decoding device determines whether the value of the character string equals the value of the preset character used in encoding. If it does not, the character string is identified as a character code; if it does, the character string is identified as a preset character code.
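Steps 7042 and 7044 amount to a value comparison, sketched below in Python. The sketch assumes every character string has already been cut to the data bit width and, purely for illustration, reuses 01000000 from the earlier example as the preset character value; the function name `identify_by_value` is hypothetical.

```python
# Sketch of steps 7042-7044: classify character strings by value.
# Assumption: the preset character value is 0b01000000 (borrowed from the
# earlier example for illustration only).
from typing import List, Tuple

PRESET_CHAR_VALUE = 0b01000000


def identify_by_value(strings: List[str],
                      preset_value: int = PRESET_CHAR_VALUE) -> List[Tuple[str, str]]:
    """Return (kind, string) pairs, where kind is 'preset' when the string's
    value equals the preset character value and 'char' otherwise."""
    result: List[Tuple[str, str]] = []
    for s in strings:
        value = int(s, 2)                                      # step 7042: obtain the value
        kind = "preset" if value == preset_value else "char"   # step 7044: compare
        result.append((kind, s))
    return result
```

For instance, `identify_by_value(["00000101", "01000000"])` labels the first string a character code and the second a preset character code.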
In one embodiment, step 704 further includes: if the encoded data contains a plurality of character codes with the same value, identifying the preset character code by determining whether each character code contains an appended character check bit.
As an optional implementation, this specifically includes the following steps:
Step 7042a: obtain the data length of the character code.
Specifically, the decoding device in the processor acquires the data length of one of the character codes that share the same value.
Step 7044b: compare the data length of the character code with the data bit width.
Specifically, the decoding device in the processor compares the data length acquired in step 7042a with the set data bit width and determines whether the two are equal.
Step 7046c: if the data length of the character code equals the data bit width, determine that no appended character check bit is set in the character code.
For example, if the set data bit width is 8 bits and the acquired character code is 8 bits long, its length equals the data bit width, so no appended character check bit is set in it. If the acquired character code is 9 bits long, its length exceeds the data bit width, so it carries a 1-bit appended character check bit.
Step 7048d: identify the character code without an appended character check bit as the preset character code.
Specifically, if step 7046c determines that no appended character check bit is set in the acquired character code, that character code is identified as the preset character code.
In one embodiment, the decoding method is applied to the processor shown in figs. 6-9 and further includes: forwarding the decoded data to the operation unit and operating on it there, where the operations include multiplication, accumulation, and activation.
As an optional implementation, this specifically includes the following steps:
Step 802: if the decoded data includes a character code with an appended character check bit, delete the appended character check bit.
A character code may carry one appended character check bit or several; this is not limited herein. To save bandwidth, it is preferable to use a single appended character check bit placed at the last bit of the character code. The check bit itself may be set to either 0 or 1.
Note that the appended character check bit serves only to distinguish real data from a preset character of the same value and carries no valid data, so it must be deleted before the decoded data takes part in any operation.
Step 804: forward the character code with its appended character check bit deleted to the operation unit and operate on it there.
Specifically, the master processing circuit in the processor may send encoded data to a slave processing circuit. The slave processing circuit decodes the received encoded data and multiplies the decoded data to obtain a plurality of intermediate results. It then encodes each of these intermediate results and sends them to the master processing circuit, which decodes the received encoded data and performs accumulation and activation on it to obtain further intermediate results. Finally, the master processing circuit encodes each of the results of the accumulation and activation and sends them to the storage unit for storage.
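Steps 802 and 804 can be illustrated with the following Python sketch. It assumes check bits placed after the data bits (the preferred single trailing bit described above), 8-bit unsigned data, hypothetical weights, and ReLU as the activation function; the patent does not name the activation, and the function names are hypothetical.

```python
# Sketch of steps 802-804: delete appended check bits, then multiply,
# accumulate and activate. ReLU is an assumed activation function.
from typing import List

DATA_BIT_WIDTH = 8


def strip_check_bits(code: str, data_bit_width: int = DATA_BIT_WIDTH) -> int:
    """Step 802: keep only the leading data bits; any trailing bits are
    appended check bits and carry no valid data."""
    return int(code[:data_bit_width], 2)


def operate(codes: List[str], weights: List[int]) -> int:
    """Step 804: operate on the cleaned values in the operation unit."""
    if len(codes) != len(weights):
        raise ValueError("codes and weights must have the same length")
    values = [strip_check_bits(c) for c in codes]
    products = [v * w for v, w in zip(values, weights)]   # multiplication
    total = sum(products)                                  # accumulation
    return max(total, 0)                                   # activation (ReLU, assumed)
```

Deleting the check bit first matters: read as a 9-bit number, 010000000 would be 128 rather than the intended 64.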
It should be understood that although the steps in the flowcharts of figs. 10-12 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict restriction on their order, and they may be performed in other orders. Moreover, at least some of the steps in figs. 10-12 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be performed at different times; these sub-steps or stages need not be performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 13, an encoding apparatus is provided, including an input data acquisition module 901 and a run acquisition module 902, where:
the input data acquisition module 901 is configured to acquire input data;
the run acquisition module 902 is configured to encode the first preset numbers in the input data according to the run bit width to obtain a run and write the run into the target code, where the run represents the number of first preset numbers.
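A minimal Python sketch of what the run acquisition module 902 does follows. The bit widths, the rule that a run value r (1 <= r <= 2^w - 1 for run bit width w) stands for r consecutive first preset numbers, and splitting longer runs into several run tokens are all assumptions for illustration; encoded data is modelled as a list of bit strings, and `encode` is a hypothetical name.

```python
# Sketch of run-length encoding of first preset numbers (module 902).
# Assumptions: 8-bit data, 2-bit runs, first preset number 0, and runs
# longer than the largest run value are split into several run tokens.
from typing import List

DATA_BIT_WIDTH = 8
RUN_BIT_WIDTH = 2
FIRST_PRESET_NUMBER = 0


def encode(values: List[int],
           data_bit_width: int = DATA_BIT_WIDTH,
           run_bit_width: int = RUN_BIT_WIDTH) -> List[str]:
    max_run = (1 << run_bit_width) - 1      # largest count one run can hold
    tokens: List[str] = []
    pending = 0                             # consecutive first preset numbers seen

    def flush() -> None:
        nonlocal pending
        while pending > 0:
            r = min(pending, max_run)
            tokens.append(format(r, f"0{run_bit_width}b"))   # write a run
            pending -= r

    for v in values:
        if v == FIRST_PRESET_NUMBER:
            pending += 1
        else:
            if not 0 <= v < (1 << data_bit_width):
                raise ValueError(f"value {v} does not fit in {data_bit_width} bits")
            flush()
            tokens.append(format(v, f"0{data_bit_width}b"))  # character code
    flush()                                  # trailing run, if any
    return tokens
```

Under these assumptions, five consecutive zeros between 5 and 7 become the runs 11 and 10, so the input 5, 0, 0, 0, 0, 0, 7 encodes to four tokens instead of seven 8-bit values.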
In one embodiment, as shown in fig. 14, a decoding apparatus is provided, including an encoded data acquisition module 1001, an encoded data identification module 1002, and a first preset number acquisition module 1003, where:
the encoded data acquisition module 1001 is configured to acquire encoded data;
the encoded data identification module 1002 is configured to identify the encoded data according to the data bit width and the run bit width to obtain character codes and runs;
the first preset number acquisition module 1003 is configured to expand each run according to the data bit width to obtain first preset numbers.
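The fig. 14 apparatus, like claim 1, tells character codes and runs apart purely by length. The Python sketch below assumes tokens are bit strings, an 8-bit data bit width, a 2-bit run bit width, and that a run of value r expands to r zeros; `decode` is a hypothetical name.

```python
# Sketch of the fig. 14 decoding flow: identify tokens by length, then
# expand runs. Assumed widths: 8-bit data, 2-bit runs; first preset number 0.
from typing import List

DATA_BIT_WIDTH = 8
RUN_BIT_WIDTH = 2
FIRST_PRESET_NUMBER = 0


def decode(tokens: List[str],
           data_bit_width: int = DATA_BIT_WIDTH,
           run_bit_width: int = RUN_BIT_WIDTH) -> List[int]:
    out: List[int] = []
    for tok in tokens:
        if len(tok) == data_bit_width:          # character code (module 1002)
            out.append(int(tok, 2))
        elif len(tok) == run_bit_width:         # run (module 1002)
            # module 1003: expand the run into that many first preset numbers
            out.extend([FIRST_PRESET_NUMBER] * int(tok, 2))
        else:
            raise ValueError(f"unexpected token length {len(tok)}: {tok!r}")
    return out
```

Because the two widths differ, token length alone is enough to classify each token; the scheme would break if the data bit width and run bit width were ever set equal.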
In one embodiment, as shown in fig. 15, a decoding apparatus is provided, including an encoded data acquisition module 1101, an encoded data identification module 1102, a preset character code expansion module 1103, and a run threshold expansion module 1104, where:
the encoded data acquisition module 1101 is configured to acquire encoded data;
the encoded data identification module 1102 is configured to identify the encoded data to obtain character codes and preset character codes, where the preset character codes include a first preset character code and a second preset character code;
the preset character code expansion module 1103 is configured to expand the preset character code according to the data bit width to obtain a character code of the first preset number followed by a run threshold;
the run threshold expansion module 1104 is configured to expand the run threshold according to the data bit width to obtain a plurality of consecutively arranged character codes of the first preset number, where the count of first preset numbers equals the count represented by the run threshold.
For specific limitations of the apparatus, refer to the limitations of the corresponding method above; they are not repeated here. Each module in the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in software form in a memory of the computer device so that the processor can invoke them and perform the corresponding operations.
In one embodiment, a computer-readable storage medium is provided on which a computer program is stored; when executed by a processor, the computer program implements the following steps:
acquiring input data;
encoding the first preset numbers in the input data according to the run bit width to obtain a run, and writing the run into the target code, where the run represents the number of consecutively arranged first preset numbers.
In one embodiment, another computer-readable storage medium is provided, having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of:
acquiring coded data;
identifying the encoded data according to the data bit width and the run bit width to obtain character codes and runs;
and expanding each run according to the data bit width to obtain first preset numbers.
In one embodiment, another computer-readable storage medium is provided, having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of:
acquiring coded data;
identifying the coded data to obtain a character code and a preset character code, wherein the preset character code comprises a first preset character code and a second preset character code;
expanding the preset character code according to the data bit width to obtain a character code of the first preset number followed by a run threshold;
and expanding the run threshold according to the data bit width to obtain a plurality of consecutively arranged first preset numbers, where the count of first preset numbers equals the count represented by the run threshold.
It should be noted that the steps implemented when the computer program in the embodiments of this application is executed by a processor are consistent with the corresponding method steps in the foregoing embodiments; refer to the foregoing description, which is not repeated here.
Those skilled in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these features are described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Although their description is specific and detailed, it should not be construed as limiting the scope of the invention. A person skilled in the art may make several variations and improvements without departing from the concept of this application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (16)

1. A method of decoding, the method comprising:
acquiring coded data;
acquiring the data length of a character string in the coded data;
if the data length of the character string is equal to the data bit width, identifying the character string as a character code; if the data length of the character string is equal to the run bit width, identifying the character string as a run;
and expanding the run according to the data bit width to obtain a first preset number.
2. The method of claim 1, wherein expanding the run according to the data bit width to obtain a first preset number comprises:
expanding the run according to the data bit width to obtain one first preset number; or
expanding the run according to the data bit width to obtain a plurality of consecutively arranged first preset numbers;
wherein, among the plurality of consecutively arranged first preset numbers, the count of first preset numbers equals the count represented by the run.
3. The method of claim 1, wherein the first preset number is zero.
4. A method of decoding, the method comprising:
acquiring coded data;
identifying the coded data to obtain a character code and a preset character code, wherein the preset character code comprises a first preset character code and a second preset character code;
expanding the preset character code according to the data bit width to obtain a character code of the first preset number followed by a run threshold;
expanding the run threshold according to the data bit width to obtain a plurality of consecutively arranged first preset numbers, wherein the count of first preset numbers equals the count represented by the run threshold;
wherein identifying the encoded data to obtain the character code and the preset character code comprises:
acquiring the value of a character string in the coded data;
if the value of the character string is different from the value of a preset character used in encoding, identifying the character string as a character code;
and if the value of the character string is the same as the value of a preset character used in encoding, identifying the character string as a preset character code.
5. The method according to claim 4, wherein identifying the encoded data to obtain a character code and a preset character code comprises:
if the encoded data comprises a plurality of character codes having the same value, identifying the preset character code by determining whether each character code includes an appended character check bit.
6. The method of claim 5, wherein, if the encoded data includes a plurality of character codes having the same value, identifying the preset character code by determining whether each character code includes an appended character check bit comprises:
acquiring the data length of the character code;
comparing the data length of the character code with the data bit width;
if the data length of the character code equals the data bit width, determining that no appended character check bit is set in the character code;
and identifying the character code without the additional character check bit as the preset character code.
7. The method according to any one of claims 1-6, further comprising:
forwarding the decoded data to an operation unit and operating on it there, wherein the operations include multiplication, accumulation, and activation.
8. The method of claim 5 or 6, further comprising:
if the decoded data comprises a character code provided with the appended character check bit, deleting the appended character check bit;
and forwarding the character code with the appended character check bit deleted to an operation unit and operating on it there.
9. A processor, comprising an arithmetic unit, a storage unit, and a controller unit, wherein the storage unit is arranged adjacent to the arithmetic unit;
the arithmetic unit comprises a decoding module configured to acquire encoded data; acquire the data length of a character string in the encoded data; identify the character string as a character code if its data length equals the data bit width, or as a run if its data length equals the run bit width; and expand the run according to the data bit width to obtain a first preset number;
the storage unit is used for storing original data and coded data and carrying out data transmission with the controller unit and the arithmetic unit;
the controller unit is configured to acquire input data and a calculation instruction, parse the calculation instruction into a plurality of operation instructions, and send these operation instructions and the input data to the arithmetic unit.
10. The processor of claim 9,
the decoding module is further configured to obtain a value of a character string in the encoded data, identify the character string as a character code if the value of the character string is different from a value of a preset character used in encoding, identify the character string as the preset character code if the value of the character string is the same as the value of the preset character used in encoding, and expand the preset character code according to the data bit width to obtain a character code of the first preset number and a run threshold; and expanding the run-length threshold according to the data bit width to obtain a plurality of continuously arranged first preset numbers.
11. The processor of claim 10,
the decoding module is further configured to identify the preset character code by judging whether the character code includes an additional character check bit if the encoded data includes a plurality of character codes having the same numerical value.
12. The processor of claim 10, wherein the decoding module comprises a control signal interface, a buffer, a plurality of registers, and an output module;
the control signal interface is configured to connect the decoding module to the controller unit and to transmit data between them;
the buffer is connected to the register arranged adjacent to it and is configured to store the encoded data;
the plurality of registers are configured to store the execution results of the multi-stage pipeline;
the output module is used for storing and outputting the decoded data.
13. The processor according to any one of claims 9-12, wherein the arithmetic unit comprises a master processing circuit and at least one slave processing circuit, each slave processing circuit being connected to the master processing circuit;
the decoding module is arranged in the master processing circuit and in each slave processing circuit.
14. An apparatus for decoding, the apparatus comprising:
the coded data acquisition module is used for acquiring coded data;
the coded data identification module is used for acquiring the data length of a character string in the coded data, identifying the character string as a character code if the data length of the character string is equal to the data bit width, and identifying the character string as a run if the data length of the character string is equal to the run bit width;
and the first preset number acquisition module is configured to expand the run according to the data bit width to obtain a first preset number.
15. An apparatus for decoding, the apparatus comprising:
the coded data acquisition module is used for acquiring coded data;
the coded data identification module is used for identifying the coded data to obtain a character code and a preset character code, wherein the preset character code comprises a first preset character code and a second preset character code;
the preset character code expansion module is configured to expand the preset character code according to the data bit width to obtain a character code of the first preset number followed by a run threshold;
the run threshold expansion module is configured to expand the run threshold according to the data bit width to obtain a plurality of consecutively arranged first preset numbers, wherein the count of first preset numbers equals the count represented by the run threshold;
the encoded data identification module is specifically configured to acquire the value of a character string in the encoded data, identify the character string as a character code if its value differs from the value of the preset character used in encoding, and identify the character string as the preset character code if its value is the same as the value of the preset character used in encoding.
16. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN201811623531.0A 2018-12-07 2018-12-28 Decoding method, processor, decoding device and storage medium Active CN111384960B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811623531.0A CN111384960B (en) 2018-12-28 2018-12-28 Decoding method, processor, decoding device and storage medium
PCT/CN2019/121056 WO2020114283A1 (en) 2018-12-07 2019-11-26 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811623531.0A CN111384960B (en) 2018-12-28 2018-12-28 Decoding method, processor, decoding device and storage medium

Publications (2)

Publication Number Publication Date
CN111384960A CN111384960A (en) 2020-07-07
CN111384960B true CN111384960B (en) 2022-05-10

Family

ID=71222468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811623531.0A Active CN111384960B (en) 2018-12-07 2018-12-28 Decoding method, processor, decoding device and storage medium

Country Status (1)

Country Link
CN (1) CN111384960B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697654B (en) * 2020-12-30 2023-06-30 Institute of Computing Technology, Chinese Academy of Sciences Neural network quantization compression method and system
CN113163198B (en) * 2021-03-19 2022-12-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Image compression method, decompression method, device, equipment and storage medium

Citations (8)

Publication number Priority date Publication date Assignee Title
DE4127984A1 (en) * 1991-08-23 1993-02-25 Broadcast Television Syst Synchronization method for a run-length-limited (1,7) code and circuit arrangement therefor
JPH07236064A (en) * 1994-02-24 1995-09-05 Canon Inc Run length coding/decoding device
KR19980076042A (en) * 1997-04-04 1998-11-16 Yun Jong-yong Run-level symbol decoding method and apparatus
JP2003303468A (en) * 2002-04-08 2003-10-24 Sony Disc Technology Inc Data recording medium and data recording method and device
KR20070049549A (en) * 2005-11-08 2007-05-11 LG Electronics Inc. Apparatus and method for encoding and decoding multi-channel audio
CN101604974A (en) * 2009-04-21 2009-12-16 Chen Xiangqian Test data compression encoding and decoding method with same run length, and dedicated decoding unit
JP2012249061A (en) * 2011-05-27 2012-12-13 Konica Minolta Advanced Layers Inc Run length coding device, run length decoding device, run length coding method, and run length decoding method
CN103746706A (en) * 2014-01-01 2014-04-23 Anqing Teachers College Test data compression and decompression method based on double-run-length alternate coding

Non-Patent Citations (2)

Title
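Optimized design of AVS run-length decoding, inverse scanning, inverse quantization and inverse transform; Zhao Ce; Information Technology (《信息技术》); 20070228; Vol. 2007, No. 2; 54-57 *
Research and implementation of the run-length-limited code RLL(1,10); Liu Dan; Microprocessors (《微处理机》); 20081231; Vol. 2008, No. 6; 183-185 *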

Also Published As

Publication number Publication date
CN111384960A (en) 2020-07-07

Similar Documents

Publication Publication Date Title
CN111384959B (en) Encoding method, processor, encoding module, and storage medium
CN101989443B (en) Multi-mode encoding for data compression
CN100553152C (en) Coding method and equipment and coding/decoding method and equipment based on CABAC
CN111384960B (en) Decoding method, processor, decoding device and storage medium
CN111384969B (en) Encoding method, processor, encoding device, and storage medium
CN116681036B (en) Industrial data storage method based on digital twinning
CN110784225A (en) Data compression method, data decompression method, related device, electronic equipment and system
CN108615076B (en) Deep learning chip-based data storage optimization method and device
CN116016606B (en) Sewage treatment operation and maintenance data efficient management system based on intelligent cloud
CN108886367A (en) Method, apparatus and system for compression and decompression data
CN114048711A (en) Text compression method, text decompression method, text compression device, text decompression device, computer equipment and storage medium
US20060018556A1 (en) Method, apparatus and system for data block rearrangement for LZ data compression
CN113300715A (en) Data processing method, device, hardware compression equipment and medium
CN111382849B (en) Data compression method, processor, data compression device and storage medium
US6748520B1 (en) System and method for compressing and decompressing a binary code image
US20170351461A1 (en) Non-transitory computer-readable storage medium, and data compressing device
US20230222354A1 (en) A method for a distributed learning
CN108880559B (en) Data compression method, data decompression method, compression equipment and decompression equipment
CN106788447B (en) Matching length output method and device for LZ77 compression algorithm
CN109698703B (en) Gene sequencing data decompression method, system and computer readable medium
US11443456B2 (en) Data compression method and device
US10931303B1 (en) Data processing system
CN108829930A (en) The light weight method of three-dimensional digital technological design MBD model
CN114337682A (en) Huffman coding and compressing device
CN114978194A (en) Structure optimization method and device of original pattern LDPC code suitable for lossy source coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant