CN112927125A - Data processing method and device, computer equipment and storage medium - Google Patents

Publication number
CN112927125A
Authority
CN
China
Prior art keywords
multiplier
adder
data processing
group
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110132573.XA
Other languages
Chinese (zh)
Other versions
CN112927125B (en)
Inventor
周军
周亮
常亮
王文强
吴飞
徐宁仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Original Assignee
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Application filed by University of Electronic Science and Technology of China and Chengdu Sensetime Technology Co Ltd
Priority to CN202110132573.XA (CN112927125B)
Publication of CN112927125A
Priority to PCT/CN2021/115799 (WO2022160706A1)
Application granted
Publication of CN112927125B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a data processing method and apparatus, a computer device, and a storage medium. The method comprises: grouping a plurality of multiplier-adders in a multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group; and executing, in parallel, the data processing task corresponding to each multiplier-adder group by using each of the at least one multiplier-adder group. The multiplier-adder array can thus process a plurality of data processing tasks simultaneously, which improves its processing efficiency. In addition, because the multiplier-adder array is grouped based on the matrix operand operation step size, a multiplier-adder whose processing result would be invalid for one data processing task produces a valid result for another data processing task, which improves the utilization of the multiplier-adder array and reduces the waste of computing resources.

Description

Data processing method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
At present, a convolutional neural network mainly relies on a multiplier-adder array to carry out convolution processing. The image data to be processed in a data processing task is stored in a register array corresponding to the multiplier-adder array, and the image data moves within the register array in different data processing periods. However, this data processing mode suffers from low utilization of the multiplier-adder array and wasted computing resources.
Disclosure of Invention
Embodiments of the present disclosure provide at least a data processing method, a data processing apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including:
grouping a plurality of multiplier-adders in a multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group; and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
In this way, grouping the multiplier-adder array allows it to process a plurality of data processing tasks simultaneously, which improves its processing efficiency. In addition, because the multiplier-adder array is grouped based on the matrix operand operation step size, a multiplier-adder whose processing result would be invalid for one data processing task produces a valid result for another data processing task, which improves the utilization of the multiplier-adder array and reduces the waste of computing resources.
In one possible embodiment, two adjacent multiplier-adders of the same group in the same row of the multiplier-adder array are separated by the same non-zero number of multiplier-adders of other groups, and two adjacent multiplier-adders of the same group in the same column of the multiplier-adder array are separated by the same non-zero number of multiplier-adders of other groups.
In this way, the grouping of the multiplier-adder array ensures that each multiplier-adder group processes a different data processing task, so that the multiplier-adder array can process a plurality of data processing tasks simultaneously, which improves its processing efficiency.
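The interleaved layout described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a step size s yields s × s groups and that the group of the multiplier-adder at row r, column c is (r mod s) · s + (c mod s); neither the formula nor the name group_map comes from the disclosure.

```python
def group_map(rows, cols, stride):
    """Return a rows x cols grid of group ids for an interleaved grouping.

    Assumption (not stated in the source): stride s gives s * s groups,
    and the group id is (r % s) * s + (c % s).
    """
    return [[(r % stride) * stride + (c % stride) for c in range(cols)]
            for r in range(rows)]


grid = group_map(4, 4, 2)
for row in grid:
    print(row)
# Same-group neighbours in a row or column are separated by
# stride - 1 multiplier-adders that belong to other groups.
```

With a step size of 2 on a 4 × 4 array, this sketch produces four groups, each holding four multiplier-adders.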
In one possible implementation, the grouping a plurality of multiplier-adders in a multiplier-adder array based on a matrix operand operation step size includes: determining the number of multiplier-adder groups based on the matrix operand operation step size; and grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
In this way, the processing result of each multiplier-adder group for its own data processing task is guaranteed to be valid, so that the multiplier-adder array can process a plurality of data processing tasks simultaneously, which improves its processing efficiency.
In one possible implementation, the grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups includes: determining a first target multiplier-adder in each multiplier-adder group from the multiplier-adder array; and determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group than the first target multiplier-adder based on the position of the first target multiplier-adder in the multiplier-adder array, the matrix operand operation step size, and the size of the multiplier-adder array.
Once the position of the first target multiplier-adder of each multiplier-adder group in the multiplier-adder array is determined, the positions of the other target multiplier-adders of that group can be derived from it, which improves the efficiency of grouping the multiplier-adder array.
In one possible implementation, the determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group than the first target multiplier-adder based on the position of the first target multiplier-adder in the multiplier-adder array, the matrix operand operation step size, and the size of the multiplier-adder array includes: for each multiplier-adder group, determining, based on the position of the first target multiplier-adder of the group in the multiplier-adder array and the matrix operand operation step size, a first positional relationship between each multiplier-adder in a row of the group, other than the first one in that row, and its adjacent preceding multiplier-adder of the group in the multiplier-adder array; determining, based on the position of the first target multiplier-adder of the group in the multiplier-adder array, the matrix operand operation step size, and the number of columns of the multiplier-adder array, a second positional relationship between each multiplier-adder in a column of the group, other than the first one in that column, and its adjacent preceding multiplier-adder of the group in the multiplier-adder array; and determining, based on the first positional relationship and/or the second positional relationship, the target positions in the multiplier-adder array of the target multiplier-adders of the group other than the first target multiplier-adder.
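One way to read the two positional relationships is in terms of row-major indices: within a row, the next multiplier-adder of a group sits one step size further along; within a column, it sits step size × number of columns further along. The sketch below follows that reading. The row-major numbering, the function name group_positions, and the exact offsets are assumptions made for illustration, not details confirmed by the disclosure.

```python
def group_positions(first, stride, rows, cols):
    """List the row-major indices of all multiplier-adders in one group.

    first: row-major index of the group's first target multiplier-adder.
    Assumed relations: +stride within a row, +stride * cols within a column.
    """
    r0, c0 = divmod(first, cols)
    positions = []
    row_start = first
    for _ in range(r0, rows, stride):
        pos = row_start
        for _ in range(c0, cols, stride):
            positions.append(pos)
            pos += stride              # first positional relationship (same row)
        row_start += stride * cols     # second positional relationship (same column)
    return positions


# 4 x 4 array, step size 2; the first positions 0, 1, 4, 5 are assumed here.
groups = [group_positions(first, 2, 4, 4) for first in (0, 1, 4, 5)]
```

Under these assumptions the four groups are disjoint and together cover all 16 positions of the array.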
In one possible embodiment, the determining the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array includes: determining a target matrix based on the operation step size of the matrix operand and the size of the multiplier-adder array; and determining the position of the first target multiplier-adder in each multiplier-adder group in the multiplier-adder array according to the matrix element values of the target matrix.
In one possible implementation, the executing, by each of the at least one multiplier-adder group, the data processing task corresponding to each multiplier-adder group includes: storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that multiplier-adder group according to the position of each target multiplier-adder of the group in the multiplier-adder array; for each data processing cycle in a plurality of data processing cycles, reading, from the register array corresponding to each multiplier-adder group, the image data to be processed corresponding to that multiplier-adder group in the data processing cycle; processing the read image data to be processed, and obtaining in parallel the data processing result of each multiplier-adder group in the data processing cycle; and completing the data processing tasks respectively corresponding to the multiplier-adder groups according to the data processing results respectively corresponding to the multiplier-adder groups in each data processing cycle.
In this way, the multiplier-adder array reads the corresponding operands in different data processing periods, which ensures that each multiplier-adder group can process its corresponding data processing task and that the processing results of the multiplier-adder array for the data processing tasks are valid.
In one possible implementation, the storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to the multiplier-adder group according to the position of each target multiplier-adder in the multiplier-adder array includes: determining, according to the size of the matrix operand, the number of registers contained in the register array corresponding to each multiplier-adder; for each multiplier-adder group, determining the position, in the corresponding register array, of the register that each target multiplier-adder of the group fixedly reads; and for each multiplier-adder group, storing the image data to be processed corresponding to the group into the register array corresponding to the group according to the position of each target multiplier-adder of the group in the multiplier-adder array, the position of the register fixedly read by each target multiplier-adder of the group, and the order in which the operands contained in the image data to be processed are processed, so that, in each data processing period, the operand stored in the register fixedly read by each target multiplier-adder corresponds to the matrix element of the matrix operand for that data processing period.
In a possible implementation, for each data processing cycle in the plurality of data processing cycles, the reading of the image data to be processed corresponding to each multiplier-adder group from the register array corresponding to that group, and the processing of the read image data to obtain in parallel the data processing result of each multiplier-adder group in the data processing cycle, include: for the first data processing period of processing the image data to be processed, controlling each target multiplier-adder in each multiplier-adder group to read, from the register it fixedly reads, its operand for the first data processing period as a first operand; determining the matrix element of the matrix operand corresponding to each multiplier-adder group in the first data processing period as a second operand; and determining the product of the first operand and the second operand of each target multiplier-adder in the first data processing period; and for a non-first data processing period of processing the image data to be processed, controlling the image data to be processed to move a preset step length in the register array according to a preset data moving mode corresponding to the data processing period; controlling each target multiplier-adder in each multiplier-adder group to read, from the register it fixedly reads, its operand for the non-first data processing period as a first operand; determining the matrix element of the matrix operand corresponding to each multiplier-adder group in the data processing period as a second operand; and determining the product of the first operand and the second operand of each target multiplier-adder in the data processing period.
In this way, based on the preset step length and the preset data moving mode, the operands are shifted in an orderly manner within the register array as the data processing period changes, which ensures that the corresponding multiplier-adders in the multiplier-adder array obtain valid data and that the processing result of the data processing task is valid.
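The interplay between fixed read positions and data movement can be sketched on a single register row. The example assumes a preset step length of 1 and a leftward move, in line with the description of Fig. 7; padding the vacated register with 0 and the name fixed_reads are illustrative assumptions.

```python
def fixed_reads(row, read_positions, n_cycles):
    """Return, per cycle, the operand each multiplier-adder reads
    from the register it fixedly reads."""
    regs = list(row)
    history = []
    for cycle in range(n_cycles):
        if cycle > 0:
            # Non-first cycle: move the data one step to the left.
            # Filling the vacated register with 0 is an assumption.
            regs = regs[1:] + [0]
        history.append([regs[p] for p in read_positions])
    return history


reads = fixed_reads([10, 11, 12, 13, 14, 15], read_positions=[0, 2], n_cycles=3)
# reads == [[10, 12], [11, 13], [12, 14]]
```

Each multiplier-adder keeps reading the same register, while the shifted data presents it with the next operand of its convolution window.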
In a possible implementation manner, the completing the data processing tasks corresponding to the multiplier-adder groups according to the data processing results corresponding to the multiplier-adder groups in each data processing cycle includes: for each target multiplier-adder in each multiplier-adder group, adding products obtained by the target multiplier-adder in each data processing period to obtain a sum; and finishing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders respectively contained in each multiplier-adder group.
In one possible implementation, the data processing task includes: a convolution processing task; and the convolution processing tasks of different multiplier-adder groups correspond to different images to be processed.
Therefore, the multiplier-adder array can process a plurality of images to be processed simultaneously, and the processing efficiency of the multiplier-adder array on the images to be processed is improved.
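The following one-dimensional sketch puts the pieces together: two groups share one row of multiplier-adders, each group convolves a different image row, and each multiplier-adder sums its products over the data processing cycles. It is a simplification under stated assumptions: the problem is reduced to 1-D (so a step size of 2 gives 2 groups), group g's data is stored with an offset of g registers, and the names run_groups and strided_conv1d are hypothetical.

```python
def strided_conv1d(x, w, stride):
    """Reference: direct 1-D strided convolution (no padding, no kernel flip)."""
    n_out = (len(x) - len(w)) // stride + 1
    return [sum(w[k] * x[i * stride + k] for k in range(len(w)))
            for i in range(n_out)]


def run_groups(images, w, stride, n_macs):
    """Simulate one row of multiplier-adders shared by `stride` groups."""
    # One register row per group. Group g's data is stored with an offset
    # of g so that each fixed read position lines up (an assumption).
    regs = [[0] * g + list(img) for g, img in enumerate(images)]
    acc = [0] * n_macs
    for k in range(len(w)):                       # one cycle per kernel element
        for p in range(n_macs):
            acc[p] += w[k] * regs[p % stride][p]  # fixed read, multiply, add
        regs = [row[1:] + [0] for row in regs]    # shift data left by one step
    # Collect each group's sums in position order.
    return [[acc[p] for p in range(g, n_macs, stride)] for g in range(stride)]


image_a = [1, 2, 3, 4, 5, 6, 7, 8]
image_b = [8, 7, 6, 5, 4, 3, 2, 1]
kernel = [1, 0, -1]
out_a, out_b = run_groups([image_a, image_b], kernel, stride=2, n_macs=6)
```

In this sketch all six multiplier-adders produce a needed result: positions 0, 2, 4 yield the stride-2 outputs for image_a and positions 1, 3, 5 yield those for image_b.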
In a second aspect, an embodiment of the present disclosure further provides a data processing apparatus, including: a controller; the controller is configured to:
grouping a plurality of multiplier-adders in a multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
In one possible embodiment, two adjacent multiplier-adders of the same group in the same row of the multiplier-adder array are separated by the same non-zero number of multiplier-adders of other groups, and two adjacent multiplier-adders of the same group in the same column of the multiplier-adder array are separated by the same non-zero number of multiplier-adders of other groups.
In one possible embodiment, when grouping a plurality of multiplier-adders in a multiplier-adder array based on a matrix operand operation step size, the controller is specifically configured to: determine the number of multiplier-adder groups based on the matrix operand operation step size; and group the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
In one possible embodiment, the controller is specifically configured to determine a first target multiplier-adder in each multiplier-adder group from the multiplier-adder array when grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups; determining, from the multiplier-adder array, other target multiplier-adders in the each multiplier-adder group than the first target multiplier-adder based on a position of the first multiplier-adder in the multiplier-adder array, the matrix operand operation step size, and a size of the multiplier-adder array.
In one possible embodiment, when determining the other target multiplier-adders in each multiplier-adder group except the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array based on the position of the first multiplier-adder in the multiplier-adder array, the matrix operand operation step size and the size of the multiplier-adder array, the controller is specifically configured to determine, for each multiplier-adder group, a first positional relationship between each multiplier-adder in each row except the first multiplier-adder in the multiplier-adder group and an adjacent previous multiplier-adder of the multiplier-adder in the multiplier-adder array based on the position of the first multiplier-adder in the multiplier-adder group and the matrix operand operation step size; and determining a second position relation between each multiplier-adder except the first multiplier-adder in the column in the multiplier-adder group and an adjacent previous multiplier-adder in the multiplier-adder array based on the position of the first multiplier-adder in the multiplier-adder group in the multiplier-adder array, the operation step size of the matrix operand and the number of columns in the multiplier-adder array; and determining the target positions of other target multipliers and adders except the first target multiplier and adder in the multiplier and adder group in the multiplier and adder array based on the first position relation and/or the second position relation.
In a possible implementation, when determining a first target multiplier-adder in each multiplier-adder group from the multiplier-adder array, the controller is specifically configured to determine a target matrix based on the matrix operand operation step size, the size of the multiplier-adder array; and determining the position of the first target multiplier-adder in each multiplier-adder group in the multiplier-adder array according to the matrix element values of the target matrix.
In a possible implementation, when executing the data processing task corresponding to each multiplier-adder group by using each of the at least one multiplier-adder group, the controller is specifically configured to: store the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that multiplier-adder group according to the position of each target multiplier-adder of the group in the multiplier-adder array; for each data processing cycle in a plurality of data processing cycles, read, from the register array corresponding to each multiplier-adder group, the image data to be processed corresponding to that multiplier-adder group in the data processing cycle; process the read image data to be processed, and obtain in parallel the data processing result of each multiplier-adder group in the data processing cycle; and complete the data processing tasks respectively corresponding to the multiplier-adder groups according to the data processing results respectively corresponding to the multiplier-adder groups in each data processing cycle.
In a possible implementation, when storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to the multiplier-adder group according to the position of each target multiplier-adder of the group in the multiplier-adder array, the controller is specifically configured to: determine, according to the size of the matrix operand, the number of registers contained in the register array corresponding to each multiplier-adder; for each multiplier-adder group, determine the position, in the corresponding register array, of the register that each target multiplier-adder of the group fixedly reads; and for each multiplier-adder group, store the image data to be processed corresponding to the group into the register array corresponding to the group according to the position of each target multiplier-adder of the group in the multiplier-adder array, the position of the register fixedly read by each target multiplier-adder of the group, and the order in which the operands contained in the image data to be processed are processed, so that, in each data processing period, the operand stored in the register fixedly read by each target multiplier-adder corresponds to the matrix element of the matrix operand for that data processing period.
In one possible implementation mode, for each data processing cycle in a plurality of data processing cycles, respectively reading the image data to be processed corresponding to each multiplier-adder group in the data processing cycle from the register array corresponding to each multiplier-adder group; when the read image data to be processed is processed and the data processing results of the multiplier-adder groups in the data processing period are obtained in parallel, the controller is specifically used for controlling each target multiplier-adder in each multiplier-adder group aiming at the first data processing period for processing the image data to be processed and respectively reading the operand of each target multiplier-adder in the first data processing period from the register fixedly read by each target multiplier-adder as a first operand; determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the first data processing period as second operands; respectively determining the product of a first operand and a second operand of each target multiplier-adder in the first data processing period; for a non-first data processing period for processing the image data to be processed, controlling the image data to be processed to move a preset step length in the register array according to a preset data moving mode corresponding to the data processing period; controlling each target multiplier-adder in each multiplier-adder group, and respectively reading the operand of each target multiplier-adder in the non-first data processing period from the register fixedly read by each target multiplier-adder as a first operand; determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the data processing period as a second operand; the product of the first operand and the second operand of each target multiplier-adder in the data processing period is respectively determined.
In a possible embodiment, when the data processing tasks corresponding to the multiplier-adder groups are completed according to the data processing results corresponding to the multiplier-adder groups in each data processing cycle, the controller is specifically configured to add, for each target multiplier-adder in each multiplier-adder group, the products obtained by the target multiplier-adder in each data processing cycle to obtain a sum; and finishing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders respectively contained in each multiplier-adder group.
In one possible implementation, the data processing task includes: a convolution processing task; and the convolution processing tasks of different multiplier-adder groups correspond to different images to be processed.
In a third aspect, an embodiment of the present disclosure further provides a computer device including a controller and a memory, where the memory stores machine-readable instructions executable by the controller, the controller is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the controller, the controller performs the steps in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed, performs the steps in the first aspect or any one of the possible implementations of the first aspect.
For the description of the effects of the data processing apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the data processing method, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a data processing method provided by an embodiment of the present disclosure;
Fig. 2 shows an example diagram of a multiplier-adder array provided by an embodiment of the present disclosure;
Fig. 3 shows an example diagram of a matrix operand moving according to the operation step size, provided by an embodiment of the present disclosure;
Fig. 4 shows an example diagram of a multiplier-adder array divided into four multiplier-adder groups, provided by the present disclosure;
Fig. 5 shows an example diagram of a matrix used to determine the position of the first target multiplier-adder of each multiplier-adder group in the multiplier-adder array, provided by an embodiment of the present disclosure;
Fig. 6 shows an example diagram of a multiplier-adder array and a corresponding register array provided by an embodiment of the present disclosure;
Fig. 7 shows an example diagram of a register array after the image data to be processed in the register array has been shifted one step to the left as a whole, provided by an embodiment of the present disclosure;
Fig. 8 shows a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
Fig. 9 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It has been found that convolutional neural networks rely primarily on multiplier-adder arrays for convolution processing. When convolution processing is carried out, the image data to be processed is stored in a register array connected with the multiplier-adder array; the image data to be processed stored in the register array can move in the register array in different data processing periods; the multiplier-adder array reads the operands of the data processing cycle from the registers (belonging to the register array) connected to the multiplier-adder array and performs multiplication and/or addition operations per data processing cycle. After the processing of a plurality of data processing periods, the multiplier-adder array outputs a partial result of the convolution processing of the image data to be processed. When the operation step size of the matrix operand is larger than 1, the processing result of part of the multipliers in the multiplier-adder array is not needed in the result of processing the image data to be processed, so that the data processing mode in the case has the problems of low utilization rate of the multiplier-adder array and waste of computing resources.
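The utilization problem can be made concrete with a small count. The sketch assumes, for illustration only, that each multiplier-adder in the array computes the output at one stride-1 position, so that with a step size s only positions whose row and column indices are both multiples of s contribute to the needed result; the name utilization is hypothetical.

```python
def utilization(rows, cols, stride):
    """Fraction of multiplier-adders whose result is needed,
    assuming one multiplier-adder per stride-1 output position."""
    used = sum(1 for r in range(rows) for c in range(cols)
               if r % stride == 0 and c % stride == 0)
    return used / (rows * cols)


print(utilization(4, 4, 1))  # 1.0
print(utilization(4, 4, 2))  # 0.25
```

Under this assumption a step size of 2 leaves three quarters of a 4 × 4 array producing results that are discarded.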
Based on the above research, the present disclosure provides a data processing method, an apparatus, a computer device, and a storage medium. The multiplier-adders in a multiplier-adder array are grouped based on the matrix operand operation step size to obtain at least one multiplier-adder group, and different multiplier-adder groups in the array process, in parallel, the data processing tasks corresponding to different image data to be processed. That is, the same multiplier-adder array can process multiple pieces of image data to be processed simultaneously, with each multiplier-adder group processing one piece. Multiplier-adders that would be unused while processing one piece of image data are thus used to process other image data, which improves the utilization rate of the multiplier-adder array, reduces the waste of computing resources, and improves the efficiency with which the array processes the image data to be processed.
The above-mentioned drawbacks are the results of the inventor after practical and careful study, and therefore, the discovery process of the above-mentioned problems and the solutions proposed by the present disclosure to the above-mentioned problems should be the contribution of the inventor in the process of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, a data processing method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the data processing method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the data processing method may be implemented by a processor calling computer readable instructions stored in a memory.
The following describes a data processing method provided by the embodiments of the present disclosure.
Referring to fig. 1, a flowchart of a data processing method provided by the embodiment of the present disclosure is shown, where the method includes steps S101 to S102, where:
S101: grouping a plurality of multiplier-adders in a multiplier-adder array based on the operation step size of a matrix operand to obtain at least one multiplier-adder group;
S102: executing, in parallel, the data processing task corresponding to each multiplier-adder group by using each of the at least one multiplier-adder group.
The method comprises the steps of grouping multiplier-adder arrays based on operand operation step lengths to obtain at least one multiplier-adder group, and enabling each multiplier-adder group in the at least one multiplier-adder group to execute data processing tasks corresponding to the multiplier-adder group in parallel; the data processing tasks processed by each multiplier-adder group are different, so that the multiplier-adder array can simultaneously process a plurality of data processing tasks, and the processing efficiency of the multiplier-adder array on the data processing tasks is improved.
In addition, in the processing mode, the multiplier-adder which is not used in the processing process of one image data to be processed is used for processing other image data to be processed, so that the utilization rate of the multiplier-adder array is improved, and the waste of computing resources is reduced.
The following describes the details of S101 to S102.
For the above S101, the multiplier-adder array is a matrix array composed of at least one multiplier-adder. As an example, fig. 2 shows an exemplary diagram of a multiplier-adder array provided by the present disclosure, which includes 16 multiplier-adders arranged in 4 rows and 4 columns. The matrix operand includes, for example, a convolution kernel used when processing image data to be processed; the matrix operand operation step size is, for example, the moving step size of the convolution kernel. Illustratively, a convolution kernel moving with a step size of 2, as in fig. 3, means Sx = 2 and Sy = 2. The moving process goes from the first target position shown in a to the second target position shown in b, and then from the second target position shown in b to the third target position shown in c; that is, the kernel moves two pixels at a time in the transverse direction and two pixels at a time in the longitudinal direction. Here Sx represents the number of pixels moved in the transverse direction, and Sy represents the number of pixels moved in the longitudinal direction.
In grouping the multiplier-adders in the multiplier-adder array, for example, the number of multiplier-adder groups may be determined based on the matrix operand operation step size, and the plurality of multiplier-adders in the multiplier-adder array may be grouped based on the number of multiplier-adder groups.
In a specific implementation, the number of multiplier-adder groups is related to the matrix operand operation step size as follows: number of multiplier-adder groups = Sx*Sy. For example, when the operation step size of the matrix operand is 2, that is, Sx = 2 and Sy = 2, the number of multiplier-adder groups is Sx*Sy = 2*2 = 4.
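The relation between the step size and the number of groups can be sketched as follows. This is a minimal illustration only; the function name `group_count` and the input check are assumptions introduced here and do not appear in the disclosure.

```python
def group_count(s_x: int, s_y: int) -> int:
    """Number of multiplier-adder groups for transverse step s_x and longitudinal step s_y."""
    # Assumption: step sizes are positive integers.
    if s_x < 1 or s_y < 1:
        raise ValueError("step sizes must be positive integers")
    return s_x * s_y


# Example from the text: Sx = 2, Sy = 2 gives 4 groups.
print(group_count(2, 2))
```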
In a specific implementation, an embodiment of the present disclosure provides a specific method for grouping a plurality of multiplier-adders in a multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group, including:
determining a first target multiplier-adder in each multiplier-adder group from the multiplier-adder array;
determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group except the first target multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array, the matrix operand operation step size, and the size of the multiplier-adder array.
In some cases, the size of the multiplier-adder array is fixed, while the size of the image data to be processed may differ according to the actual image processing situation. Therefore, even if the data processing method provided by the embodiment of the present disclosure is used to process a plurality of pieces of image data to be processed in parallel, the utilization rate of the multiplier-adder array may not reach one hundred percent in many cases. Therefore, in the embodiment of the present disclosure, size information of the actually used multiplier-adder array is first determined based on the operation step size of the matrix operand and the size information of the multiplier-adder array. The size information of the multiplier-adder array includes its number of rows and number of columns, and the size information of the actually used multiplier-adder array includes the number of rows and number of columns actually used. The relationship is: A'x = Ax - Ax % Sx; A'y = Ay - Ay % Sy, where Ax is the number of columns of the multiplier-adder array, Ay is the number of rows of the multiplier-adder array, A'x is the number of columns of the actually used multiplier-adder array, A'y is the number of rows of the actually used multiplier-adder array, and % is the remainder operation. Illustratively, when the operation step size of the matrix operand is 2, that is, Sx = 2 and Sy = 2, and the size information of the multiplier-adder array is Ax = 5, Ay = 5, the number of columns actually used is A'x = Ax - Ax % Sx = 5 - 5 % 2 = 4, and the number of rows actually used is A'y = Ay - Ay % Sy = 5 - 5 % 2 = 4.
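The size of the actually used multiplier-adder array can be computed directly from the two remainder formulas. The sketch below is illustrative; the function name `usable_size` is an assumption introduced here.

```python
def usable_size(a_x: int, a_y: int, s_x: int, s_y: int) -> tuple[int, int]:
    """Return (A'x, A'y): the columns and rows of the array that are actually used."""
    used_columns = a_x - a_x % s_x  # A'x = Ax - Ax % Sx
    used_rows = a_y - a_y % s_y     # A'y = Ay - Ay % Sy
    return used_columns, used_rows


# Example from the text: a 5 x 5 array with Sx = Sy = 2 uses 4 columns and 4 rows.
print(usable_size(5, 5, 2, 2))
```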
The first target multiplier-adder for each multiplier-adder group is then determined in the multiplier-adder array actually used.
In a specific implementation, the first target multiplier-adder of each multiplier-adder group may be determined, for example, as follows:
determining a target matrix based on the operation step size of the matrix operand and the size of the multiplier-adder array; and determining the position of the first target multiplier-adder of each multiplier-adder group in the multiplier-adder array according to the matrix element values of the target matrix, where the matrix elements of the target matrix are multiplier-adders.
Illustratively, the size information of the multiplier-adder array is 4 rows and 4 columns. When the operation step size of the matrix operand is 2, that is, Sx = 2 and Sy = 2, the number of multiplier-adder groups is Sx*Sy = 2*2 = 4, and the target matrix includes 4 multiplier-adders in two rows and two columns. The first multiplier-adder in the multiplier-adder array is used as the first multiplier-adder of the target matrix, that is, the first target multiplier-adder of the first multiplier-adder group, and the first target multiplier-adders of the other multiplier-adder groups in the target matrix are determined based on it. For example, the position numbering of the actually used multiplier-adder array is as follows:
Figure BDA0002925913150000101
The first target multiplier-adder of the first multiplier-adder group, that is, the first multiplier-adder of the target matrix, is at position 0. The position numbering of the two-row, two-column target matrix, determined from the first target multiplier-adder of the first group at position 0 in the actually used multiplier-adder array, is as follows:
Figure BDA0002925913150000102
The first target multiplier-adders of the other three multiplier-adder groups are at positions 1, 4, and 5, respectively, in the actually used multiplier-adder array, as in the target matrix shown in fig. 4.
For example, the target position in the multiplier-adder array of the first target multiplier-adder of each multiplier-adder group can be determined by referring to the formula corresponding to each position of the matrix shown in fig. 5; that is, each matrix element in the matrix shown in fig. 5 represents the position of the first target multiplier-adder of one multiplier-adder group. Here A'x is the number of columns of the actually used multiplier-adder array and A'y is its number of rows, with A'x = Ax - Ax % Sx and A'y = Ay - Ay % Sy; Ax is the number of columns of the multiplier-adder array and Ay is its number of rows; Sx is the transverse moving step size of the matrix operand operation step size, and Sy is the longitudinal moving step size.
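The formula of fig. 5 is not reproduced in the text, so the following sketch rests on an assumption: positions in the actually used array are numbered row by row starting from 0, which is consistent with the first target multiplier-adders at positions 0, 1, 4, and 5 in the 4 x 4 example. The function name `first_target_positions` is hypothetical.

```python
def first_target_positions(a_x_used: int, s_x: int, s_y: int) -> list[list[int]]:
    """Target matrix of first-target positions, one entry per multiplier-adder group.

    Assumption: row-major position numbering, so the multiplier-adder in
    row r and column c of the actually used array has position r * A'x + c.
    """
    return [[r * a_x_used + c for c in range(s_x)] for r in range(s_y)]


# 4 x 4 actually used array with Sx = Sy = 2.
print(first_target_positions(4, 2, 2))
```

Under this assumption the target matrix has Sy rows and Sx columns, one entry for each of the Sx*Sy groups.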
After the first target multiplier-adder of each multiplier-adder group is determined in the multiplier-adder array in actual use, the other target multiplier-adders in each multiplier-adder group except the first target multiplier-adder may be determined, for example, based on the method described in the following steps one to three:
Step one: for each multiplier-adder group, determining a first position relationship, in the multiplier-adder array, between each multiplier-adder in each row of the group other than the first multiplier-adder of that row and its adjacent previous multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array and the operation step size of the matrix operand;
Here, the first position relationship is: position of the adjacent previous multiplier-adder + Sx = position of the multiplier-adder.
Illustratively, the actually used multiplier-adder array has 4 rows and 4 columns, that is, A'y = 4 and A'x = 4, and the position numbering of the actually used multiplier-adder array is as follows:
Figure BDA0002925913150000111
When the operation step size of the matrix operand is 2, that is, Sx = 2 and Sy = 2, the four different colors shown in fig. 4 represent four multiplier-adder groups: the first multiplier-adder group in black, the second in white, the third in light gray, and the fourth in dark gray. Taking the first multiplier-adder group as an example, if the first multiplier-adder of the first group is at position 0, the position of the next multiplier-adder A of the same group in that row is 0 + Sx = 0 + 2 = 2, and the position of the next multiplier-adder B after position 2 in that row would be 2 + Sx = 2 + 2 = 4. However, since the actually used multiplier-adder array has 4 columns, the maximum position number in that row is 3, so the only multiplier-adder in the same row and the same group as the one at position 0 is the one at position 2.
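Step one can be sketched as repeatedly adding Sx until the end of the row is passed. The sketch assumes row-major position numbering starting from 0, and the function name `same_row_members` is hypothetical.

```python
def same_row_members(first_pos: int, s_x: int, a_x_used: int) -> list[int]:
    """Positions of same-group multiplier-adders to the right of first_pos in its row."""
    # Assumption: positions are numbered row by row, so the row containing
    # first_pos ends at the last position of that row.
    row_end = (first_pos // a_x_used) * a_x_used + a_x_used - 1
    members = []
    pos = first_pos + s_x  # previous position + Sx
    while pos <= row_end:
        members.append(pos)
        pos += s_x
    return members


# Example from the text: starting at position 0 with Sx = 2 in a 4-column array,
# position 4 falls outside the row, so only position 2 remains.
print(same_row_members(0, 2, 4))
```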
Step two: determining a second position relation of each multiplier-adder except the first multiplier-adder in each column in the multiplier-adder group and an adjacent previous multiplier-adder in the multiplier-adder array based on the position of the first multiplier-adder in the multiplier-adder array, the operation step size of a matrix operand and the number of columns in the multiplier-adder array;
Here, the second position relationship, in the multiplier-adder array, between each multiplier-adder in each column of the group other than the first multiplier-adder of that column and its adjacent previous multiplier-adder is: position of the adjacent previous multiplier-adder + Sy*A'x = position of the multiplier-adder.
Illustratively, the actually used multiplier-adder array has 4 rows and 4 columns, that is, A'y = 4 and A'x = 4, and the position numbering of the actually used multiplier-adder array is as follows:
Figure BDA0002925913150000112
When the operation step size of the matrix operand is 2, that is, Sx = 2 and Sy = 2, as shown in fig. 4 and taking the first multiplier-adder group as an example, if the first multiplier-adder of the first group is at position 0, the position of the next multiplier-adder C of the same group in that column is 0 + Sy*A'x = 0 + 2*4 = 8, and the position of the next multiplier-adder D after position 8 in that column would be 8 + Sy*A'x = 8 + 2*4 = 16. However, since the actually used multiplier-adder array has 4 rows, the maximum position number in that column is 12, so the only multiplier-adder in the same column and the same group as the one at position 0 is the one at position 8.
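Step two can be sketched in the same way, stepping by Sy*A'x down the column. Row-major position numbering starting from 0 is again assumed, and the function name `same_column_members` is hypothetical.

```python
def same_column_members(first_pos: int, s_y: int,
                        a_x_used: int, a_y_used: int) -> list[int]:
    """Positions of same-group multiplier-adders below first_pos in its column."""
    step = s_y * a_x_used              # previous position + Sy * A'x
    last_pos = a_x_used * a_y_used - 1  # largest position number in the array
    # Adding a multiple of A'x keeps the column unchanged, so bounding by the
    # largest position in the array is enough to stay inside the column.
    return list(range(first_pos + step, last_pos + 1, step))


# Example from the text: starting at position 0 with Sy = 2 in a 4 x 4 array,
# position 16 falls outside the array, so only position 8 remains.
print(same_column_members(0, 2, 4, 4))
```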
Step three: and determining the target positions of other target multipliers and adders except the first target multiplier and adder in the multiplier and adder group in the multiplier and adder array based on the first position relation and/or the second position relation.
Illustratively, the actually used multiplier-adder array has 4 rows and 4 columns, that is, A'y = 4 and A'x = 4, and the position numbering of the actually used multiplier-adder array is as follows:
Figure BDA0002925913150000121
When the operation step size of the matrix operand is 2, that is, Sx = 2 and Sy = 2, after the first multiplier-adder of each group in each row or column has been located, the target positions in the multiplier-adder array of the other target multiplier-adders of the group can be calculated with the above formulas for adjacent same-group multiplier-adders in the same row or in the same column. As shown in fig. 4, taking the first multiplier-adder group as an example, the first multiplier-adder of the group is at position 0 and the next same-group multiplier-adder A in the same row is at position 2, so the next same-group multiplier-adder E in the same column as multiplier-adder A is at position 2 + Sy*A'x = 2 + 2*4 = 2 + 8 = 10. Alternatively, the next same-group multiplier-adder C in the same column as the multiplier-adder at position 0 is at position 8, so the next same-group multiplier-adder E in the same row as multiplier-adder C is at position 8 + Sx = 8 + 2 = 10.
Illustratively, fig. 4 shows an exemplary diagram, provided by the present disclosure, of a multiplier-adder array divided into four multiplier-adder groups. Four different colors represent the four multiplier-adder groups: the first multiplier-adder group in black, the second in white, the third in light gray, and the fourth in dark gray. In the same row of the multiplier-adder array, any two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders from other groups; likewise, in the same column of the multiplier-adder array, any two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders from other groups.
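The interleaved grouping of the 4 x 4 example can be reproduced with a closed form that is equivalent to steps one to three, assuming row-major position numbering and assuming that groups are indexed 0 to 3 in the order of their first target positions 0, 1, 4, and 5. The function names are hypothetical.

```python
def group_map(a_x_used: int, a_y_used: int, s_x: int, s_y: int) -> list[list[int]]:
    """Group index of every multiplier-adder in the actually used array."""
    # Assumption: the group is fixed by the row and column remainders.
    return [[(r % s_y) * s_x + (c % s_x) for c in range(a_x_used)]
            for r in range(a_y_used)]


def group_members(a_x_used: int, a_y_used: int, s_x: int, s_y: int) -> dict[int, list[int]]:
    """Positions (row-major numbering) of the multiplier-adders in each group."""
    members: dict[int, list[int]] = {}
    for r, row in enumerate(group_map(a_x_used, a_y_used, s_x, s_y)):
        for c, group in enumerate(row):
            members.setdefault(group, []).append(r * a_x_used + c)
    return members


for row in group_map(4, 4, 2, 2):
    print(row)
print(group_members(4, 4, 2, 2)[0])  # positions of the first group
```

For the example, the first group consists of positions 0, 2, 8, and 10, matching the positions derived step by step in the text.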
For the above S102, the to-be-processed images corresponding to the convolution processing tasks of different multiplier-adder groups are different, for example, each multiplier-adder group performs convolution on a different data matrix.
When a data processing task corresponding to each multiplier-adder group is executed in parallel by using each multiplier-adder group in at least one multiplier-adder group, the image data to be processed corresponding to each multiplier-adder group is stored into the register array corresponding to each multiplier-adder group according to the position of each target multiplier-adder in each multiplier-adder group in the multiplier-adder array.
Here, the image data to be processed includes, for example, at least one of:
an original image to be processed;
a subgraph corresponding to any color channel in an original image to be processed;
a feature map obtained by performing feature extraction on the original image;
a feature subgraph corresponding to at least one channel of the feature map obtained by performing feature extraction on the original image;
data obtained by performing data filling processing on a subgraph corresponding to at least one color channel of the original image;
and data obtained by performing data filling processing on a feature subgraph corresponding to at least one channel of the feature map.
Taking the feature map as the image data to be processed as an example, when the image data to be processed is stored in the register array, the feature value of a feature point in the image data to be processed, which is also called an operand required by the multiplier-adder, is stored in each register of at least a part of the registers.
For each multiplier-adder group, the position of the register that each target multiplier-adder of the group fixedly reads in the corresponding register array is determined. As shown in fig. 6, the multiplier-adder array in n includes four multiplier-adder groups, which correspond to the four register arrays shown in m: the black multiplier-adder group corresponds to the black register array A, the white group to the white register array B, the light gray group to the light gray register array C, and the dark gray group to the dark gray register array D. Each target multiplier-adder reads the feature value stored in the register it fixedly reads in its corresponding register array: target multiplier-adder PE0 reads the feature value stored in register A0, PE1 reads the feature value stored in register B0, PE2 reads the feature value stored in register A2, PE3 reads the feature value stored in register B2, PE4 reads the feature value stored in register C0, and the remaining target multiplier-adders likewise each read the feature value stored in their own fixedly read register, as shown in fig. 6.
And for each multiplier-adder group, storing the image data to be processed corresponding to the multiplier-adder group into the register array corresponding to the multiplier-adder group according to the position of each target multiplier-adder in the multiplier-adder group in the multiplier-adder array, the position of a register fixedly read by the target multiplier-adder in the multiplier-adder group and the processing sequence of operands contained in the image data to be processed in the data processing process, so that the operands stored in the positions of the registers fixedly read by each target multiplier-adder correspond to matrix elements in the matrix operands of the corresponding processing period in each data processing period.
Where the matrix operand includes, for example, a convolution kernel, i.e., a data matrix, in a convolution calculation, illustratively,
Figure BDA0002925913150000131
is a two-row, two-column matrix operand including the matrix elements W0, W1, W2, and W3. The number of operands contained in the image data to be processed corresponding to each multiplier-adder group is the same. The image data to be processed corresponding to the first multiplier-adder group shown in fig. 6 is
Figure BDA0002925913150000141
The storage rule of this image data to be processed in the black register array corresponding to the first multiplier-adder group is shown as a in fig. 6, and the image data to be processed corresponding to the second multiplier-adder group is
Figure BDA0002925913150000142
The storage rule of the image data to be processed in the white register array corresponding to the second multiplier-adder group is shown as b in FIG. 6, and the image data to be processed corresponding to the third multiplier-adder group is
Figure BDA0002925913150000143
The storage rule of the image data to be processed in the light gray register array corresponding to the third multiplier-adder group is shown as c in FIG. 6, and the image data to be processed corresponding to the fourth multiplier-adder group is
Figure BDA0002925913150000144
The storage rule of the image data to be processed in the dark gray register array corresponding to the fourth multiplier-adder group is shown as d in fig. 6.
After storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to each multiplier-adder group, respectively reading the image data to be processed corresponding to each multiplier-adder group in the data processing period from the fixed register array corresponding to each multiplier-adder group aiming at each data processing period in a plurality of data processing periods; and processing the read image data to be processed, and obtaining the data processing result of each multiplier-adder group in the data processing period in parallel.
For the first data processing period in which the image data to be processed is processed, each target multiplier-adder in each multiplier-adder group is controlled to read its operand for the first data processing period, as a first operand, from the register it fixedly reads; the matrix element of the matrix operand corresponding to each multiplier-adder group in the first data processing period is determined as a second operand; and the product of the first operand and the second operand of each target multiplier-adder in the first data processing period is determined.
For example, the target multiplier-adder PE0 reads the operand a0 from register A0, which it fixedly reads in its corresponding register array, the target multiplier-adder PE1 reads the operand b0 from register B0, and so on; the operands read by the other target multiplier-adders are not described again here. Assume that the matrix operand is:
Figure BDA0002925913150000145
Taking PE0 as an example, after the operand a0 is read, a0 is taken as the first operand; the matrix element corresponding to this data processing period is W0, so W0 is taken as the second operand, W0*a0 is calculated, and the result is stored in a register.
For a non-first data processing period for processing the image data to be processed, controlling the image data to be processed to move a preset step length in the register array according to a preset data moving mode corresponding to the data processing period; controlling each target multiplier-adder in each multiplier-adder group, and respectively reading the operand of each target multiplier-adder in the non-first data processing period from the register fixedly read by each target multiplier-adder as a first operand; determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the data processing period as a second operand; the product of the first operand and the second operand of each target multiplier-adder in the data processing period is respectively determined.
For example, taking the second data processing period of the multiplier-adder groups shown in fig. 6 as an example, the image data is shifted left with a step size of 1. Fig. 7 is an example diagram, in an embodiment of the present disclosure, of register array A after the image data to be processed has been shifted left by one step as a whole in the register array: PE0 reads a1 from register A0, PE2 reads a3 from register A2, and so on; the operands read by the other multiplier-adders are not described again. The matrix operand is:
Figure BDA0002925913150000151
Taking PE0 as an example, after the operand a1 is read, a1 is taken as the first operand; the matrix element corresponding to this data processing period is W1, so W1 is taken as the second operand, W1*a1 is calculated, and the result is stored in the register carried by the multiplier-adder itself.
Similarly, in the third data processing period, the data to be processed may be shifted up by one step as a whole from the position shown in fig. 7; at this time a5 is stored in register A0, and PE0 can calculate W2*a5. In the fourth data processing period, the data to be processed may be shifted right by one step as a whole from the position reached in the third data processing period; at this time a4 is stored in register A0, and PE0 can calculate W3*a4. The other PEs behave in the same way and are not described again here.
It can be seen that, in each data processing period, the PEs holding different image data to be processed complete the calculation of that period; that is, the different multiplier-adder groups complete their calculations for each data processing period in parallel, and after all the data processing periods the different multiplier-adder groups complete their final calculations at the same time, thereby saving system resources.
Here, for different image data to be processed, the corresponding convolution kernels may be different or the same. For example, if two pieces of image data to be processed are respectively different feature subgraphs of the same feature graph, convolution kernels corresponding to the two pieces of image data to be processed are different. And if the two pieces of image data to be processed are image data at different positions of the same characteristic subgraph, the convolution kernels corresponding to the two pieces of images to be processed are the same.
And finishing the data processing tasks respectively corresponding to the multiplier-adder groups according to the data processing results respectively corresponding to the multiplier-adder groups in each data processing period.
For each target multiplier-adder in each multiplier-adder group, adding products obtained by the target multiplier-adder in each data processing period to obtain a sum; and finishing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders respectively contained in each multiplier-adder group.
For example, taking PE0 shown in fig. 6 as an example, the calculations performed by PE0 in the four data processing periods are W0*a0, W1*a1, W2*a5, and W3*a4. The four results are added: W0*a0 + W1*a1 + W2*a5 + W3*a4. The sum is one result value in the processing result matrix of the data processing task for the image data to be processed corresponding to the first multiplier-adder group; the result values in that processing result matrix are arranged as
Figure BDA0002925913150000152
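The four periods described above for PE0, and the accumulation of their products, can be sketched as follows. The sketch makes several assumptions that are not stated in the disclosure: the image data of the first group is a 4 x 4 block a0 to a15 stored row by row, the data shifts are modelled as an offset seen by a fixedly read register rather than as physical moves between registers, and behaviour at the edges of the register array is not modelled. The names `SHIFT_OFFSETS` and `pe_products` and the numeric values are hypothetical.

```python
# Offset (rows, columns) by which the visible data element changes when the
# whole image data block is shifted one step in the named direction.
SHIFT_OFFSETS = {"left": (0, 1), "up": (1, 0), "right": (0, -1), "down": (-1, 0)}


def pe_products(data, weights, reg_row, reg_col, moves=("left", "up", "right")):
    """Per-period products for the PE that fixedly reads register (reg_row, reg_col)."""
    dy = dx = 0
    # First period: read the operand without moving the data.
    products = [weights[0] * data[reg_row][reg_col]]
    # Later periods: shift the data, then read the same register again.
    for weight, move in zip(weights[1:], moves):
        step_y, step_x = SHIFT_OFFSETS[move]
        dy += step_y
        dx += step_x
        products.append(weight * data[reg_row + dy][reg_col + dx])
    return products


# Hypothetical values: operand a_i has value i; W0..W3 are 1, 10, 100, 1000.
data = [[4 * r + c for c in range(4)] for r in range(4)]
weights = [1, 10, 100, 1000]
products = pe_products(data, weights, 0, 0)  # PE0 reads register A0
print(products, sum(products))
```

With these values PE0 multiplies W0, W1, W2, and W3 by a0, a1, a5, and a4 in turn, matching the sequence in the text; the sum of the four products is the result value contributed by PE0.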
Here, if the image data to be processed after convolution is a feature map, the feature map includes 16 channels, and feature subgraphs corresponding to 4 channels are processed each time, that is, the feature subgraphs corresponding to 16 channels are divided into 4 groups, and a group of feature subgraphs are processed each time. If the 4 groups of characteristic subgraphs are respectively: when the group a, the group b, the group c and the group d are used, after 4 characteristic subgraphs included in the group a are processed, 4 results corresponding to the group a output by the multiplier-adder are accumulated; after the 4 characteristic subgraphs included in the group b are processed, accumulating 4 results corresponding to the group b, and accumulating the accumulated result corresponding to the group a and the accumulated result corresponding to the group b; after the 4 characteristic subgraphs included in the group c are processed, accumulating 4 results corresponding to the group c, and accumulating the accumulated results of the group a and the group b and the accumulated result corresponding to the group c; after the 4 characteristic subgraphs included in the group d are processed, the 4 results corresponding to the group d are accumulated, the accumulated results of the group a, the group b and the group c and the accumulated result corresponding to the group d are accumulated, and finally, the accumulated sum of the convolution results corresponding to the 16 channels is obtained.
After the 4 feature subgraphs included in group a are processed, the 4 output results obtained for group a are a1, a2, a3, and a4. After the 4 feature subgraphs included in group b are processed, the 4 output results obtained for group b are b1, b2, b3, and b4. At this time, a1 + b1 = O1, a2 + b2 = O2, a3 + b3 = O3, and a4 + b4 = O4 are calculated. After the 4 feature subgraphs included in group c are processed, the 4 output results obtained for group c are c1, c2, c3, and c4, and O1 + c1, O2 + c2, O3 + c3, and O4 + c4 are further calculated. By analogy, a1 + b1 + c1 + d1, a2 + b2 + c2 + d2, a3 + b3 + c3 + d3, and a4 + b4 + c4 + d4 are finally obtained, and these four results are then accumulated together to obtain the accumulated sum of the convolution results corresponding to the 16 channels.
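The running accumulation over the four channel groups can be sketched as follows. The function name `accumulate_groups` and the numeric values are hypothetical, and each output is treated as a single number for simplicity.

```python
def accumulate_groups(group_outputs):
    """Accumulate per-group outputs element by element, then sum the running totals.

    group_outputs: one list of outputs per channel group, e.g. [a1..a4], [b1..b4], ...
    Returns (running totals O1..O4 after the last group, overall accumulated sum).
    """
    running = [0] * len(group_outputs[0])
    for outputs in group_outputs:
        # O_i = O_i + x_i for the i-th output of the current group.
        running = [total + value for total, value in zip(running, outputs)]
    return running, sum(running)


# Hypothetical outputs of groups a, b, c, and d (4 outputs each, 16 channels in total).
group_a = [1, 2, 3, 4]
group_b = [10, 20, 30, 40]
group_c = [100, 200, 300, 400]
group_d = [1000, 2000, 3000, 4000]
print(accumulate_groups([group_a, group_b, group_c, group_d]))
```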
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, a data processing apparatus corresponding to the data processing method is also provided in the embodiments of the present disclosure, and because the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the data processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 8, a schematic diagram of a data processing apparatus provided in an embodiment of the present disclosure is shown, where the apparatus includes: a controller 801; the controller 801 is configured to:
grouping a plurality of multiplier-adders in the multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
In one possible embodiment, in a same row of the multiplier-adder array, the number of non-same-group multiplier-adders spaced between two adjacent same-group multiplier-adders is the same and non-zero; and, in a same column of the multiplier-adder array, the number of non-same-group multiplier-adders spaced between two adjacent same-group multiplier-adders is the same and non-zero.
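One arrangement that satisfies this spacing property can be sketched in Python as follows. The modulo-based assignment, the resulting group count of step × step, and the names `group_ids` and `row_gaps` are assumptions made for illustration; the disclosure states only that the number of groups is determined from the matrix operand operation step size.

```python
def group_ids(rows, cols, step):
    """Assign the multiplier-adder at (r, c) to group (r % step) * step + (c % step).

    Assumed interleaving scheme, not necessarily the one used in the disclosure.
    """
    if step < 2:
        raise ValueError("step must be at least 2 for a non-zero spacing")
    return [[(r % step) * step + (c % step) for c in range(cols)]
            for r in range(rows)]


def row_gaps(grid, group):
    """Set of gap sizes between adjacent same-group units within each row."""
    gaps = set()
    for row in grid:
        positions = [c for c, g in enumerate(row) if g == group]
        gaps.update(b - a - 1 for a, b in zip(positions, positions[1:]))
    return gaps


grid = group_ids(4, 4, 2)           # 4 x 4 array, step size 2
gaps_in_rows = row_gaps(grid, 0)    # spacing of group 0 along rows
# Transposing the grid lets the same helper measure spacing along columns.
gaps_in_cols = row_gaps(list(zip(*grid)), 0)
```

With a step size of 2, each pair of adjacent same-group multiplier-adders in this sketch is separated by exactly one multiplier-adder of another group, both along rows and along columns.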
In one possible embodiment, when grouping the plurality of multiplier-adders in the multiplier-adder array based on the matrix operand operation step size, the controller 801 is specifically configured to: determine the number of multiplier-adder groups based on the matrix operand operation step size; and group the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
In one possible implementation, when grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups, the controller 801 is specifically configured to: determine a first target multiplier-adder in each multiplier-adder group from the multiplier-adder array; and determine, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group besides the first target multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array, the matrix operand operation step size, and the size of the multiplier-adder array.
In one possible embodiment, when determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group besides the first target multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array, the matrix operand operation step size, and the size of the multiplier-adder array, the controller 801 is specifically configured to: for each multiplier-adder group, determine a first position relationship between each multiplier-adder in each row of the multiplier-adder group, other than the first multiplier-adder in that row, and its adjacent previous multiplier-adder in the multiplier-adder array, based on the position of the first target multiplier-adder in the multiplier-adder array and the matrix operand operation step size; determine a second position relationship between each multiplier-adder in each column of the multiplier-adder group, other than the first multiplier-adder in that column, and its adjacent previous multiplier-adder in the multiplier-adder array, based on the position of the first target multiplier-adder of the multiplier-adder group in the multiplier-adder array, the matrix operand operation step size, and the number of columns of the multiplier-adder array; and determine the target positions in the multiplier-adder array of the other target multiplier-adders in the multiplier-adder group besides the first target multiplier-adder, based on the first position relationship and/or the second position relationship.
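As a rough illustration of the two position relationships, the Python sketch below enumerates the members of one group from the position of its first target multiplier-adder. It assumes row-major linear indexing, so that the next same-group unit in a row is `step` positions further on and the next same-group unit in a column is `step * cols` positions further on. The indexing scheme and the name `group_members` are assumptions, not details confirmed by the disclosure.

```python
def group_members(first_index, step, rows, cols):
    """Row-major linear indices of the multiplier-adders in one group.

    first_index: linear index of the group's first target multiplier-adder.
    step: matrix operand operation step size.
    rows, cols: size of the multiplier-adder array.
    """
    r0, c0 = divmod(first_index, cols)
    members = []
    row_start = first_index
    for _ in range(r0, rows, step):
        index = row_start
        for _ in range(c0, cols, step):
            members.append(index)
            index += step         # first relationship: next unit in the same row
        row_start += step * cols  # second relationship: next unit in the same column
    return members


# 4 x 4 array with step size 2: four groups, starting at indices 0, 1, 4 and 5.
members_by_first = {first: group_members(first, 2, 4, 4) for first in (0, 1, 4, 5)}
```

In this sketch the four groups together cover all 16 positions of the array exactly once.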
In one possible implementation, when determining the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array, the controller 801 is specifically configured to determine a target matrix based on the matrix operand operation step size and the size of the multiplier-adder array; and determining the position of the first target multiplier-adder in each multiplier-adder group in the multiplier-adder array according to the matrix element values of the target matrix.
In a possible implementation, when using each multiplier-adder group of the at least one multiplier-adder group to execute the data processing task corresponding to that multiplier-adder group, the controller 801 is specifically configured to: store the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that multiplier-adder group, according to the position in the multiplier-adder array of each target multiplier-adder in the group; for each data processing cycle of a plurality of data processing cycles, read from the register array corresponding to each multiplier-adder group the image data to be processed by that group in the data processing cycle, process the read image data, and obtain in parallel the data processing result of each multiplier-adder group in the data processing cycle; and complete the data processing tasks respectively corresponding to the multiplier-adder groups according to the data processing results respectively corresponding to the multiplier-adder groups in each data processing cycle.
In a possible implementation, when storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that multiplier-adder group according to the position in the multiplier-adder array of each target multiplier-adder in the group, the controller 801 is specifically configured to: determine, according to the size of the matrix operand, the number of registers contained in the register array corresponding to each multiplier-adder group; for each multiplier-adder group, determine the position, in the corresponding register array, of the register that each target multiplier-adder of the group fixedly reads; and, for each multiplier-adder group, store the image data to be processed corresponding to the group into the register array corresponding to the group according to the position in the multiplier-adder array of each target multiplier-adder in the group, the position of the register fixedly read by each target multiplier-adder in the group, and the order in which the operands contained in the image data to be processed are processed during data processing, so that, in each data processing cycle, the operand stored in the register fixedly read by each target multiplier-adder corresponds to the matrix element of the matrix operand for that data processing cycle.
In one possible implementation mode, for each data processing cycle in a plurality of data processing cycles, respectively reading the image data to be processed corresponding to each multiplier-adder group in the data processing cycle from the register array corresponding to each multiplier-adder group; when the read image data to be processed is processed and the data processing results of each multiplier-adder group in the data processing cycle are obtained in parallel, the controller 801 is specifically configured to control each target multiplier-adder in each multiplier-adder group for a first data processing cycle in which the image data to be processed is processed, and read the operand of each target multiplier-adder in the first data processing cycle from the register fixedly read by each target multiplier-adder as a first operand; determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the first data processing period as second operands; respectively determining the product of a first operand and a second operand of each target multiplier-adder in the first data processing period; for a non-first data processing period for processing the image data to be processed, controlling the image data to be processed to move a preset step length in the register array according to a preset data moving mode corresponding to the data processing period; controlling each target multiplier-adder in each multiplier-adder group, and respectively reading the operand of each target multiplier-adder in the non-first data processing period from the register fixedly read by each target multiplier-adder as a first operand; determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the data processing period as a second operand; the product of the first operand and the second operand of each target multiplier-adder in the data processing period is respectively determined.
In a possible embodiment, when the data processing tasks corresponding to the multiplier-adder groups are completed according to the data processing results corresponding to the multiplier-adder groups in each data processing cycle, the controller 801 is specifically configured to add, for each target multiplier-adder in each multiplier-adder group, the products obtained by the target multiplier-adder in each data processing cycle to obtain a sum; and finishing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders respectively contained in each multiplier-adder group.
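The cycle-by-cycle flow described above (a fixed read position per multiplier-adder, data movement within the register array between cycles, one product per cycle, and a final sum per multiplier-adder) can be simulated with the following Python sketch. It models a single group with a step size of 1 and a square kernel, uses a wrap-around rotation as the preset data movement, and computes an unflipped (cross-correlation style) convolution; all of these choices, and the names `rotate` and `convolve_by_cycles`, are simplifying assumptions rather than details from the disclosure.

```python
def rotate(regs, up, left):
    """Move register contents `up` rows upward and `left` columns leftward (wrapping)."""
    h, w = len(regs), len(regs[0])
    return [[regs[(r + up) % h][(c + left) % w] for c in range(w)]
            for r in range(h)]


def convolve_by_cycles(image, kernel):
    """Each multiplier-adder (i, j) always reads register (i, j); the data moves instead."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    regs = [row[:] for row in image]           # image data loaded into the register array
    sums = [[0] * out_w for _ in range(out_h)]
    for cycle in range(k * k):                 # one kernel element per data processing cycle
        u, v = divmod(cycle, k)
        if cycle > 0:
            # Preset movement: one step left within a kernel row,
            # otherwise one step up and back to the first kernel column.
            regs = rotate(regs, 0, 1) if v else rotate(regs, 1, -(k - 1))
        weight = kernel[u][v]                  # second operand for this cycle
        for i in range(out_h):
            for j in range(out_w):
                sums[i][j] += regs[i][j] * weight  # first operand from the fixed register
    return sums


def convolve_direct(image, kernel):
    """Reference computation used only to check the cycle-based sketch."""
    k = len(kernel)
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(k) for v in range(k))
             for j in range(len(image[0]) - k + 1)]
            for i in range(len(image) - k + 1)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
result = convolve_by_cycles(image, kernel)
```

In hardware the multiplier-adders of a group would operate simultaneously in each cycle; the nested loops here only stand in for that parallelism.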
In one possible implementation, the data processing task includes: a convolution processing task; and the convolution processing tasks of different multiplier-adder groups correspond to different images to be processed.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
The image processing device provided by the embodiment of the disclosure may include a chip, an AI chip, and the like.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 9, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and the computer device includes:
a controller 91 and a memory 92; the memory 92 stores machine-readable instructions executable by the controller 91, and the controller 91 is configured to execute the machine-readable instructions stored in the memory 92; when the machine-readable instructions are executed by the controller 91, the controller 91 performs the following steps:
grouping a plurality of multiplier-adders in the multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
The memory 92 includes an internal memory 921 and an external memory 922. The internal memory 921 temporarily stores operation data of the controller 91 and data exchanged with the external memory 922, such as a hard disk; the controller 91 exchanges data with the external memory 922 through the internal memory 921.
The computer device provided by the embodiment of the present disclosure may include an intelligent terminal such as a mobile phone, or may also be other devices, servers, and the like that have a camera and can perform image processing, and is not limited herein.
For the specific execution process of the instruction, reference may be made to the steps of the data processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the data processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the data processing method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A data processing method, comprising:
grouping a plurality of multiplier-adders in a multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
2. The data processing method of claim 1, wherein, in a same row of the multiplier-adder array, the numbers of non-same-group multiplier-adders spaced between two adjacent same-group multiplier-adders are equal and non-zero, and, in a same column of the multiplier-adder array, the numbers of non-same-group multiplier-adders spaced between two adjacent same-group multiplier-adders are equal and non-zero.
3. The data processing method of claim 1 or 2, wherein grouping the plurality of multiplier-adders in the multiplier-adder array based on the matrix operand operation step size comprises:
determining a number of the multiplier-adder groups based on the matrix operand operation step size;
grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
4. The data processing method of claim 3, wherein the grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups comprises:
determining a first target multiplier-adder in each multiplier-adder group from the multiplier-adder array;
determining, from the multiplier-adder array, other target multiplier-adders in the each multiplier-adder group than the first target multiplier-adder based on a position of the first target multiplier-adder in the multiplier-adder array, the matrix operand operation step size, and a size of the multiplier-adder array.
5. The data processing method of claim 4, wherein determining the other target multiply-adders in the each multiply-adder group except the first target multiply-adder from the multiply-adder array based on the position of the first target multiply-adder in the multiply-adder array, the matrix operand operation step size, and the size of the multiply-adder array comprises:
for each multiplier-adder group, determining a first position relation between each multiplier-adder except the first target multiplier-adder in each row of the multiplier-adder group and an adjacent previous multiplier-adder of the multiplier-adder in the multiplier-adder array based on the position of the first target multiplier-adder in the multiplier-adder array and the operation step length of the matrix operand; and
determining a second position relation of each multiplier-adder except the first row of multiplier-adders in each row of the multiplier-adder group and an adjacent previous multiplier-adder of the multiplier-adder in the multiplier-adder array based on the position of the first target multiplier-adder of the multiplier-adder group in the multiplier-adder array, the operation step size of the matrix operand and the number of rows of the multiplier-adder array;
and determining the target positions of other target multipliers and adders except the first target multiplier and adder in the multiplier and adder group in the multiplier and adder array based on the first position relation and/or the second position relation.
6. The data processing method according to claim 4 or 5, wherein said determining a first target multiplier-adder in said each multiplier-adder group from said multiplier-adder array comprises:
determining a target matrix based on the operation step size of the matrix operand and the size of the multiplier-adder array;
and determining the position of the first target multiplier-adder in each multiplier-adder group in the multiplier-adder array according to the matrix element values of the target matrix.
7. The data processing method according to any of claims 1-6, wherein said performing, with each of said at least one multiplier-accumulator set, a data processing task corresponding to said each multiplier-accumulator set comprises:
storing the image data to be processed corresponding to each multiplier-adder group into a register array corresponding to each multiplier-adder group according to the position of each target multiplier-adder in each multiplier-adder group in the multiplier-adder array;
for each data processing cycle in a plurality of data processing cycles, respectively reading image data to be processed corresponding to each multiplier-accumulator group in the data processing cycle from the register array corresponding to each multiplier-accumulator group; and
processing the read image data to be processed, and parallelly obtaining the data processing result of each multiplier-adder group in the data processing period;
and finishing the data processing tasks respectively corresponding to the multiplier-adder groups according to the data processing results respectively corresponding to the multiplier-adder groups in each data processing period.
8. The data processing method according to claim 7, wherein storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to the multiplier-adder group according to the position of the respective target multiplier-adder in the multiplier-adder array comprises:
determining the number of registers contained in a register array corresponding to each multiplier-adder according to the size of the matrix operand;
for each multiplier-adder group, determining the position, in the corresponding register array, of the register fixedly read by each target multiplier-adder of the multiplier-adder group;
and for each multiplier-adder group, storing the image data to be processed corresponding to the multiplier-adder group into the register array corresponding to the multiplier-adder group according to the position of each target multiplier-adder in the multiplier-adder group in the multiplier-adder array, the position of a register fixedly read by the target multiplier-adder in the multiplier-adder group and the processing sequence of operands contained in the image data to be processed in the data processing process, so that the operands stored in the positions of the registers fixedly read by each target multiplier-adder correspond to matrix elements in the matrix operands of the corresponding data processing period in each data processing period.
9. The data processing method according to claim 7 or 8, wherein for each data processing cycle of the plurality of data processing cycles, the image data to be processed corresponding to each multiplier-adder group of the data processing cycle is read from the register array corresponding to each multiplier-adder group; and processing the read image data to be processed, and parallelly obtaining the data processing result of each multiplier-adder group in the data processing period, wherein the data processing result comprises the following steps:
for the first data processing period of processing the image data to be processed, controlling each target multiplier-adder in each multiplier-adder group, and respectively reading an operand of each target multiplier-adder in the first data processing period from a register fixedly read by each target multiplier-adder as a first operand; determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the first data processing period as second operands; respectively determining the product of a first operand and a second operand of each target multiplier-adder in the first data processing period;
for a non-first data processing period for processing the image data to be processed, controlling the image data to be processed to move a preset step length in the register array according to a preset data moving mode corresponding to the data processing period; controlling each target multiplier-adder in each multiplier-adder group, and respectively reading the operand of each target multiplier-adder in the non-first data processing period from the register fixedly read by each target multiplier-adder as a first operand; determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the data processing period as a second operand; the product of the first operand and the second operand of each target multiplier-adder in the data processing period is respectively determined.
10. The data processing method according to any one of claims 7 to 9, wherein the performing the data processing tasks corresponding to the multiplier-adder groups according to the data processing results corresponding to the multiplier-adder groups in each data processing cycle comprises:
for each target multiplier-adder in each multiplier-adder group, adding products obtained by the target multiplier-adder in each data processing period to obtain a sum;
and finishing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders respectively contained in each multiplier-adder group.
11. A data processing method according to any one of claims 1 to 10, wherein the data processing task comprises: a convolution processing task;
and the convolution processing tasks of different multiplier-adder groups correspond to different images to be processed.
12. A data processing apparatus, comprising: a controller; the controller is configured to:
grouping a plurality of multiplier-adders in the multiplier-adder array based on a matrix operand operation step size to obtain at least one multiplier-adder group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
13. A computer device, comprising: a controller and a memory, the memory storing machine-readable instructions executable by the controller, the controller being configured to execute the machine-readable instructions stored in the memory, wherein, when the machine-readable instructions are executed by the controller, the controller performs the steps of the data processing method of any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the data processing method according to any one of claims 1 to 11.
CN202110132573.XA 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium Active CN112927125B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110132573.XA CN112927125B (en) 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium
PCT/CN2021/115799 WO2022160706A1 (en) 2021-01-31 2021-08-31 Data processing method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110132573.XA CN112927125B (en) 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112927125A true CN112927125A (en) 2021-06-08
CN112927125B CN112927125B (en) 2023-06-23

Family

ID=76169016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110132573.XA Active CN112927125B (en) 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112927125B (en)
WO (1) WO2022160706A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022160706A1 (en) * 2021-01-31 2022-08-04 成都商汤科技有限公司 Data processing method and apparatus, computer device, and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101827107A (en) * 2010-05-11 2010-09-08 南京大学 IEEE802.1AE protocol-based GCM high-speed encryption and decryption equipment
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
US20170060532A1 (en) * 2015-08-25 2017-03-02 Samsung Electronics Co., Ltd. Fast close path solution for a three-path fused multiply-add design
CN107301455A (en) * 2017-05-05 2017-10-27 中国科学院计算技术研究所 Mixing cube storage system and speed-up computation method for convolutional neural networks
CN109284782A (en) * 2018-09-13 2019-01-29 北京地平线机器人技术研发有限公司 Method and apparatus for detecting feature
CN110796244A (en) * 2018-08-01 2020-02-14 南京天数智芯科技有限公司 Core computing unit processor for artificial intelligence device and accelerated processing method
US20200151019A1 (en) * 2019-03-14 2020-05-14 Rednova Innovations,Inc. OPU-based CNN acceleration method and system
CN111581595A (en) * 2020-04-24 2020-08-25 科大讯飞股份有限公司 Matrix multiplication calculation method and calculation circuit
US20200285944A1 (en) * 2019-03-08 2020-09-10 Adobe Inc. Graph convolutional networks with motif-based attention

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9606803B2 (en) * 2013-07-15 2017-03-28 Texas Instruments Incorporated Highly integrated scalable, flexible DSP megamodule architecture
CN105205191B (en) * 2014-06-12 2018-10-12 济南概伦电子科技有限公司 Multi tate parallel circuit emulates
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN110659446B (en) * 2018-06-29 2022-09-23 合一智芯科技(北京)有限公司 Convolution operation control method, device and medium
CN110020678A (en) * 2019-03-25 2019-07-16 联想(北京)有限公司 A kind of data processing method, electronic equipment and computer storage medium
CN110705687B (en) * 2019-09-05 2020-11-03 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
CN112927125B (en) * 2021-01-31 2023-06-23 成都商汤科技有限公司 Data processing method, device, computer equipment and storage medium

Non-Patent Citations (3)

Title
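Meng Chengzhen: "Research on parallelization of LSTM-based time series prediction algorithms", China Master's Theses Full-text Database, Information Science and Technology *
Zhang Duoli et al.: "High-speed implementation of a two-dimensional high-precision MUSIC algorithm", Journal of Hefei University of Technology (Natural Science) *
Zhao Jingjing et al.: "High-speed hardware implementation of GCM in IEEE802.1AE", Journal of Electronics & Information Technology *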

Cited By (1)

Publication number Priority date Publication date Assignee Title
WO2022160706A1 (en) * 2021-01-31 2022-08-04 成都商汤科技有限公司 Data processing method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
WO2022160706A1 (en) 2022-08-04
CN112927125B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN110050267B (en) System and method for data management
Ma et al. Scalable and modularized RTL compilation of convolutional neural networks onto FPGA
CN108205700B (en) Neural network operation device and method
EP3093757B1 (en) Multi-dimensional sliding window operation for a vector processor
CN101084483A (en) Bit serial processing element for a simd array processor
WO2014105154A1 (en) Systems, methods, and computer program products for performing mathematical operations
WO2022160704A1 (en) Image processing method and apparatus, computer device and storage medium
CN115552523A (en) Counter-based multiplication using in-memory processing
JP2018022339A (en) Calculation processor and control method of calculation processor
CN112927125A (en) Data processing method and device, computer equipment and storage medium
CN115238863A (en) Hardware acceleration method, system and application of convolutional neural network convolutional layer
CN112966729B (en) Data processing method and device, computer equipment and storage medium
He et al. A configurable SIMD architecture with explicit datapath for intelligent learning
CN115485656A (en) In-memory processing method for convolution operation
WO2023124371A1 (en) Data processing apparatus and method, and chip, computer device and storage medium
CN113657587B (en) Deformable convolution acceleration method and device based on FPGA
Geng et al. MacSim: a MAC-enabled high-performance low-power SIMD architecture
CN113327217A (en) Convolution processing method and device, computer equipment and storage medium
CN112668709A (en) Computing device and method for data reuse
CN110765413B (en) Matrix summation structure and neural network computing platform
CN111860809A (en) Method for carrying out first-layer convolution layer processing by filling image sensing chip with dummy unit
CN113867800A (en) Computing device, integrated circuit chip, board card, electronic equipment and computing method
TWI841632B (en) Method and system for spatial locality transform of matrices
TWI841631B (en) Method and processor circuit for spatial locality transform of matrices
US12045612B2 (en) Special-purpose digital-compute hardware for efficient element-wise aggregation, scaling and offset

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40051173; Country of ref document: HK)

GR01 Patent grant