WO2022205763A1 - Data processing method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2022205763A1
Authority
WO
WIPO (PCT)
Prior art keywords
sampling result
sampling
result
data
processing
Prior art date
Application number
PCT/CN2021/115555
Other languages
French (fr)
Chinese (zh)
Inventor
周军
周亮
常亮
赵能
Original Assignee
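Chengdu SenseTime Technology Co., Ltd. (成都商汤科技有限公司)
University of Electronic Science and Technology of China (电子科技大学)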
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu SenseTime Technology Co., Ltd. (成都商汤科技有限公司) and University of Electronic Science and Technology of China (电子科技大学)
Publication of WO2022205763A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates to the technical field of data processing, and in particular to a data processing method, apparatus, device, and storage medium.
  • In the related art, a processing array can be used to perform convolution on image data, but the utilization rate of the processing array is often low during convolution, which increases energy consumption and reduces efficiency.
  • The present disclosure provides a data processing method, apparatus, device, and storage medium to address the deficiencies in the related art.
  • According to a first aspect, a data processing method is provided, comprising: sampling data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1; sampling the convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence; and inputting the at least one first sampling result and the at least one second sampling result correspondingly into a processing array, so that the processing array outputs a processing result.
  • Sampling the data to be processed according to the step size of the convolution operation to obtain at least one first sampling result includes: performing row sampling on the data to be processed according to the step size to obtain at least one first row sampling result, wherein the union of the at least one first row sampling result is the data to be processed; performing column sampling on the data to be processed according to the step size to obtain at least one first column sampling result, wherein the union of the at least one first column sampling result is the data to be processed; and determining the intersection of each first row sampling result with each first column sampling result as a first sampling result.
  • Sampling the convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result includes: performing row sampling on the convolution kernel according to the step size to obtain at least one second row sampling result, wherein the union of the at least one second row sampling result is the convolution kernel; performing column sampling on the convolution kernel according to the step size to obtain at least one second column sampling result, wherein the union of the at least one second column sampling result is the convolution kernel; and determining the intersection of each second row sampling result with each second column sampling result as a second sampling result.
  • Inputting the at least one first sampling result and the at least one second sampling result correspondingly into the processing array, so that the processing array outputs the processing result, includes: for each first sampling result, inputting the first sampling result into the processing array and inputting the second sampling result corresponding to it into the processing array, and controlling the processing array to determine a corresponding sub-processing result according to the first sampling result and the corresponding second sampling result; and controlling the processing array to output the processing result according to the sub-processing results corresponding to the respective first sampling results.
  • Inputting each first sampling result into the processing array includes: for each first sampling result, inputting the plurality of values of the first sampling result into a plurality of units of the processing array, such that the relative positions of the values among the units are the same as their relative positions within the first sampling result.
  • The processing array includes an active array and at least one overflow row and at least one overflow column distributed around the active array, wherein the active array includes a plurality of first units for storing and processing data, and the overflow row and the overflow column include a plurality of second units for storing data. Inputting the plurality of values of the first sampling result into the plurality of units of the processing array includes: inputting the values such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being located in the first row and first column of the plurality of first units.
  • Controlling the processing array to determine the corresponding sub-processing result according to the first sampling result and the corresponding second sampling result includes: for each weight value in the corresponding second sampling result, controlling the processing array to determine a partial sum from the weight value and the value in the first sampling result that corresponds to it; controlling the processing array to determine a partial result from the partial sums corresponding to the respective weight values in the corresponding second sampling result; and controlling the processing array to determine the sub-processing result corresponding to the first sampling result according to at least one partial result.
  • For each weight value in the corresponding second sampling result, controlling the processing array to determine the partial sum from the weight value and its corresponding value in the first sampling result includes: for the first weight value in the corresponding second sampling result, controlling the processing array to determine the partial sum from the first weight value and the value of the first sampling result held in the unit at the initial position of the processing array.
  • For each weight value in the corresponding second sampling result, controlling the processing array to determine the partial sum from the weight value and its corresponding value in the first sampling result also includes: for each non-first weight value in the corresponding second sampling result, determining a movement mode for the first sampling result according to the positional relationship, within the first sampling result, between the first value corresponding to the non-first weight value and the second value corresponding to the weight value preceding it; controlling the processing array to move the second value to the corresponding unit in the determined movement mode; and controlling the processing array to determine the partial sum from the value in the corresponding unit after the move and the non-first weight value.
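The partial-sum and movement steps above can be pictured with the following minimal NumPy sketch. It illustrates the idea under stated assumptions and is not the disclosed hardware: the weight values of a second sampling result are assumed to be visited row by row in a serpentine order (consistent with the left, upward and right moves named in the captions of FIGS. 10 to 12), the active array is assumed to be exactly as large as the step-1 output, and the extra row and column of the first sampling result are assumed to sit in the overflow units. The names `snake_order`, `MOVE_NAMES` and `shift_accumulate` are hypothetical.

```python
import numpy as np

def snake_order(kh, kw):
    """Assumed weight traversal: row by row, reversing direction on each row,
    so consecutive weight values are always one unit apart."""
    order = []
    for i in range(kh):
        cols = range(kw) if i % 2 == 0 else range(kw - 1, -1, -1)
        order.extend((i, j) for j in cols)
    return order

# Direction the stored data must move so the next weight's values line up with the units.
MOVE_NAMES = {(0, 1): "left", (1, 0): "up", (0, -1): "right"}

def shift_accumulate(xp, wp):
    """Step-1 convolution of a first sampling result xp with a second sampling
    result wp, computed as one broadcast multiply-accumulate per weight value."""
    kh, kw = wp.shape
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1   # size of the active array
    acc = np.zeros((oh, ow))
    moves, prev = [], None
    for i, j in snake_order(kh, kw):
        if prev is not None:
            moves.append(MOVE_NAMES[(i - prev[0], j - prev[1])])
        # After the move the active units hold xp[i:i+oh, j:j+ow]; every unit
        # multiplies by the same weight and adds the partial sum to its accumulator.
        acc += wp[i, j] * xp[i:i + oh, j:j + ow]
        prev = (i, j)
    return acc, moves

xp = np.arange(81, dtype=float).reshape(9, 9)   # e.g. a 9*9 first sampling result
wp = np.array([[1.0, 2.0], [3.0, 4.0]])          # hypothetical 2*2 second sampling result
acc, moves = shift_accumulate(xp, wp)
print(acc.shape, moves)  # (8, 8) ['left', 'up', 'right']
```

With a serpentine order, every transition is a single one-unit shift, so each step needs only one SIMD shift instruction followed by one multiply-accumulate across all active units.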
  • The data to be processed is acquired by: determining the number of rows and the number of columns of the data to be processed according to the processing array, the convolution kernel and the step size; determining the number of overlapping rows and the number of overlapping columns according to the convolution kernel and the step size; and sampling the image data according to the number of rows and columns of the data to be processed, the number of overlapping rows and the number of overlapping columns, to obtain a plurality of pieces of data to be processed.
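The exact formulas are left open here. As a one-axis sketch, assume the overlap equals the kernel size minus the step size (the amount by which the input windows of adjacent step-S outputs overlap when the kernel is larger than the step), and treat the tile length, which the method derives from the processing-array size, as a free parameter `tile`. `overlapping_tiles` and `strided_corr1d` are hypothetical helpers; the same split would be applied to rows and to columns.

```python
import numpy as np

def overlapping_tiles(n, k, step, tile):
    """Split an axis of length n into tiles of length `tile` (hypothetical, set by
    the processing-array size) whose neighbours overlap by k - step elements."""
    assert tile >= k >= step and (tile - k) % step == 0, "tile must align with the step size"
    overlap = k - step
    tiles, start = [], 0
    while start + k <= n:          # the tile still yields at least one output
        tiles.append((start, min(start + tile, n)))
        start += tile - overlap    # next tile begins on a stride boundary
    return tiles

def strided_corr1d(x, w, step):
    """Unpadded 1-D convolution (cross-correlation) with the given step size."""
    k = len(w)
    return np.array([np.dot(x[o * step:o * step + k], w)
                     for o in range((len(x) - k) // step + 1)])

x = np.arange(17, dtype=float)
w = np.array([1.0, -1.0, 2.0])            # hypothetical 3-tap kernel
tiles = overlapping_tiles(17, len(w), 2, 7)
print(tiles)  # [(0, 7), (6, 13), (12, 17)]
pieces = [strided_corr1d(x[a:b], w, 2) for a, b in tiles]
print(np.allclose(np.concatenate(pieces), strided_corr1d(x, w, 2)))  # True
```

Concatenating the per-tile outputs reproduces the untiled result, which is the property any choice of tile size and overlap has to preserve.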
  • The data to be processed is single-channel data or one channel of multi-channel data.
  • The convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
  • According to a second aspect, a data processing apparatus is provided, comprising: a controller configured to sample data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1, to sample the convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence, and to input the at least one first sampling result and the at least one second sampling result correspondingly into a processing array; and the processing array, configured to process the at least one first sampling result and the at least one second sampling result and to output a processing result.
  • The controller is configured to: perform row sampling on the data to be processed according to the step size to obtain at least one first row sampling result, wherein the union of the at least one first row sampling result is the data to be processed; perform column sampling on the data to be processed according to the step size to obtain at least one first column sampling result, wherein the union of the at least one first column sampling result is the data to be processed; and determine the intersection of each first row sampling result with each first column sampling result as a first sampling result.
  • The controller is configured to: perform row sampling on the convolution kernel according to the step size to obtain at least one second row sampling result, wherein the union of the at least one second row sampling result is the convolution kernel; perform column sampling on the convolution kernel according to the step size to obtain at least one second column sampling result, wherein the union of the at least one second column sampling result is the convolution kernel; and determine the intersection of each second row sampling result with each second column sampling result as a second sampling result.
  • The controller is configured to, for each first sampling result, input the first sampling result into the processing array and input the second sampling result corresponding to it into the processing array; the processing array is configured to determine the corresponding sub-processing result according to the first sampling result and the corresponding second sampling result, and to output the processing result according to the sub-processing results corresponding to the respective first sampling results.
  • The controller is configured to, for each first sampling result, input the plurality of values of the first sampling result into a plurality of units of the processing array, such that the relative positions of the values among the units are the same as their relative positions within the first sampling result.
  • The processing array includes an active array and at least one overflow row and at least one overflow column distributed around the active array, wherein the active array includes a plurality of first units for storing and processing data, and the overflow row and the overflow column include a plurality of second units for storing data. The controller is configured to input the plurality of values of the first sampling result into the plurality of units of the processing array such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being located in the first row and first column of the plurality of first units.
  • The processing array is configured to: for each weight value in the corresponding second sampling result, determine a partial sum from the weight value and the value in the first sampling result that corresponds to it; determine a partial result from the partial sums corresponding to the respective weight values in the corresponding second sampling result; and determine the sub-processing result corresponding to the first sampling result according to at least one partial result.
  • The processing array is configured to, for the first weight value in the corresponding second sampling result, determine the partial sum from the first weight value and the value of the first sampling result held in the unit at the initial position of the processing array.
  • The controller is configured to, for each non-first weight value in the corresponding second sampling result, determine a movement mode for the first sampling result according to the positional relationship, within the first sampling result, between the first value corresponding to the non-first weight value and the second value corresponding to the weight value preceding it; the processing array is configured to move the second value to the corresponding unit in the determined movement mode, and to determine the partial sum from the value in the corresponding unit after the move and the non-first weight value.
  • The controller is further configured to: determine the number of rows and the number of columns of the data to be processed according to the processing array, the convolution kernel and the step size; determine the number of overlapping rows and the number of overlapping columns according to the convolution kernel and the step size; and sample the image data to be processed according to the number of rows and columns of the data to be processed, the number of overlapping rows and the number of overlapping columns, to obtain a plurality of pieces of data to be processed.
  • The data to be processed is single-channel data or one channel of multi-channel data.
  • The convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
  • According to a third aspect, an electronic device is provided, including a memory, a processor, and the apparatus described in the second aspect of the embodiments of the present disclosure.
  • According to a fourth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the method of the first aspect.
  • At least one first sampling result and at least one second sampling result are obtained by synchronously sampling the data to be processed and the convolution kernel, with a one-to-one correspondence between the first and second sampling results, so that each corresponding pair can be input into the processing array in turn to obtain the processing result. Because both the data to be processed and the convolution kernel are sampled based on the step size of the convolution operation, corresponding first and second sampling results match each other: the second sampling result is convolved over the first sampling result with a step size of 1. Every unit is therefore utilized once the data is input into the processing array, which improves the utilization rate of the processing array, avoids wasted energy, and improves processing efficiency.
  • FIG. 1 is a flowchart of a data processing method shown in an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of data to be processed according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of at least one first sampling result shown in an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a convolution kernel shown in an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of at least one second sampling result shown in an embodiment of the present disclosure.
  • FIG. 6 is a flow chart of obtaining a processing result shown in an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a processing array shown in an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a unit in a processing array according to an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of obtaining a sub-processing result shown in an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram illustrating the first sampling result moving to the left on the processing array according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram illustrating the first sampling result moving upward on the processing array according to an embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram illustrating the first sampling result moving to the right on the processing array according to an embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of a manner of sampling the data to be processed according to an embodiment of the present disclosure.
  • FIG. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another.
  • For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure.
  • The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
  • At least one embodiment of the present disclosure provides a data processing method. FIG. 1 shows the flow of the method, which includes steps S101 to S103.
  • The method may be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
  • The method may be implemented by a processor calling computer-readable instructions stored in a memory.
  • The method may be performed by a server, which may be a local server, a cloud server, or the like.
  • In step S101, the data to be processed is sampled according to the step size of the convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1.
  • The sampling may be downsampling, that is, selecting part of the data to be processed to form a first sampling result.
  • One downsampling yields one first sampling result, and multiple downsamplings yield multiple first sampling results.
  • Different first sampling results do not overlap, and together all first sampling results make up the complete data to be processed; that is, the data to be processed may be taken as a single first sampling result or split into a plurality of first sampling results.
  • The step size of the convolution operation is the distance the convolution kernel moves over the data to be processed at each move; in the convolution operation, the convolution kernel performs one calculation each time it moves.
  • Sampling according to the step size means selecting from the data to be processed at intervals of the step size; the result is a first sampling result. Each sampling starts from data not yet selected by any other sampling.
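  • First sampling results obtained from different starting positions are different and do not overlap. For example, the data to be processed is divided into 17 sub-data and the step size is 2: sampling with step size 2 from the 1st sub-data yields a first sampling result composed of the 1st, 3rd, 5th, 7th, 9th, 11th, 13th, 15th and 17th sub-data; sampling with step size 2 from the 2nd sub-data then yields a first sampling result composed of the 2nd, 4th, 6th, 8th, 10th, 12th, 14th and 16th sub-data. At this point all sub-data have been selected, so sampling ends.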
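The 17-sub-data example can be reproduced with Python's extended slicing, where `seq[start::step]` picks every `step`-th element from `start`. `stride_sample` is a hypothetical helper, and the sub-data are simply labelled 1 to 17 to match the 1-based numbering in the text.

```python
def stride_sample(seq, step):
    """Split seq into `step` non-overlapping samples; sample p starts at index p."""
    return [seq[start::step] for start in range(step)]

sub_data = list(range(1, 18))          # the 17 sub-data, labelled 1..17
first_results = stride_sample(sub_data, 2)
print(first_results[0])  # [1, 3, 5, 7, 9, 11, 13, 15, 17]
print(first_results[1])  # [2, 4, 6, 8, 10, 12, 14, 16]
```

Because every starting offset from 0 to `step - 1` is used exactly once, the samples are disjoint and their union is the original sequence, matching the no-overlap and completeness properties stated above.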
  • In step S102, the convolution kernel is sampled according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence.
  • The sampling may be downsampling, that is, selecting some of the weight values of the convolution kernel to form a second sampling result.
  • One downsampling yields one second sampling result, and multiple downsamplings yield multiple second sampling results.
  • Different second sampling results do not overlap, and together all second sampling results make up the complete convolution kernel; that is, the convolution kernel may be taken as a single second sampling result or split into a plurality of second sampling results.
  • Sampling according to the step size means selecting from the convolution kernel at intervals of the step size; the result is a second sampling result. Each sampling starts from a weight value not yet selected by any other sampling, and second sampling results obtained from different starting positions are different and do not overlap.
  • For example, the convolution kernel is divided into 9 weight values and the step size is 2: sampling with step size 2 from the 1st weight value yields a second sampling result composed of the 1st, 3rd, 5th, 7th and 9th weight values; sampling with step size 2 from the 2nd weight value then yields a second sampling result composed of the 2nd, 4th, 6th and 8th weight values. At this point all weight values of the convolution kernel have been selected, so sampling ends with the two second sampling results above.
  • The convolution kernel performs one calculation each time it moves, so over the whole convolution there is a correspondence between the sub-data of the data to be processed and the weight values of the convolution kernel. The first sampling results and the second sampling results therefore also correspond; that is, the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence.
  • A first sampling result and a second sampling result that correspond to each other consist of sub-data of the data to be processed and the weight values that are computed with that sub-data during the convolution.
  • Correspondence can be determined from the starting positions of sampling: if the relative position of the starting point of a first sampling result in the data to be processed is the same as the relative position of the starting point of a second sampling result in the convolution kernel, the two sampling results are determined to correspond to each other.
  • For example, the first sampling result whose starting point is the 1st sub-data corresponds to the second sampling result whose starting point is the 1st weight value; that is, the first sampling result composed of the 1st, 3rd, 5th, 7th, 9th, 11th, 13th, 15th and 17th sub-data corresponds to the second sampling result composed of the 1st, 3rd, 5th, 7th and 9th weight values.
  • The first sampling result whose starting point is the 2nd sub-data corresponds to the second sampling result whose starting point is the 2nd weight value; that is, the first sampling result composed of the 2nd, 4th, 6th, 8th, 10th, 12th, 14th and 16th sub-data corresponds to the second sampling result composed of the 2nd, 4th, 6th and 8th weight values.
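This correspondence can be checked numerically. The sketch below (NumPy; `strided_corr1d` and `sampled_corr1d` are illustrative names, and convolution is taken as the unpadded cross-correlation used in CNNs) compares a direct step-2 convolution of 17 sub-data with a 9-weight kernel against the sampled form, in which each corresponding pair is convolved with step size 1 and the partial outputs are summed.

```python
import numpy as np

def strided_corr1d(x, w, step):
    """Direct unpadded 1-D convolution (cross-correlation) with the given step size."""
    k = len(w)
    n_out = (len(x) - k) // step + 1
    return np.array([np.dot(x[o * step:o * step + k], w) for o in range(n_out)])

def sampled_corr1d(x, w, step):
    """Sample x and w with the same step, pair the samples by starting offset,
    run a step-1 convolution on each pair, and sum the partial outputs."""
    n_out = (len(x) - len(w)) // step + 1
    out = np.zeros(n_out)
    for p in range(step):
        xp, wp = x[p::step], w[p::step]   # p-th first / second sampling result
        if len(wp) == 0:                  # possible only when step > kernel size
            continue
        out += strided_corr1d(xp, wp, 1)[:n_out]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(17)   # 17 sub-data
w = rng.standard_normal(9)    # 9 weight values
print(np.allclose(strided_corr1d(x, w, 2), sampled_corr1d(x, w, 2)))  # True
```

The identity holds because writing each kernel index as k = S*j + p splits the step-S sum into S separate step-1 sums over x[p::S] and w[p::S]; here both pairs (9 data with 5 weights, 8 data with 4 weights) produce exactly the 5 outputs of the original step-2 convolution.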
  • In step S103, the at least one first sampling result and the at least one second sampling result are input correspondingly into the processing array, so that the processing array outputs the processing result.
  • The first pair of mutually corresponding first and second sampling results is input into the processing array, then the second pair, and so on until the last pair has been input; the processing array can then be controlled to output the processing result, which is the result of convolving the data to be processed with the convolution kernel.
  • The data to be processed is single-channel data or the data of one channel of multi-channel data.
  • The convolution kernel is a single-channel convolution kernel or the convolution kernel of one channel of a multi-channel convolution kernel.
  • The processing array of a commonly used convolutional neural network accelerator generally has a two-dimensional interconnected architecture.
  • Under Single Instruction Multiple Data (SIMD) control, a single instruction directs all units to perform the same operation (for example, shift, memory access, or multiply-accumulate (MAC)).
  • When the step size of the convolution operation is greater than 1, the results computed by some units of the processing array are not needed, which greatly reduces the utilization of the processing array.
  • In the present disclosure, each operation of the processing array is converted into a convolution of the second sampling result over the first sampling result with a step size of 1, so that the utilization rate of the processing array can reach 100%.
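A rough way to quantify this loss, assuming the SIMD array evaluates the convolution at every step-1 output position and keeps only the positions a step-S layer needs, is the ratio below. The 3*3 kernel in the example call is hypothetical, and `simd_utilization` is an illustrative name.

```python
def simd_utilization(h, w, k, step):
    """Fraction of useful results when every step-1 output position of an h*w input
    and k*k kernel is computed, but only the step-`step` positions are kept."""
    dense = (h - k + 1) * (w - k + 1)                        # positions computed
    useful = ((h - k) // step + 1) * ((w - k) // step + 1)   # positions actually needed
    return useful / dense

print(round(simd_utilization(17, 17, 3, 2), 3))  # 0.284
```

For a 17*17 input, a 3*3 kernel and a step size of 2, only 64 of 225 computed positions are needed, roughly 1/S^2 of the work, which is the waste that converting every operation to a step-1 convolution avoids.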
  • the data to be processed may be sampled according to the step size of the convolution operation in the following manner to obtain at least one first sampling result: first, the data to be processed is row-sampled according to the step size , obtain at least one sampling result of the first row, wherein the union of the at least one sampling result of the first row is the data to be processed; next, perform column sampling on the data to be processed according to the step size to obtain at least one sampling result of the first column, wherein the union of the at least one sampling result of the first column is the data to be processed; finally, each sampling result of the first row and the sampling result of each first column are respectively The intersection of the sampling results is determined as the first sampling result.
  • the movement of the convolution kernel on the data to be processed is divided into two directions, namely the row direction and the column direction, so the corresponding relationship between the weight value in the convolution kernel and the sub-data in the data to be processed, It is divided into two dimensions: row and column.
  • the sampling results are combined in pairs (that is, each first row sampling result is combined with each first column sampling result), and the intersection is obtained to obtain multiple first samples.
  • this enables the first sampling result and the second sampling result to correspond in both row and column dimensions.
  • row sampling and column sampling both use the sampling method introduced in step S101. Since sampling is performed in two dimensions, a step size of S gives S² first sampling results.
  • for example, the data to be processed is the 17*17 data block shown in Figure 2, and the step size of the convolution operation is 2. Row sampling, column sampling and then taking the intersection yields the four first sampling results shown in Figure 3:
  • the first sampling result 301 is the intersection of rows 1, 3, 5, 7, 9, 11, 13, 15 and 17 with columns 1, 3, 5, 7, 9, 11, 13, 15 and 17 (9*9);
  • the first sampling result 302 is the intersection of rows 1, 3, 5, 7, 9, 11, 13, 15 and 17 with columns 2, 4, 6, 8, 10, 12, 14 and 16 (9*8);
  • the first sampling result 303 is the intersection of rows 2, 4, 6, 8, 10, 12, 14 and 16 with columns 1, 3, 5, 7, 9, 11, 13, 15 and 17 (8*9);
  • the first sampling result 304 is the intersection of rows 2, 4, 6, 8, 10, 12, 14 and 16 with columns 2, 4, 6, 8, 10, 12, 14 and 16 (8*8).
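The row sampling, column sampling and intersection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation. It assumes the data is a list of lists. The function name `sample_phases` and the 0-based offset keys (r, c) are hypothetical. For the 17*17 example, key (0, 0) corresponds to the first sampling result 301, (0, 1) to 302, (1, 0) to 303 and (1, 1) to 304.

```python
def sample_phases(matrix, s):
    """Split a 2-D list into s*s sampling results keyed by 0-based offsets (r, c).

    Row sampling with offset r keeps rows r, r+s, r+2s, ...; column sampling
    with offset c keeps columns c, c+s, ...; the sampling result for (r, c)
    is the intersection of the two.
    """
    phases = {}
    for r in range(s):
        rows = matrix[r::s]  # r-th row sampling result
        for c in range(s):
            # intersect with the c-th column sampling result
            phases[(r, c)] = [row[c::s] for row in rows]
    return phases


# 17*17 block whose entries are their own 1-based (row, column) positions
data = [[(i, j) for j in range(1, 18)] for i in range(1, 18)]
phases = sample_phases(data, 2)
for key, block in phases.items():
    print(key, f"{len(block)}*{len(block[0])}", block[0][0])
```

It prints `(0, 0) 9*9 (1, 1)`, `(0, 1) 9*8 (1, 2)`, `(1, 0) 8*9 (2, 1)` and `(1, 1) 8*8 (2, 2)`, matching the sizes listed above. Every value lands in exactly one result, so the union of the S² results is the original block.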
  • the convolution kernel may be sampled according to the step size of the convolution operation in the following manner to obtain at least one second sampling result. First, the convolution kernel is row-sampled according to the step size to obtain at least one second row sampling result, where the union of the at least one second row sampling result is the convolution kernel. Next, the convolution kernel is column-sampled according to the step size to obtain at least one second column sampling result, where the union of the at least one second column sampling result is the convolution kernel. Finally, the intersection of each second row sampling result with each second column sampling result is determined as a second sampling result.
  • the row and column sampling results are combined in pairs (that is, each second row sampling result is combined with each second column sampling result). The intersection of each pair is taken to obtain a plurality of second sampling results, so that the first sampling results and the second sampling results correspond in both the row and column dimensions.
  • row sampling and column sampling both use the sampling method introduced in step S102. Since sampling is performed in two dimensions, a step size of S gives S² second sampling results.
  • for example, the convolution kernel is the 3*3 convolution kernel shown in Figure 4, and the step size of the convolution operation is 2. Row sampling, column sampling and then taking the intersection yields the four second sampling results 501, 502, 503 and 504 shown in FIG. 5:
  • the second sampling result 501 is the intersection of rows 1 and 3 with columns 1 and 3 (that is, the four weight values A, C, G and I in the figure);
  • the second sampling result 502 is the intersection of rows 1 and 3 with column 2 (that is, the two weight values B and H in the figure);
  • the second sampling result 503 is the intersection of row 2 with columns 1 and 3 (that is, the two weight values D and F in the figure);
  • the second sampling result 504 is the intersection of row 2 with column 2 (that is, the weight value E in the figure).
  • the four first sampling results shown in FIG. 3 and the four second sampling results shown in FIG. 5 are all obtained by sampling with a step size of 2, so they correspond one to one:
  • the first sampling result 301 corresponds to the second sampling result 501;
  • the first sampling result 302 corresponds to the second sampling result 502;
  • the first sampling result 303 corresponds to the second sampling result 503;
  • the first sampling result 304 corresponds to the second sampling result 504.
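The same row/column sampling applied to the 3*3 kernel reproduces the second sampling results 501 to 504. Keying both sides by the same offset (r, c) gives the one-to-one correspondence listed above. This is again a hedged sketch. The letters stand for the weight values of Figure 4, and `sample_phases` and the label dictionaries are hypothetical bookkeeping.

```python
def sample_phases(matrix, s):
    """Row/column sampling plus intersection: s*s results keyed by (r, c)."""
    return {(r, c): [row[c::s] for row in matrix[r::s]]
            for r in range(s) for c in range(s)}


kernel = [list("ABC"), list("DEF"), list("GHI")]  # 3*3 kernel of Figure 4
second = sample_phases(kernel, 2)

# Hypothetical labels: the same (r, c) key pairs a first sampling result
# with its corresponding second sampling result.
first_labels = {(0, 0): 301, (0, 1): 302, (1, 0): 303, (1, 1): 304}
second_labels = {(0, 0): 501, (0, 1): 502, (1, 0): 503, (1, 1): 504}
for key, weights in second.items():
    print(first_labels[key], "<->", second_labels[key], weights)
```

The output pairs 301 with `[['A', 'C'], ['G', 'I']]`, 302 with `[['B'], ['H']]`, 303 with `[['D', 'F']]` and 304 with `[['E']]`.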
  • the at least one first sampling result and the at least one second sampling result may be correspondingly input to the processing array, as shown in FIG. 6, so that the processing array outputs the processing result. This includes steps S601 to S603.
  • step S601: for each first sampling result, the first sampling result is input into the processing array, and the second sampling result corresponding to the first sampling result is input into the processing array.
  • step S602: the processing array is controlled to determine the corresponding sub-processing result according to the first sampling result and the corresponding second sampling result.
  • step S603: the processing array is controlled to output a processing result according to the sub-processing result corresponding to each first sampling result.
  • steps S601 and S602 are both repeated N times, where N is the number of first sampling results. That is, steps S601 and S602 are executed for each first sampling result and its corresponding second sampling result.
  • that is, the first of the first sampling results is input into the processing array, followed by the first of the second sampling results, and the processing array is controlled to obtain the first sub-processing result from these inputs. The second first sampling result and the second second sampling result are then input in the same way, and the processing array is controlled to obtain the second sub-processing result. This continues until the last (that is, the Nth) sub-processing result is obtained.
  • for example, the first sampling result 301 may first be input into the processing array, followed by the second sampling result 501, and the processing array is controlled to obtain the first sub-processing result from them. Next, the first sampling result 302 and the second sampling result 502 are input, and the processing array is controlled to obtain the second sub-processing result. Then the first sampling result 303 and the second sampling result 503 are input, and the processing array is controlled to obtain the third sub-processing result. Finally, the first sampling result 304 and the second sampling result 504 are input, and the processing array is controlled to obtain the fourth sub-processing result from them.
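Steps S601 to S603 amount to a polyphase decomposition of the strided convolution: each pair of corresponding sampling results is convolved with a step size of 1, and the sub-processing results are added position by position. The sketch below checks that this reproduces a direct convolution with step size S. It is a plain-Python model, not the processing array itself, and the function names are hypothetical. As in CNN frameworks, "convolution" here means cross-correlation without kernel flipping. This matches the weight-to-value pairing used in the walkthrough below (for example A*(1, 1) and C*(1, 3)).

```python
import random


def sample_phases(matrix, s):
    """Row sampling, column sampling and intersection, keyed by 0-based offsets."""
    return {(r, c): [row[c::s] for row in matrix[r::s]]
            for r in range(s) for c in range(s)}


def conv2d(x, w, stride):
    """Reference 'valid' convolution (cross-correlation, as in CNN frameworks)."""
    oh = (len(x) - len(w)) // stride + 1
    ow = (len(x[0]) - len(w[0])) // stride + 1
    return [[sum(w[p][q] * x[i * stride + p][j * stride + q]
                 for p in range(len(w)) for q in range(len(w[0])))
             for j in range(ow)] for i in range(oh)]


def strided_conv_by_phases(x, w, s):
    """Stride-s convolution computed as a sum of stride-1 sub-convolutions."""
    oh = (len(x) - len(w)) // s + 1
    ow = (len(x[0]) - len(w[0])) // s + 1
    first, second = sample_phases(x, s), sample_phases(w, s)
    sub_results = []
    for key, wp in second.items():
        if not wp or not wp[0]:
            continue  # empty kernel phase, only possible when K < S
        xp = first[key]  # S601: the corresponding first sampling result
        # S602: stride-1 convolution of the pair, cropped to the output size
        sub_results.append(
            [[sum(wp[p][q] * xp[i + p][j + q]
                  for p in range(len(wp)) for q in range(len(wp[0])))
              for j in range(ow)] for i in range(oh)])
    # S603: add the sub-processing results position by position
    return [[sum(sub[i][j] for sub in sub_results) for j in range(ow)]
            for i in range(oh)]


random.seed(0)
x = [[random.randint(-9, 9) for _ in range(17)] for _ in range(17)]
w = [[random.randint(-9, 9) for _ in range(3)] for _ in range(3)]
print(strided_conv_by_phases(x, w, 2) == conv2d(x, w, 2))  # True
```

The equivalence holds because every kernel position (P, Q) falls into exactly one phase (P mod S, Q mod S). Within that phase, it pairs with the same input value as in the direct strided convolution.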
  • step S601 may be performed as follows: for each first sampling result, the multiple values of the first sampling result are input into multiple units of the processing array. The relative positions of the values in those units are the same as the relative positions of the values within the first sampling result.
  • the first sampling result includes values in multiple rows and multiple columns.
  • the processing array includes units in multiple rows and multiple columns, each unit being used to store one value.
  • the values of the first sampling result are arranged in the processing array exactly as they are arranged in the first sampling result.
  • the processing array is a single layer of units arranged in multiple rows and columns,
  • and the first sampling result is a single layer of values arranged in multiple rows and columns.
  • the two layers are parallel to each other, and the units and values correspond one to one,
  • so that the multi-row, multi-column values are mapped as a whole onto the multi-row, multi-column units.
  • the processing array may include an effective array, together with at least one overflow row and at least one overflow column distributed around the effective array.
  • the effective array includes a plurality of first units for storing and processing data (i.e., the circular processing elements (Processing Engine, PE) in FIG. 7). The overflow rows and overflow columns include a plurality of second units that only store data (i.e., the hexagonal PEs in FIG. 7).
  • the number of rows of first units is greater than, equal to, or 1 less than the number of rows of the first sampling result. Likewise, the number of columns of first units is greater than, equal to, or 1 less than the number of columns of the first sampling result. This relationship is used to determine the size of the data to be processed (described in detail below) and of the first sampling results (described above).
  • for example, the four first sampling results shown in Figure 3 are input into the processing array shown in Figure 7, which has 10*10 cells. The 8*8 cells in the center are first units. They are surrounded by one row of second units above, one row below, one column on the left and one column on the right. Each of the four first sampling results has 8 or 9 rows and 8 or 9 columns, which satisfies the above relationship.
  • the connection relationship between the first unit and adjacent units is shown in Figure 8.
  • the first unit has an internal register R0, an arithmetic logic unit (Arithmetic and Logic Unit, ALU) and a related data load and store circuit module M. Each first unit is connected to a shift register file and a static random-access memory (Static Random-Access Memory, SRAM). The shift register file contains multiple shift registers such as R1, R2, R3 and R4.
  • the second unit has the same structure as the first unit except that it has no arithmetic unit ALU. Adjacent units are connected through their shift register files. In the processing array, each unit is connected to its neighbors in all four directions (up, down, left and right).
  • the multiple values of the first sampling result may be input into multiple units of the processing array such that the value in the first row and first column of the first sampling result goes into the first unit located in the first row and first column. The first sampling result is stored and moved in the processing array as a whole (that is, in the Single Instruction Multiple Data (SIMD) mode of operation). Positioning this first value therefore also positions the entire first sampling result relative to the processing array.
  • the number of rows of the first sampling result may be less than or equal to the number of rows of first units, or 1 greater, so at most one row of values is stored in second units. That extra row of values does not need to be convolved with the second sampling result while it is held in the second units. The convolution operation between the first sampling result and the second sampling result is therefore still guaranteed, while wasted energy and reduced efficiency are avoided.
  • likewise, the number of columns of the first sampling result may be less than or equal to the number of columns of first units, or 1 greater, so at most one column of values is stored in second units. That extra column of values does not need to be convolved with the second sampling result while it is held in the second units. The convolution operation is therefore still guaranteed, and wasted energy and reduced efficiency are avoided.
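The loading rule can be modelled with a small grid. The sketch assumes the layout described for Figure 7: an (a+2)*(a+2) grid in which indices 1 to a are first units, and indices 0 and a+1 form the single overflow row or column on each side. `load_into_array` is a hypothetical name, and `None` marks an empty unit.

```python
def load_into_array(block, a):
    """Place a sampling result on an (a+2)*(a+2) grid of units.

    Grid indices 1..a (in both directions) are first units; index 0 and a+1
    are the overflow rows/columns of second units. The block's first value
    goes into the first unit in the first row and first column, grid (1, 1).
    """
    rows, cols = len(block), len(block[0])
    if rows > a + 1 or cols > a + 1:
        raise ValueError("at most one extra row/column may spill into second units")
    grid = [[None] * (a + 2) for _ in range(a + 2)]  # None = empty unit
    for i in range(rows):
        for j in range(cols):
            grid[i + 1][j + 1] = block[i][j]
    return grid


a = 8
# First sampling result 301: rows and columns 1, 3, ..., 17 of the 17*17 block
block_301 = [[(2 * i + 1, 2 * j + 1) for j in range(9)] for i in range(9)]
grid = load_into_array(block_301, a)
print(grid[1][1], grid[1][2], grid[2][1], grid[2][2])  # (1, 1) (1, 3) (3, 1) (3, 3)
print(grid[a + 1][1])  # (17, 1): the 9th row sits in the overflow row of second units
```

The `ValueError` check mirrors the "at most one extra row or column" relationship described above. A 9*9 result such as 301 fits, with its last row and column in the overflow units.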
  • step S602 may be performed as shown in FIG. 9 , including steps S901 to S903.
  • step S901: for each weight value in the corresponding second sampling result, the processing array is controlled to determine a partial sum from the weight value and the value in the first sampling result that corresponds to it.
  • step S902: the processing array is controlled to determine a partial result from the partial sums corresponding to the weight values in the corresponding second sampling result.
  • step S903: the processing array is controlled to determine a sub-processing result corresponding to the first sampling result according to at least one partial result.
  • step S901 is repeated M times, where M is the number of weight values in the corresponding second sampling result. That is, step S901 is performed for each weight value of the second sampling result, so that the partial sums corresponding to the weight values (the 1st to Mth partial sums) are obtained in turn.
  • the partial sum may be obtained by multiplying the weight value by the corresponding value.
  • step S901 may be performed as follows.
    For the first weight value in the corresponding second sampling result, the processing array is controlled to determine the partial sum from the first weight value and the values the first sampling result holds in the units at its initial position in the processing array.
    For each non-first weight value in the corresponding second sampling result, a movement mode of the first sampling result is determined. It depends on the positional relationship, within the first sampling result, between the first value (the one corresponding to that non-first weight value) and the second value (the one corresponding to the previous weight value). The processing array is controlled to move the second value to the corresponding unit in the determined movement mode, and then to determine the partial sum from the value now held in that unit and the non-first weight value.
  • for example, the first sampling result may be shifted to the left by one unit relative to the processing array. Because the first sampling result is moved as a whole (that is, in the SIMD mode of operation), each unit sends its stored value to the adjacent unit in the moving direction. For a shift of one unit to the left, each unit sends its stored value to the unit adjacent to its left.
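The rule for choosing the movement mode can be written as a lookup from the step between consecutive weight positions to a shift direction of the data. This sketch rests on two assumptions. First, the weight values are visited in the zigzag (row-by-row, alternating) order that the walkthrough below uses for 501 (A, C, I, G). Second, the helper names `snake_order` and `data_moves` are hypothetical.

```python
def snake_order(rows, cols):
    """Visit weight positions row by row, alternating direction (zigzag)."""
    order = []
    for i in range(rows):
        cols_in_row = range(cols) if i % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((i, j) for j in cols_in_row)
    return order


def data_moves(order):
    """Shift direction of the first sampling result between consecutive weights.

    The data always moves opposite to the step between weight positions: if the
    next weight is one step to the right, the data moves one unit to the left,
    so each first unit then holds the value that pairs with that weight.
    """
    opposite = {(0, 1): "left", (0, -1): "right", (1, 0): "up", (-1, 0): "down"}
    return [opposite[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(order, order[1:])]


w501 = [["A", "C"], ["G", "I"]]
order = snake_order(2, 2)
print([w501[i][j] for i, j in order], data_moves(order))
# ['A', 'C', 'I', 'G'] ['left', 'up', 'right']
```

For 502 (2*1) the single move is "up", for 503 (1*2) it is "left", and 504 needs no move, consistent with the walkthrough below.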
  • when a partial result is obtained from the partial sums, the partial sums may be summed to obtain the partial result.
  • in execution, the unit stores the partial sum it obtains the first time. Each time it obtains a further partial sum, it adds it to the stored partial sum and stores the total as the new partial sum. The final total is the partial result.
  • in step S903, the partial results obtained by the units that store and process data (the first units) may be arranged according to the positional relationship of the units. This gives a sub-processing result with multiple rows and multiple columns.
  • for example, take the first sampling result 301 and the second sampling result 501. Here (i, j) denotes the value in row i and column j of the data to be processed.
    The weight value A is first determined as the first weight value, and the first sampling result keeps the initial position at which it was input to the processing array:
    (1, 1) is stored in the first unit in the first row and first column;
    (1, 3) in the first unit in the first row and second column;
    (3, 1) in the first unit in the second row and first column;
    (3, 3) in the first unit in the second row and second column.
    The values of the 9th row (the last row) are stored in the row of second units below the 8*8 array of first units, and the values of the 9th column (the last column) in the column of second units to its right.
    Each first unit then multiplies its stored data by the weight value A to obtain a partial sum. For the first unit in the first row and first column, the partial sum A*(1, 1) is obtained and stored.
  • the process by which the other first units obtain their partial sums is not repeated one by one. Note, however, that the second units do not perform operations and therefore do not obtain partial sums.
  • this is because convolving the data shown in Figure 2 with the convolution kernel shown in Figure 4 at a stride of 2 yields an 8*8 data array. During the convolution, the last row of the first sampling result 301 is therefore only multiplied by the weight values G and I of the second sampling result 501, not by the weight values A and C.
  • likewise, the last column of the first sampling result 301 is only multiplied by the weight values C and I of the second sampling result 501, not by the weight values A and G. Then, for the non-first weight value C, since the weight value C is to the right of the first weight value A, please refer to FIG.
  • the first column of the first sampling result 301 is only multiplied by the weight values A and G of the second sampling result 501, not by the weight values C and I. Then, for the non-first weight value I, which is below the weight value C, refer to FIG. 11. The first sampling result as a whole is moved up by one unit relative to the processing array: the shift register R1 in the shift register file of each unit sends its stored data to the shift register R1 in the shift register file of the unit above it. As a result:
  • (3, 3) is stored in the first unit in the first row and first column;
  • (3, 5) is stored in the first unit in the first row and second column;
  • (5, 3) is stored in the first unit in the second row and first column;
  • (5, 5) is stored in the first unit in the second row and second column;
  • the values of the first row are stored in the row of second units above the 8*8 array of first units.
  • each first unit then multiplies its stored data by the weight value I to obtain a partial sum.
  • for the first unit in the first row and first column, the partial sum I*(3, 3) is obtained
  • and added to its stored partial sum. For the first unit in the first row and second column, the partial sum I*(3, 5) is obtained and added to the stored partial sum A*(1, 3)+C*(1, 5),
  • giving the latest partial sum A*(1, 3)+C*(1, 5)+I*(3, 5), which is stored. For the first unit in the second row and first column, the partial sum I*(5, 3) is obtained and added to the stored partial sum A*(3, 1)+C*(3, 3), giving the latest partial sum A*(3, 1)+C*(3, 3)+I*(5, 3), which is stored. The other first units are handled in the same way.
  • then, for the non-first weight value G, which is to the left of the weight value I, the first sampling result as a whole is shifted to the right by one unit relative to the processing array. The shift register R1 in the shift register file of each unit sends its stored data to the shift register R1 in the shift register file of the unit to its right,
  • so that (3, 1) is stored in the first unit in the first row and first column, (3, 3) in the first unit in the first row and second column, (5, 1) in the first unit in the second row and first column, and (5, 3) in the first unit in the second row and second column. The values of the last column are stored in the column of second units to the right of the 8*8 array of first units.
  • each first unit then multiplies its stored data by the weight value G to obtain a partial sum. For the first unit in the first row and first column, the partial sum G*(3, 1) is obtained and added to the stored partial sum A*(1, 1)+C*(1, 3)+I*(3, 3), giving
  • the partial result A*(1, 1)+C*(1, 3)+I*(3, 3)+G*(3, 1), which is stored. For the first unit in the first row and second column, the partial sum G*(3, 3) is obtained and added to the stored partial sum A*(1, 3)+C*(1, 5)+I*(3, 5), giving the partial result A*(1, 3)+C*(1, 5)+I*(3, 5)+G*(3, 3), which is stored. For the first unit in the second row and first column, the partial sum G*(5, 1) is obtained and added to the stored partial sum A*(3, 1)+C*(3, 3)+I*(5, 3).
  • this is because convolving the data shown in Figure 2 with the convolution kernel shown in Figure 4 at a step size of 2
  • yields an 8*8 data array. During the convolution, the last column of the first sampling result 301 is therefore only multiplied by the weight values C and I of the second sampling result 501, not by the weight values A and G.
  • finally, the partial results of all the first units are arranged according to the positional relationship of the units to obtain the sub-processing result corresponding to the first sampling result 301.
  • for the first sampling result 302 and the second sampling result 502, the weight value B is first determined as the first weight value, and the first sampling result keeps the initial position at which it was input to the processing array: (1, 2) is stored in the first unit in the first row and first column,
  • (1, 4) in the first unit in the first row and second column,
  • (3, 2) in the first unit in the second row and first column,
  • and (3, 4) in the first unit in the second row and second column.
  • the values of the ninth row (the last row) are stored in the row of second units below the 8*8 array of first units.
  • each first unit then multiplies its stored data by the weight value B to obtain a partial sum.
  • the partial sums obtained and stored are B*(1, 2) for the first unit in the first row and first column, B*(1, 4) for the first unit in the first row and second column, B*(3, 2) for the first unit in the second row and first column, and B*(3, 4) for the first unit in the second row and second column.
  • the process by which the other first units obtain their partial sums is not repeated one by one. Note, however, that the second units do not perform operations and therefore do not obtain partial sums. This is because convolving the data shown in Figure 2 with the convolution kernel shown in Figure 4 at a stride of 2
  • yields an 8*8 data array. During the convolution, the last row of the first sampling result 302 is therefore only multiplied by the weight value H of the second sampling result 502, not by the weight value B. Then, for the non-first weight value H, which is below the weight value B, the first sampling result as a whole is shifted up by one unit relative to the processing array. The shift register R1 in the shift register file of each unit sends its stored data to
  • the shift register R1 in the shift register file of the unit above it. As a result, (3, 2) is stored in the first unit in the first row and first column, (3, 4) in the first unit in the first row and second column, (5, 2) in the first unit in the second row and first column, and (5, 4) in the first unit in the second row and second column. The values of the first row are stored in the row of second units above the 8*8 array of first units. Each first unit then multiplies its stored data by the weight value H to obtain a partial sum.
  • for the first sampling result 303 and the second sampling result 503, the weight value D is first determined as the first weight value, and the first sampling result keeps the initial position at which it was input to the processing array: (2, 1) is stored in the first unit in the first row and first column,
  • (2, 3) in the first unit in the first row and second column,
  • (4, 1) in the first unit in the second row and first column,
  • and (4, 3) in the first unit in the second row and second column.
  • the values of the ninth column (the last column) are stored in the column of second units to the right of the 8*8 array of first units. Each first unit then multiplies its stored data by the weight value D to obtain a partial sum. For the first unit in the first row and first column, the partial sum D*(2, 1) is obtained and stored.
  • because the convolution result is an 8*8 data array, the last column of the first sampling result 303 is only multiplied by the weight value F of the second sampling result 503 during the convolution, not by the weight value D. Then, for the non-first weight value F, which is to the right of the weight value D, the first sampling result as a whole is shifted to the left by one unit relative to the processing array. The shift register R1 in the shift register file of each unit sends its stored data to the shift register R1 in the shift register file of the unit to its left, so that (2, 3) is stored in the first unit in the first row and first column and (2, 5) in the first unit in the first row and second column.
  • each first unit then multiplies its stored data by the weight value F to obtain a partial sum; the first unit in the first row and first column, for example, obtains its partial sum in this way.
  • for the first sampling result 304 and the second sampling result 504, the weight value E is first determined as the first weight value, and the first sampling result keeps the initial position at which it was input to the processing array: (2, 2) is stored in the first unit in the first row and first column,
  • (2, 4) in the first unit in the first row and second column,
  • (4, 2) in the first unit in the second row and first column,
  • and (4, 4) in the first unit in the second row and second column. Each first unit then multiplies its stored data by the weight value E to obtain a partial sum.
  • since E is the only weight value in the second sampling result 504, each partial sum is also the partial result. The first unit in the first row and first column obtains and stores E*(2, 2) as its partial result.
  • the first unit in the first row and second column obtains and stores E*(2, 4) as its partial result, and the first unit in the second row and first column obtains and stores E*(4, 2).
  • the first unit in the second row and second column obtains and stores E*(4, 4) as its partial result. The process for the other first units is not repeated one by one. Finally, the partial results of all the first units are arranged according to the positional relationship of the units to obtain the sub-processing result corresponding to the first sampling result 304.
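The walkthrough for 301/501 through 304/504 can be replayed with a small simulation of the SIMD shifts. This is a behavioural sketch, not a hardware model. It assumes the 10*10 layout of Figure 7 (8*8 first units plus one overflow row or column on each side) and the zigzag weight order used above. It also assumes that values shifted past the edge of the grid are discarded. All names are hypothetical. One overflow row and column is only enough for sub-kernels of at most 2*2, which covers the 3*3 kernel with step size 2 used here. Only first units multiply and accumulate, and the four sub-processing results are added at the end.

```python
import random


def sample_phases(matrix, s):
    """Row sampling, column sampling and intersection, keyed by (r, c)."""
    return {(r, c): [row[c::s] for row in matrix[r::s]]
            for r in range(s) for c in range(s)}


def snake_order(rows, cols):
    """Zigzag weight order, e.g. A, C, I, G for the 2*2 result 501."""
    order = []
    for i in range(rows):
        cols_in_row = range(cols) if i % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((i, j) for j in cols_in_row)
    return order


def shift(grid, dr, dc):
    """SIMD move: every unit passes its value to the neighbour at (+dr, +dc)."""
    n = len(grid)
    out = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if 0 <= i + dr < n and 0 <= j + dc < n:  # values leaving the grid are dropped
                out[i + dr][j + dc] = grid[i][j]
    return out


def run_pair(block, weights, a):
    """Sub-processing result of one (first, second) sampling-result pair."""
    grid = [[None] * (a + 2) for _ in range(a + 2)]
    for i, row in enumerate(block):
        for j, value in enumerate(row):
            grid[i + 1][j + 1] = value  # first value -> first unit at grid (1, 1)
    acc = [[0] * a for _ in range(a)]  # partial sums held by the first units
    prev = None
    for r, c in snake_order(len(weights), len(weights[0])):
        if prev is not None:
            # next weight one step right (down) -> data moves one unit left (up)
            grid = shift(grid, prev[0] - r, prev[1] - c)
        for i in range(a):
            for j in range(a):  # second units (grid index 0 or a+1) never compute
                acc[i][j] += weights[r][c] * grid[i + 1][j + 1]
        prev = (r, c)
    return acc


random.seed(0)
x = [[random.randint(-9, 9) for _ in range(17)] for _ in range(17)]
w = [[random.randint(-9, 9) for _ in range(3)] for _ in range(3)]
first, second = sample_phases(x, 2), sample_phases(w, 2)
subs = [run_pair(first[k], second[k], 8) for k in second]  # 301/501 ... 304/504
result = [[sum(sub[i][j] for sub in subs) for j in range(8)] for i in range(8)]
direct = [[sum(w[p][q] * x[2 * i + p][2 * j + q] for p in range(3) for q in range(3))
           for j in range(8)] for i in range(8)]
print(result == direct)  # True
```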
  • step S603 may be performed as follows: the multiple sub-processing results are summed to obtain the processing result. Each sub-processing result is an array of partial results with multiple rows and multiple columns. All sub-processing results have the same numbers of rows and columns, because they are all produced by the same array of first units. The partial results at corresponding positions are therefore added, and the sum at each position is the processing result at that position. In other words, the partial results obtained by each first unit are summed to give the value of that unit, and the values of all units constitute the processing result.
  • for example, the first unit in the second row and first column obtains four partial results in total. Adding them gives the value at the corresponding position of the processing result, namely A*(3,1)+C*(3,3)+I*(5,3)+G*(5,1)+B*(3,2)+H*(5,2)+D*(4,1)+F*(4,3)+E*(4,2). The first unit in the second row and second column likewise obtains four partial results. Adding them gives A*(3,3)+C*(3,5)+I*(5,5)+G*(5,3)+B*(3,4)+H*(5,4)+D*(4,3)+F*(4,5)+E*(4,4).
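The sums written out above can be checked symbolically. If every value is a label such as (3,1) and every weight a letter, each sub-processing result cell becomes a list of "weight*value" terms, and step S603 simply concatenates the four lists for a unit. This is a hedged sketch with hypothetical helper names. It lists the terms of each sub-result in row-major order rather than in the zigzag order of the text, so the comparison in the check is order-independent.

```python
def sample_phases(matrix, s):
    """Row sampling, column sampling and intersection, keyed by (r, c)."""
    return {(r, c): [row[c::s] for row in matrix[r::s]]
            for r in range(s) for c in range(s)}


def symbolic_sub_result(block, weights, a):
    """Stride-1 convolution whose cells are lists of 'W*(row,col)' terms."""
    return [[[f"{weights[p][q]}*{block[i + p][j + q]}"
              for p in range(len(weights)) for q in range(len(weights[0]))]
             for j in range(a)] for i in range(a)]


data = [[f"({i},{j})" for j in range(1, 18)] for i in range(1, 18)]
kernel = [list("ABC"), list("DEF"), list("GHI")]
first, second = sample_phases(data, 2), sample_phases(kernel, 2)
subs = [symbolic_sub_result(first[k], second[k], 8) for k in second]

# S603 for the first unit in the second row and first column (0-based [1][0])
unit_2_1 = [term for sub in subs for term in sub[1][0]]
print("+".join(unit_2_1))
```

It prints `A*(3,1)+C*(3,3)+G*(5,1)+I*(5,3)+B*(3,2)+H*(5,2)+D*(4,1)+F*(4,3)+E*(4,2)`. These are the same nine terms as the first expression above, in a different order.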
  • the data to be processed is obtained from the data of an image to be processed.
  • the data to be processed may be obtained as follows. First, the number of rows and columns of the data to be processed is determined according to the processing array, the convolution kernel and the step size. Next, the number of overlapping rows and the number of overlapping columns are determined according to the convolution kernel and the step size. Finally, the data of the image to be processed is sampled according to the number of rows and columns of the data to be processed and the numbers of overlapping rows and columns, giving a plurality of data to be processed.
  • the difference between the number of rows of the convolution kernel and the step size can be used as the number of overlapping rows, and the difference between the number of columns of the convolution kernel and the step size can be used as the number of overlapping columns.
  • the number of overlapping rows and the number of overlapping columns may also be equal, in which case this common number P may be determined as P = K - S,
  • where K is the number of rows of the convolution kernel (whose numbers of rows and columns are equal) and S is the step size of the convolution operation, with K greater than or equal to S.
  • the number of rows of the data to be processed may be determined as the product of the step size and the number of rows of first units, plus the number of overlapping rows.
  • the number of columns of the data to be processed may be determined as the product of the step size and the number of columns of first units, plus the number of overlapping columns.
  • the numbers of rows and columns of the data to be processed may also be equal, in which case this common number L may be determined as L = S*a + P,
  • where S is the step size of the convolution operation, P is the number of overlapping rows and columns,
  • and a is the number of rows of first units (whose numbers of rows and columns are equal). For example, with S = 2, a = 8 and P = 3 - 2 = 1 for the 3*3 kernel, L = 17, which matches the 17*17 data block of Figure 2.
  • for example, an L*L sampling frame is placed at the upper left corner of the data of the image to be processed, and the data in the frame is taken as the first data to be processed. The frame is then moved to the right by L-P and the data in it is taken as the second data to be processed. This is repeated until the frame can no longer be moved to the right by L-P. The frame is then moved down by L-P from its upper-left starting position, and the sampling of the first row of frames is repeated. After this second row is finished, the frame continues to move down by L-P until it can no longer do so. After each downward move, a new row of frames is sampled in the same way as the first row.
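The tiling rule can be checked end to end. With P = K - S and L = S*a + P, each L*L frame yields exactly an a*a block of the output. Frames spaced L-P = S*a apart produce blocks that tile the full strided convolution without gaps or duplicates. The sketch assumes a square kernel and a square image whose side is covered exactly by the frames (leftover edge rows or columns are not handled). The helper names are hypothetical.

```python
import random


def tile_geometry(k, s, a):
    """Overlap P = K - S and tile side L = S*a + P (square kernel and array)."""
    p = k - s
    return s * a + p, p


def tile_starts(size, l, p):
    """Offsets of the L*L sampling frame, moved by L-P until it no longer fits."""
    starts, pos = [], 0
    while pos + l <= size:
        starts.append(pos)
        pos += l - p
    return starts


def conv2d(x, w, stride):
    """Reference 'valid' convolution (cross-correlation)."""
    oh = (len(x) - len(w)) // stride + 1
    ow = (len(x[0]) - len(w[0])) // stride + 1
    return [[sum(w[p][q] * x[i * stride + p][j * stride + q]
                 for p in range(len(w)) for q in range(len(w[0])))
             for j in range(ow)] for i in range(oh)]


K, S, A = 3, 2, 8
L, P = tile_geometry(K, S, A)
print(L, P)  # 17 1

random.seed(0)
n = 49
image = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
w = [[random.randint(-9, 9) for _ in range(K)] for _ in range(K)]
starts = tile_starts(n, L, P)
print(starts)  # [0, 16, 32]

# Convolve every tile separately and place its A*A output at offset start // S
out = [[None] * (len(starts) * A) for _ in range(len(starts) * A)]
for ti, r0 in enumerate(starts):
    for tj, c0 in enumerate(starts):
        tile = [row[c0:c0 + L] for row in image[r0:r0 + L]]
        for i, row in enumerate(conv2d(tile, w, S)):
            for j, v in enumerate(row):
                out[ti * A + i][tj * A + j] = v
print(out == conv2d(image, w, S))  # True
```

Because L-P equals S*a, every frame starts on a multiple of the step size, so each tile's output block lines up exactly with the corresponding block of the full output.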
  • a data processing apparatus comprising a controller and a processing array.
    The controller is configured to:
    sample data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1;
    sample the convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence;
    and correspondingly input the at least one first sampling result and the at least one second sampling result to the processing array.
    The processing array is configured to process the at least one first sampling result and the at least one second sampling result and output the processing result.
  • the controller is configured to perform row sampling on the data to be processed according to the step size to obtain at least one first row-sampling result, wherein the union of the at least one first row-sampling result is the data to be processed; perform column sampling on the data to be processed according to the step size to obtain at least one first column-sampling result, wherein the union of the at least one first column-sampling result is the data to be processed; and determine the intersection of each first row-sampling result with each first column-sampling result as a first sampling result.
  • the controller is configured to perform row sampling on the convolution kernel according to the step size to obtain at least one second row-sampling result, wherein the union of the at least one second row-sampling result is the convolution kernel; perform column sampling on the convolution kernel according to the step size to obtain at least one second column-sampling result, wherein the union of the at least one second column-sampling result is the convolution kernel; and determine the intersection of each second row-sampling result with each second column-sampling result as a second sampling result.
  • the controller is configured to, for each first sampling result, input the first sampling result to the processing array and input the second sampling result corresponding to the first sampling result to the processing array; and the processing array is configured to determine a corresponding sub-processing result according to the first sampling result and the corresponding second sampling result, and to output the processing result according to the sub-processing results respectively corresponding to the first sampling results.
  • the controller is configured to, for each first sampling result, input multiple values of the first sampling result into multiple units of the processing array, so that the relative positions of the values in the units are the same as their relative positions in the first sampling result.
  • the processing array includes an active array and at least one overflow row and at least one overflow column distributed around the active array, wherein the active array includes a plurality of first units for storing and processing data, and the overflow row and the overflow column include a plurality of second units for storing data; the controller is configured to input the multiple values of the first sampling result into the multiple units of the processing array such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being the unit in the first row and first column among the plurality of first units.
  • the processing array is configured to: for each weight value in the corresponding second sampling result, determine a partial sum from the weight value and the value in the first sampling result corresponding to that weight value; determine a partial result according to the partial sums respectively corresponding to the weight values in the corresponding second sampling result; and determine the sub-processing result corresponding to the first sampling result according to at least one partial result.
  • the processing array is configured to, for the first weight value in the corresponding second sampling result, determine the partial sum from the first weight value and the value of the first sampling result held in the unit corresponding to the initial position of the processing array.
  • the controller is configured to, for each non-first weight value in the corresponding second sampling result, determine a movement mode of the first sampling result according to the positional relationship, within the first sampling result, between the first value corresponding to the non-first weight value and the second value corresponding to the weight value preceding it; and the processing array is configured to move the second value to the corresponding unit using the determined movement mode, and to determine the partial sum from the non-first weight value and the value in the corresponding unit after the movement.
  • the controller is further configured to determine the number of rows and columns of the data to be processed according to the processing array, the convolution kernel, and the step size; determine the number of overlapping rows and overlapping columns according to the convolution kernel and the step size; and sample the data of the image to be processed according to the number of rows and columns of the data to be processed, the number of overlapping rows, and the number of overlapping columns, to obtain a plurality of pieces of data to be processed.
  • the data to be processed is single-channel data or one channel of multi-channel data, and the convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
  • the data processing apparatus may be a chip, an AI chip, or the like.
  • At least one embodiment of the present disclosure provides an electronic device, whose structure is shown in FIG. 14.
  • the device includes a memory, a processor, and the data processing apparatus provided by the embodiments of the present disclosure.
  • the memory is used to store computer instructions executable on the processor, and the processor, when executing the computer instructions, processes data based on the method of the first aspect.
  • At least one embodiment of the present disclosure provides a computer-readable storage medium having a computer program stored thereon, the program implementing the method of the first aspect when executed by a processor.
  • the terms “first” and “second” are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
  • the term “plurality” refers to two or more, unless expressly limited otherwise.
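The tile sizing and sampling-frame walk in the bullets above work as follows. The side of a tile is the step size times the number of first units per row, plus the overlap, and the frame advances by L-P. The plain-Python sketch below rests on stated assumptions. The overlap P = K - S for a K×K kernel is our assumption: it is the value that makes an L×L tile yield exactly a×a outputs without padding. The frame is kept fully inside the image, and leftover border pixels are not handled. The helper names `tile_geometry`, `frame_origins` and `tile_image` are hypothetical.

```python
def tile_geometry(stride, active_side, kernel_side):
    """Return (L, P): tile side and overlap, assuming P = K - S."""
    if stride < 2:
        raise ValueError("the method targets a step size greater than 1")
    overlap = kernel_side - stride  # assumed overlap P
    if overlap < 0:
        raise ValueError("this sketch assumes kernel_side >= stride")
    tile = stride * active_side + overlap  # L = S * a + P
    return tile, overlap


def frame_origins(extent, tile, overlap):
    """Frame offsets along one axis: 0, L-P, 2(L-P), ... while the frame fits."""
    step = tile - overlap
    return list(range(0, extent - tile + 1, step))


def tile_image(image, stride, active_side, kernel_side):
    """Cut an image (list of rows) into L x L tiles, row of frames by row of frames."""
    tile, overlap = tile_geometry(stride, active_side, kernel_side)
    rows = frame_origins(len(image), tile, overlap)
    cols = frame_origins(len(image[0]), tile, overlap)
    return [
        [row[c:c + tile] for row in image[r:r + tile]]
        for r in rows
        for c in cols
    ]
```

For example, with S = 2, a = 4 and a 3×3 kernel, the sketch gives L = 9 and P = 1. A 17×17 image then splits into four 9×9 tiles whose frames start at offsets 0 and 8, and neighbouring tiles share one column or row.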

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Complex Calculations (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a data processing method and apparatus, a device and a storage medium. The data processing method comprises: sampling data to be processed according to the step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1; sampling a convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result correspond one to one; and correspondingly inputting the at least one first sampling result and the at least one second sampling result to a processing array, so that the processing array outputs a processing result.

Description

Data processing method and apparatus, device, and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure claims priority to Chinese Patent Application No. 202110352221.5, filed on March 31, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of data processing, and in particular to a data processing method, apparatus, device, and storage medium.
BACKGROUND
With the development of artificial intelligence technology, images can be processed automatically in many respects, which reduces labor costs and improves efficiency and accuracy. For example, a processing array can be used to perform convolution on image data. However, the utilization of the processing array is often low during convolution, which increases energy consumption and reduces efficiency.
SUMMARY OF THE INVENTION
The present disclosure provides a data processing method, apparatus, device, and storage medium to address deficiencies in the related art.
According to a first aspect of the embodiments of the present disclosure, a data processing method is provided, the method comprising: sampling data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1; sampling a convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence; and correspondingly inputting the at least one first sampling result and the at least one second sampling result to a processing array, so that the processing array outputs a processing result.
With reference to any embodiment provided by the present disclosure, sampling the data to be processed according to the step size of the convolution operation to obtain at least one first sampling result includes the following steps. Row sampling is performed on the data to be processed according to the step size to obtain at least one first row-sampling result, wherein the union of the at least one first row-sampling result is the data to be processed. Column sampling is performed on the data to be processed according to the step size to obtain at least one first column-sampling result, wherein the union of the at least one first column-sampling result is the data to be processed. The intersection of each first row-sampling result with each first column-sampling result is determined as a first sampling result.
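Row sampling, column sampling, and their intersection amount to taking every S-th row and every S-th column starting at each offset. The following is a minimal plain-Python sketch of that reading. The helper name `stride_samples` is hypothetical, and keying each first sampling result by its row and column offsets (r, c) is a labeling introduced here for illustration.

```python
def stride_samples(data, stride):
    """Split a 2-D list into stride * stride sub-grids keyed by (r, c).

    Row sample r keeps rows r, r+S, r+2S, ...; column sample c keeps
    columns c, c+S, ...; sub-grid (r, c) is their intersection.
    """
    n_rows, n_cols = len(data), len(data[0])
    row_sets = [range(r, n_rows, stride) for r in range(stride)]
    col_sets = [range(c, n_cols, stride) for c in range(stride)]
    return {
        (r, c): [[data[i][j] for j in col_sets[c]] for i in row_sets[r]]
        for r in range(stride)
        for c in range(stride)
    }
```

For a 9×9 input with S = 2, this yields four first sampling results of shapes 5×5, 5×4, 4×5 and 4×4. Together they contain every input value exactly once, which matches the union condition stated above.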
With reference to any embodiment provided by the present disclosure, sampling the convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result includes the following steps. Row sampling is performed on the convolution kernel according to the step size to obtain at least one second row-sampling result, wherein the union of the at least one second row-sampling result is the convolution kernel. Column sampling is performed on the convolution kernel according to the step size to obtain at least one second column-sampling result, wherein the union of the at least one second column-sampling result is the convolution kernel. The intersection of each second row-sampling result with each second column-sampling result is determined as a second sampling result.
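Sampling the kernel with the same offsets as the data is what makes the one-to-one correspondence work. The derivation below assumes the cross-correlation convention used in CNNs with no padding, for input x, K×K kernel w and step size S. Labeling both kinds of sampling results by their offsets (r, c) is our notation, not the disclosure's.

```latex
y[i,j] = \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} x[Si+m,\; Sj+n]\, w[m,n]
% write m = Sp + r and n = Sq + c with 0 \le r, c < S, and define
x^{(r,c)}[u,v] = x[Su+r,\; Sv+c], \qquad w^{(r,c)}[p,q] = w[Sp+r,\; Sq+c]
% since Si + m = S(i+p) + r and Sj + n = S(j+q) + c,
y[i,j] = \sum_{r=0}^{S-1} \sum_{c=0}^{S-1} \sum_{p=0}^{\lceil (K-r)/S \rceil - 1} \sum_{q=0}^{\lceil (K-c)/S \rceil - 1} x^{(r,c)}[i+p,\; j+q]\, w^{(r,c)}[p,q]
```

Each inner double sum is a step-size-1 correlation of the first sampling result x^{(r,c)} with the second sampling result w^{(r,c)} at the same offsets. Pairing by offset is therefore a natural way to realize the one-to-one correspondence. If K < S, some kernel offsets are empty, and the matching data sub-grids contribute nothing.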
With reference to any embodiment provided by the present disclosure, correspondingly inputting the at least one first sampling result and the at least one second sampling result to the processing array, so that the processing array outputs the processing result, includes: for each first sampling result, inputting the first sampling result to the processing array and inputting the second sampling result corresponding to the first sampling result to the processing array; controlling the processing array to determine a corresponding sub-processing result according to the first sampling result and the corresponding second sampling result; and controlling the processing array to output the processing result according to the sub-processing results respectively corresponding to the first sampling results.
With reference to any embodiment provided by the present disclosure, inputting each first sampling result to the processing array includes: for each first sampling result, inputting multiple values of the first sampling result into multiple units of the processing array, so that the relative positions of the values in the units are the same as their relative positions in the first sampling result.
With reference to any embodiment provided by the present disclosure, the processing array includes an active array and at least one overflow row and at least one overflow column distributed around the active array. The active array includes a plurality of first units for storing and processing data, and the overflow row and the overflow column include a plurality of second units for storing data. Inputting the multiple values of the first sampling result into the multiple units of the processing array includes inputting the values such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being the unit in the first row and first column among the plurality of first units.
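The shapes involved can be computed directly to see how many overflow units a first sampling result occupies when its first value sits in the first active unit. This sketch makes three assumptions. First, the tile side is L = S × a + (K - S) for a K×K kernel, where the overlap K - S is our assumption. Second, alignment is top-left as described above. Third, the helper names are hypothetical. Under these assumptions, each first sampling result exceeds the a×a active array by (kp - 1) rows and (kq - 1) columns, where kp×kq is the shape of its paired second sampling result.

```python
def ceil_div(num, den):
    """Integer ceiling division for positive den."""
    return -(-num // den)


def phase_shapes(stride, active_side, kernel_side):
    """Map (r, c) -> (data shape, kernel shape, overflow rows/cols needed)."""
    if kernel_side < stride:
        raise ValueError("this sketch assumes kernel_side >= stride")
    tile = stride * active_side + (kernel_side - stride)  # L = S*a + P, P = K - S assumed
    shapes = {}
    for r in range(stride):
        for c in range(stride):
            # indices r, r+S, ... below n number ceil((n - r) / S)
            data = (ceil_div(tile - r, stride), ceil_div(tile - c, stride))
            kern = (ceil_div(kernel_side - r, stride), ceil_div(kernel_side - c, stride))
            # values that do not fit in the a x a active array spill into overflow units
            overflow = (data[0] - active_side, data[1] - active_side)
            shapes[(r, c)] = (data, kern, overflow)
    return shapes
```

Because each data shape minus its kernel shape plus one equals a, a step-size-1 pass over every first sampling result fills the whole active array with outputs.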
With reference to any embodiment provided by the present disclosure, controlling the processing array to determine the corresponding sub-processing result according to the first sampling result and the corresponding second sampling result includes: for each weight value in the corresponding second sampling result, controlling the processing array to determine a partial sum from the weight value and the value in the first sampling result corresponding to that weight value; controlling the processing array to determine a partial result according to the partial sums respectively corresponding to the weight values in the corresponding second sampling result; and controlling the processing array to determine the sub-processing result corresponding to the first sampling result according to at least one partial result.
With reference to any embodiment provided by the present disclosure, determining a partial sum for each weight value in the corresponding second sampling result includes: for the first weight value in the corresponding second sampling result, controlling the processing array to determine the partial sum from the first weight value and the value of the first sampling result held in the unit corresponding to the initial position of the processing array.
With reference to any embodiment provided by the present disclosure, determining a partial sum for each weight value in the corresponding second sampling result includes the following steps. For each non-first weight value in the corresponding second sampling result, a movement mode of the first sampling result is determined. The movement mode depends on the positional relationship, within the first sampling result, between the first value corresponding to the non-first weight value and the second value corresponding to the weight value preceding it. The processing array is controlled to move the second value to the corresponding unit using the determined movement mode. The processing array is then controlled to determine the partial sum from the non-first weight value and the value in the corresponding unit after the movement.
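The shift-and-accumulate procedure in the three paragraphs above can be simulated for one pair of sampling results. Several choices here are assumptions. The weights are visited in snake order: left to right along the first kernel row, up one row, then right to left. This is our reading of the left, up and right moves in FIG. 10 to FIG. 12. The array is modeled as a list of rows with kq - 1 extra columns on the left standing in for overflow columns. All function names are hypothetical, and this is not the disclosed circuit.

```python
def zigzag_order(kp, kq):
    """Weight positions in snake order: even kernel rows left->right, odd rows right->left."""
    order = []
    for m in range(kp):
        cols = range(kq) if m % 2 == 0 else range(kq - 1, -1, -1)
        order.extend((m, n) for n in cols)
    return order


def move_between(prev, cur):
    """Direction that brings the value paired with `cur` into the initial unit."""
    step = (cur[0] - prev[0], cur[1] - prev[1])
    moves = {(0, 1): "left", (0, -1): "right", (1, 0): "up"}
    if step not in moves:
        raise ValueError(f"non-adjacent weights {prev} -> {cur}")
    return moves[step]


def shift(grid, direction):
    """Move every stored value one unit; cells that empty out hold None."""
    if direction == "left":
        return [row[1:] + [None] for row in grid]
    if direction == "right":
        return [[None] + row[:-1] for row in grid]
    # "up": the top row leaves the array and an empty row enters at the bottom
    return grid[1:] + [[None] * len(grid[0])]


def phase_conv_by_shifting(x_phase, w_phase, active):
    """Accumulate one partial sum per weight over an active x active window."""
    kp, kq = len(w_phase), len(w_phase[0])
    pad = kq - 1  # assumed left overflow columns keep values needed for right moves
    grid = [[None] * pad + list(row) for row in x_phase]
    acc = [[0] * active for _ in range(active)]
    order = zigzag_order(kp, kq)
    for step, (m, n) in enumerate(order):
        if step > 0:
            grid = shift(grid, move_between(order[step - 1], (m, n)))
        weight = w_phase[m][n]
        for i in range(active):
            for j in range(active):
                # the unit at active (i, j) now holds x_phase[i + m][j + n]
                acc[i][j] += grid[i][pad + j] * weight
    return acc
```

After the shifts for weight (m, n), active unit (i, j) holds the sampled value at (i + m, j + n). The accumulated partial sums therefore equal a step-size-1 correlation of the pair. The snake order means each step needs only a single one-unit move.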
With reference to any embodiment provided by the present disclosure, the data to be processed is obtained by: determining the number of rows and columns of the data to be processed according to the processing array, the convolution kernel, and the step size; determining the number of overlapping rows and overlapping columns according to the convolution kernel and the step size; and sampling the data of the image to be processed according to the number of rows and columns of the data to be processed, the number of overlapping rows, and the number of overlapping columns, to obtain a plurality of pieces of data to be processed.
With reference to any embodiment provided by the present disclosure, the data to be processed is single-channel data or one channel of multi-channel data, and the convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
According to a second aspect of the embodiments of the present disclosure, a data processing apparatus is provided, the apparatus comprising: a controller configured to sample data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1; sample a convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence; and correspondingly input the at least one first sampling result and the at least one second sampling result to a processing array; and the processing array, configured to process the at least one first sampling result and the at least one second sampling result and output a processing result.
With reference to any embodiment provided by the present disclosure, the controller is configured to perform row sampling on the data to be processed according to the step size to obtain at least one first row-sampling result, wherein the union of the at least one first row-sampling result is the data to be processed; perform column sampling on the data to be processed according to the step size to obtain at least one first column-sampling result, wherein the union of the at least one first column-sampling result is the data to be processed; and determine the intersection of each first row-sampling result with each first column-sampling result as a first sampling result.
With reference to any embodiment provided by the present disclosure, the controller is configured to perform row sampling on the convolution kernel according to the step size to obtain at least one second row-sampling result, wherein the union of the at least one second row-sampling result is the convolution kernel; perform column sampling on the convolution kernel according to the step size to obtain at least one second column-sampling result, wherein the union of the at least one second column-sampling result is the convolution kernel; and determine the intersection of each second row-sampling result with each second column-sampling result as a second sampling result.
With reference to any embodiment provided by the present disclosure, the controller is configured to, for each first sampling result, input the first sampling result to the processing array and input the second sampling result corresponding to the first sampling result to the processing array; and the processing array is configured to determine a corresponding sub-processing result according to the first sampling result and the corresponding second sampling result, and to output the processing result according to the sub-processing results respectively corresponding to the first sampling results.
With reference to any embodiment provided by the present disclosure, the controller is configured to, for each first sampling result, input multiple values of the first sampling result into multiple units of the processing array, so that the relative positions of the values in the units are the same as their relative positions in the first sampling result.
With reference to any embodiment provided by the present disclosure, the processing array includes an active array and at least one overflow row and at least one overflow column distributed around the active array. The active array includes a plurality of first units for storing and processing data, and the overflow row and the overflow column include a plurality of second units for storing data. The controller is configured to input the multiple values of the first sampling result into the multiple units of the processing array such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being the unit in the first row and first column among the plurality of first units.
With reference to any embodiment provided by the present disclosure, the processing array is configured to: for each weight value in the corresponding second sampling result, determine a partial sum from the weight value and the value in the first sampling result corresponding to that weight value; determine a partial result according to the partial sums respectively corresponding to the weight values in the corresponding second sampling result; and determine the sub-processing result corresponding to the first sampling result according to at least one partial result.
With reference to any embodiment provided by the present disclosure, the processing array is configured to, for the first weight value in the corresponding second sampling result, determine the partial sum from the first weight value and the value of the first sampling result held in the unit corresponding to the initial position of the processing array.
With reference to any embodiment provided by the present disclosure, the controller is configured to determine, for each non-first weight value in the corresponding second sampling result, a movement mode of the first sampling result. The movement mode depends on the positional relationship, within the first sampling result, between the first value corresponding to the non-first weight value and the second value corresponding to the weight value preceding it. The processing array is configured to move the second value to the corresponding unit using the determined movement mode, and to determine the partial sum from the non-first weight value and the value in the corresponding unit after the movement.
With reference to any embodiment provided by the present disclosure, the controller is further configured to determine the number of rows and columns of the data to be processed according to the processing array, the convolution kernel, and the step size; determine the number of overlapping rows and overlapping columns according to the convolution kernel and the step size; and sample the data of the image to be processed according to the number of rows and columns of the data to be processed, the number of overlapping rows, and the number of overlapping columns, to obtain a plurality of pieces of data to be processed.
With reference to any embodiment provided by the present disclosure, the data to be processed is single-channel data or one channel of multi-channel data, and the convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, the device including a memory, a processor, and the apparatus described in the second aspect of the embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the method of the first aspect.
As the above embodiments show, the data to be processed and the convolution kernel are sampled synchronously to obtain at least one first sampling result and at least one second sampling result in one-to-one correspondence. The corresponding first and second sampling results can then be input to the processing array in turn to obtain the processing result. Because both the data and the kernel are sampled according to the step size of the convolution operation, each first sampling result matches its corresponding second sampling result: convolving the first sampling result with the second sampling result is a convolution with a step size of 1. Consequently, once the sampling results are input to the processing array, every unit can be utilized. This improves the utilization of the processing array, avoids wasted energy, and improves processing efficiency.
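The claim that each pair of sampling results needs only a step-size-1 convolution can be checked numerically. The sketch below is a plain-Python illustration, not the hardware dataflow. It assumes the cross-correlation convention used in CNN frameworks with no padding, and the function names are hypothetical. It computes a stride-S convolution directly, then again as the sum of stride-1 convolutions over the S × S pairs of sampling results taken at matching offsets.

```python
def conv2d_valid(x, w, stride=1):
    """Direct 'valid' cross-correlation of 2-D lists with the given stride."""
    k_rows, k_cols = len(w), len(w[0])
    out_rows = (len(x) - k_rows) // stride + 1
    out_cols = (len(x[0]) - k_cols) // stride + 1
    return [[sum(x[stride * i + m][stride * j + n] * w[m][n]
                 for m in range(k_rows) for n in range(k_cols))
             for j in range(out_cols)] for i in range(out_rows)]


def phase(grid, stride, r, c):
    """Rows r, r+S, ... intersected with columns c, c+S, ..."""
    return [row[c::stride] for row in grid[r::stride]]


def strided_conv_via_phases(x, w, stride):
    """Sum of stride-1 sub-processing results over matching (r, c) pairs."""
    out_rows = (len(x) - len(w)) // stride + 1
    out_cols = (len(x[0]) - len(w[0])) // stride + 1
    total = [[0] * out_cols for _ in range(out_rows)]
    for r in range(stride):
        for c in range(stride):
            w_rc = phase(w, stride, r, c)
            if not w_rc or not w_rc[0]:
                continue  # empty kernel sample (kernel smaller than stride)
            sub = conv2d_valid(phase(x, stride, r, c), w_rc)  # step size 1
            for i in range(out_rows):
                for j in range(out_cols):
                    total[i][j] += sub[i][j]
    return total
```

With integer inputs, both routes give identical outputs. For example, a 9×9 input with a 3×3 kernel and S = 2 decomposes into four stride-1 convolutions, each producing a full 4×4 block.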
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of data to be processed according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of at least one first sampling result according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a convolution kernel according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of at least one second sampling result according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of obtaining a processing result according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a processing array according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a unit in a processing array according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of obtaining a sub-processing result according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a first sampling result moving left on the processing array according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a first sampling result moving up on the processing array according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a first sampling result moving right on the processing array according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a manner of sampling data to be processed according to an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description of the Embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. In the following description, which refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this disclosure to describe various pieces of information, the information should not be limited by these terms. These terms are used only to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
In a first aspect, at least one embodiment of the present disclosure provides a data processing method. FIG. 1 shows the flow of the method, which includes steps S101 to S103.
The method may be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The method may be implemented by a processor invoking computer-readable instructions stored in a memory. Alternatively, the method may be executed by a server, such as a local server or a cloud server.
In step S101, the data to be processed is sampled according to the stride of the convolution operation to obtain at least one first sampling result, where the stride is greater than 1.
The sampling may be downsampling, that is, selecting part of the data to be processed to form a first sampling result. One downsampling pass yields one first sampling result, and multiple passes yield multiple first sampling results. Different first sampling results do not overlap, and all the first sampling results together make up the complete data to be processed. In other words, the data to be processed may be taken as a single first sampling result, or it may be split into multiple first sampling results.
The stride of the convolution operation is the distance the convolution kernel moves across the data to be processed at each step. In the convolution operation, the kernel performs one computation each time it moves. Likewise, sampling according to the stride means selecting data from the data to be processed at intervals of the stride, and the selected data constitute a first sampling result. Each sampling pass starts from data not yet selected by any other pass. Passes that start from different positions yield different first sampling results, and these results do not overlap.

For example, suppose the data to be processed is divided into 17 sub-data and the stride is 2. Sampling with stride 2 from the 1st sub-data yields a first sampling result consisting of the 1st, 3rd, 5th, 7th, 9th, 11th, 13th, 15th, and 17th sub-data. Sampling with stride 2 from the 2nd sub-data then yields a first sampling result consisting of the 2nd, 4th, 6th, 8th, 10th, 12th, 14th, and 16th sub-data. At this point all the data to be processed has been selected, so sampling ends with these two first sampling results.
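The one-dimensional sampling just described can be sketched in Python. This is an illustrative sketch rather than the disclosed implementation. The function name stride_sample is hypothetical, and positions are 0-based in the code, so the "1st sub-data" is index 0. Each starting offset from 0 to stride - 1 produces one sampling result, and together the results partition the input without overlap.

```python
def stride_sample(seq, stride):
    """Split seq into interleaved, non-overlapping samples (hypothetical helper).

    Sample r starts at 0-based index r and takes every stride-th element,
    so every element is selected exactly once.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    # Offsets at or beyond len(seq) would only produce empty samples.
    return [seq[start::stride] for start in range(min(stride, len(seq)))]


# The 17 sub-data example above, labelled 1..17, with stride 2.
data = list(range(1, 18))
first_samples = stride_sample(data, 2)
print(first_samples[0])  # [1, 3, 5, 7, 9, 11, 13, 15, 17]
print(first_samples[1])  # [2, 4, 6, 8, 10, 12, 14, 16]
```

The same helper applied to the 9-weight kernel of step S102, stride_sample(list(range(1, 10)), 2), gives [[1, 3, 5, 7, 9], [2, 4, 6, 8]].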
In step S102, the convolution kernel is sampled according to the stride of the convolution operation to obtain at least one second sampling result, where the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence.
The sampling may be downsampling, that is, selecting some of the weight values of the convolution kernel to form a second sampling result. One downsampling pass yields one second sampling result, and multiple passes yield multiple second sampling results. Different second sampling results do not overlap, and all the second sampling results together make up the complete convolution kernel. In other words, the convolution kernel may be taken as a single second sampling result, or it may be split into multiple second sampling results.
Sampling according to the stride means selecting weight values from the convolution kernel at intervals of the stride, and the selected values constitute a second sampling result. Each sampling pass starts from a weight value not yet selected by any other pass. Passes that start from different positions yield different second sampling results, and these results do not overlap.

For example, suppose the convolution kernel is divided into 9 weight values and the stride is 2. Sampling with stride 2 from the 1st weight value yields a second sampling result consisting of the 1st, 3rd, 5th, 7th, and 9th weight values. Sampling with stride 2 from the 2nd weight value then yields a second sampling result consisting of the 2nd, 4th, 6th, and 8th weight values. At this point the whole kernel has been selected, so sampling ends with these two second sampling results.
Because both the data to be processed and the convolution kernel are sampled according to the stride, the number of first sampling results equals the number of second sampling results. The kernel performs one computation each time it moves, so throughout the convolution there is a fixed correspondence between the sub-data of the data to be processed and the weight values of the kernel. Consequently, the first and second sampling results also correspond: the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence.

A first sampling result and its corresponding second sampling result consist, respectively, of sub-data of the data to be processed and of the weight values that are computed with those sub-data during the convolution. The correspondence can be established from the starting positions of the sampling. If the starting point of a first sampling result has the same relative position within the data to be processed as the starting point of a second sampling result has within the convolution kernel, the two results correspond to each other.

For example, in the data and kernel examples above, the first sampling result starting at the 1st sub-data corresponds to the second sampling result starting at the 1st weight value. That is, the first sampling result consisting of the 1st, 3rd, 5th, 7th, 9th, 11th, 13th, 15th, and 17th sub-data corresponds to the second sampling result consisting of the 1st, 3rd, 5th, 7th, and 9th weight values. Likewise, the first sampling result starting at the 2nd sub-data corresponds to the second sampling result starting at the 2nd weight value. That is, the first sampling result consisting of the 2nd, 4th, 6th, 8th, 10th, 12th, 14th, and 16th sub-data corresponds to the second sampling result consisting of the 2nd, 4th, 6th, and 8th weight values.
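The sketch below illustrates why pairing samples by starting offset reproduces the strided convolution. It computes a stride-S "valid" cross-correlation (the usual CNN convention, with no kernel flip) directly. It then computes the same output again as a sum of stride-1 correlations, one for each matched pair (x[r::S], w[r::S]). The function names are hypothetical, and the choice to combine the pairs by summation is this sketch's assumption about how the per-pair results are merged.

The reasoning is as follows. Any kernel index k can be written as k = q*S + r. The product w[k] * x[o*S + k] then equals w_r[q] * x_r[o + q], which is a stride-1 term between the r-th samples.

```python
import numpy as np


def strided_corr_1d(x, w, stride):
    """Reference: direct 'valid' cross-correlation with the given stride."""
    n_out = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(w, x[o * stride:o * stride + len(w)]) for o in range(n_out)])


def phase_decomposed_corr_1d(x, w, stride):
    """The same output computed from matched (first, second) sampling results.

    The r-th data sample x[r::stride] is paired with the r-th kernel sample
    w[r::stride] (same starting offset) and correlated with stride 1.
    """
    n_out = (len(x) - len(w)) // stride + 1
    out = np.zeros(n_out, dtype=np.result_type(x, w))
    for r in range(stride):
        x_r, w_r = x[r::stride], w[r::stride]
        if len(w_r) == 0:  # kernel shorter than the stride: no weights at this offset
            continue
        # np.correlate does not flip w, matching CNN-style convolution;
        # only the first n_out stride-1 outputs are needed.
        out += np.correlate(x_r, w_r, mode="valid")[:n_out]
    return out


x = np.arange(1, 18)  # 17 sub-data
w = np.arange(1, 10)  # 9 weight values
assert np.array_equal(strided_corr_1d(x, w, 2), phase_decomposed_corr_1d(x, w, 2))
```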
In step S103, the at least one first sampling result and the at least one second sampling result are input, pair by pair, to a processing array, so that the processing array outputs a processing result.
Specifically, the first pair of mutually corresponding first and second sampling results is input to the processing array, then the second pair, and so on until the last pair has been input. In this way the processing array can be controlled to output the processing result, which is the result of convolving the data to be processed with the convolution kernel.
It should be noted that the data to be processed is either single-channel data or the data of one channel of multi-channel data. Similarly, the convolution kernel is either a single-channel kernel or one channel of a multi-channel kernel. In other words, when the convolution kernel and/or the data to be processed have multiple channels, the convolution is performed channel by channel on corresponding channels. This method addresses the convolution of one channel of the data to be processed with one channel of the kernel. Applying the data processing method of this embodiment to each channel separately and combining the per-channel processing results yields the result of the multi-channel convolution.
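Per-channel processing can be sketched as follows. The sketch assumes the usual CNN convention, in which the per-channel results for one output map are combined by summation. The text above says only that the results are "combined", and the function names are hypothetical. The single-channel routine stands in for the per-channel method of this embodiment.

```python
import numpy as np


def conv2d_single_channel(x, w, stride):
    """Direct 'valid' 2-D cross-correlation of one data channel with one kernel channel."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, w))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i * stride:i * stride + kh, j * stride:j * stride + kw] * w)
    return out


def conv2d_multi_channel(x, w, stride):
    """x: (C, H, W) data, w: (C, kh, kw) kernel -> one output map.

    Each channel is processed independently, and the per-channel results
    are summed (assumed combination rule).
    """
    assert x.shape[0] == w.shape[0], "channel counts must match"
    return sum(conv2d_single_channel(x[c], w[c], stride) for c in range(x.shape[0]))
```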
As the above embodiment shows, sampling the data to be processed and the convolution kernel in the same way yields at least one first sampling result and at least one second sampling result in one-to-one correspondence. The corresponding pairs can then be input to the processing array in turn to obtain the processing result. Because both the data and the kernel are sampled based on the stride of the convolution operation, each first sampling result matches its corresponding second sampling result, and convolving the first sampling result with the second sampling result requires a stride of only 1. As a result, once a pair is input to the processing array, every unit can be utilized. This raises the utilization of the processing array, avoids wasted energy, and improves processing efficiency.
Specifically, the processing array of a typical convolutional neural network accelerator has a two-dimensional interconnected architecture. In the single instruction, multiple data (SIMD) mode of operation, a single instruction controls all units to perform the same operation, such as a shift, a memory access, or a multiply-accumulate (MAC). When the stride of the convolution operation is greater than 1, however, the results computed by some units of the processing array are not needed, which greatly reduces the utilization of the array. For example, with stride = 2 the utilization of the processing array in SIMD mode is only 1/4, and with stride = 3 it is only 1/9. With the processing method of this embodiment, every operation of the processing array becomes a stride-1 convolution of a first sampling result with a second sampling result, so the utilization of the processing array reaches 100%.
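The 1/4 and 1/9 figures can be reproduced with a simple model. The model is an assumption: the SIMD array evaluates a window at every stride-1 position of an output grid, but a stride-S convolution keeps only the positions whose row and column indices are both multiples of S. The function name is hypothetical.

```python
def useful_fraction(out_rows, out_cols, stride):
    """Fraction of stride-1 window positions that a stride-`stride` convolution keeps."""
    kept_rows = len(range(0, out_rows, stride))
    kept_cols = len(range(0, out_cols, stride))
    return kept_rows * kept_cols / (out_rows * out_cols)


print(useful_fraction(8, 8, 2))  # 0.25
print(useful_fraction(9, 9, 3))  # 0.1111111111111111
```

When the grid size is a multiple of the stride, the fraction is exactly 1/S², which matches the figures above.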
In some embodiments of the present disclosure, the data to be processed may be sampled according to the stride of the convolution operation to obtain the at least one first sampling result as follows. First, row sampling is performed on the data to be processed according to the stride to obtain at least one first row sampling result, where the union of the first row sampling results is the data to be processed. Next, column sampling is performed on the data to be processed according to the stride to obtain at least one first column sampling result, where the union of the first column sampling results is the data to be processed. Finally, the intersection of each first row sampling result with each first column sampling result is determined as a first sampling result.
During the convolution operation, the kernel moves over the data to be processed in two directions, the row direction and the column direction. The correspondence between the weight values of the kernel and the sub-data of the data to be processed therefore has two dimensions, rows and columns. After sampling by rows and by columns separately, the sampling results are combined pairwise (each first row sampling result is combined with each first column sampling result), and their intersections are taken to obtain multiple first sampling results. In this way the first and second sampling results correspond in both the row and the column dimensions.
Both the row sampling and the column sampling use the sampling manner described in step S101. Because sampling is performed in two dimensions, a stride of S yields S² first sampling results.
In one example, the data to be processed is the 17×17 data block shown in FIG. 2, and the stride of the convolution operation is 2. Row sampling, column sampling, and taking intersections as described above yield the four first sampling results 301, 302, 303, and 304 shown in FIG. 3:

- First sampling result 301 is the intersection of rows 1, 3, 5, 7, 9, 11, 13, 15, and 17 with columns 1, 3, 5, 7, 9, 11, 13, 15, and 17 (9×9).
- First sampling result 302 is the intersection of rows 1, 3, 5, 7, 9, 11, 13, 15, and 17 with columns 2, 4, 6, 8, 10, 12, 14, and 16 (9×8).
- First sampling result 303 is the intersection of rows 2, 4, 6, 8, 10, 12, 14, and 16 with columns 1, 3, 5, 7, 9, 11, 13, 15, and 17 (8×9).
- First sampling result 304 is the intersection of rows 2, 4, 6, 8, 10, 12, 14, and 16 with columns 2, 4, 6, 8, 10, 12, 14, and 16 (8×8).
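The row-sampling, column-sampling, and intersection procedure can be sketched with NumPy. This is an illustrative sketch, not the disclosed implementation. The function name sample_2d and the 0-based offsets (r, c) are assumptions of the sketch. Offset (0, 0) corresponds to first sampling result 301, (0, 1) to 302, (1, 0) to 303, and (1, 1) to 304.

```python
import numpy as np


def sample_2d(x, stride):
    """Return {(r, c): first sampling result} for a 2-D array (hypothetical helper).

    Row sampling picks rows r, r + stride, ...; column sampling picks columns
    c, c + stride, ...; each first sampling result is their intersection.
    """
    results = {}
    for r in range(stride):
        rows = np.arange(r, x.shape[0], stride)      # one row sampling result
        for c in range(stride):
            cols = np.arange(c, x.shape[1], stride)  # one column sampling result
            results[(r, c)] = x[np.ix_(rows, cols)]  # intersection of the two
    return results


x = np.arange(17 * 17).reshape(17, 17)  # stand-in for the 17x17 block of FIG. 2
for (r, c), sample in sample_2d(x, 2).items():
    print((r, c), sample.shape)
# (0, 0) (9, 9)
# (0, 1) (9, 8)
# (1, 0) (8, 9)
# (1, 1) (8, 8)
```

The intersection is equivalent to the slice x[r::stride, c::stride]. The explicit index sets are kept only to mirror the row/column description above.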
In some embodiments of the present disclosure, the convolution kernel may be sampled according to the stride of the convolution operation to obtain the at least one second sampling result as follows. First, row sampling is performed on the convolution kernel according to the stride to obtain at least one second row sampling result, where the union of the second row sampling results is the convolution kernel. Next, column sampling is performed on the convolution kernel according to the stride to obtain at least one second column sampling result, where the union of the second column sampling results is the convolution kernel. Finally, the intersection of each second row sampling result with each second column sampling result is determined as a second sampling result.
During the convolution operation, the kernel moves over the data to be processed in two directions, the row direction and the column direction. The correspondence between the weight values of the kernel and the sub-data of the data to be processed therefore has two dimensions, rows and columns. After sampling by rows and by columns separately, the sampling results are combined pairwise (each second row sampling result is combined with each second column sampling result), and their intersections are taken to obtain multiple second sampling results. In this way the first and second sampling results correspond in both the row and the column dimensions.
Both the row sampling and the column sampling use the sampling manner described in step S102. Because sampling is performed in two dimensions, a stride of S yields S² second sampling results.
In one example, the convolution kernel is the 3×3 kernel shown in FIG. 4, and the stride of the convolution operation is 2. Row sampling, column sampling, and taking intersections as described above yield the four second sampling results 501, 502, 503, and 504 shown in FIG. 5:

- Second sampling result 501 is the intersection of rows 1 and 3 with columns 1 and 3 (the four weight values A, C, G, and I in the figure).
- Second sampling result 502 is the intersection of rows 1 and 3 with column 2 (the two weight values B and H).
- Second sampling result 503 is the intersection of row 2 with columns 1 and 3 (the two weight values D and F).
- Second sampling result 504 is the intersection of row 2 with column 2 (the weight value E).
The four first sampling results shown in FIG. 3 and the four second sampling results shown in FIG. 5 are all obtained by sampling with stride 2, so they correspond one to one. Specifically, first sampling result 301 corresponds to second sampling result 501, first sampling result 302 to second sampling result 502, first sampling result 303 to second sampling result 503, and first sampling result 304 to second sampling result 504.
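The pairing by starting offset can be checked on the letter kernel of FIG. 4. The sketch assumes that A to I are laid out row-major, which is consistent with 501 being the four corner weights A, C, G, and I. The dictionary labels is a hypothetical bookkeeping device that maps each (row offset, column offset) to the reference numerals of the matched pair.

```python
import numpy as np

kernel = np.array([["A", "B", "C"],
                   ["D", "E", "F"],
                   ["G", "H", "I"]])
stride = 2

# Equal (row offset, column offset) identifies a matched (first, second) pair.
labels = {(0, 0): (301, 501), (0, 1): (302, 502), (1, 0): (303, 503), (1, 1): (304, 504)}

pairs = {}
for (r, c), ids in labels.items():
    # Slicing with a start offset and step equals the row/column intersection.
    pairs[ids] = kernel[r::stride, c::stride].tolist()

for ids, weights in pairs.items():
    print(ids, weights)
# (301, 501) [['A', 'C'], ['G', 'I']]
# (302, 502) [['B'], ['H']]
# (303, 503) [['D', 'F']]
# (304, 504) [['E']]
```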
In some embodiments of the present disclosure, the at least one first sampling result and the at least one second sampling result may be input, pair by pair, to the processing array in the manner shown in FIG. 6, so that the processing array outputs the processing result. This includes steps S601 to S603.
In step S601, for each first sampling result, the first sampling result is input to the processing array, and the second sampling result corresponding to it is also input to the processing array.
In step S602, the processing array is controlled to determine a corresponding sub-processing result from the first sampling result and its corresponding second sampling result.
In step S603, the processing array is controlled to output the processing result based on the sub-processing results corresponding to the respective first sampling results.
Steps S601 and S602 are repeated N times, where N is the number of first sampling results. In other words, steps S601 and S602 are executed for each first sampling result and its corresponding second sampling result. Specifically, the 1st first sampling result is input to the processing array, followed by the 1st second sampling result, and the processing array is controlled to derive the 1st sub-processing result from these inputs. Then the 2nd first sampling result is input, followed by the 2nd second sampling result, and the processing array is controlled to derive the 2nd sub-processing result. This continues until the last (Nth) sub-processing result is obtained.

For example, steps S601 and S602 may be performed for the four first sampling results shown in FIG. 3 and the four second sampling results shown in FIG. 5 as follows:

- First sampling result 301 is input to the processing array, followed by second sampling result 501, and the processing array is controlled to obtain the 1st sub-processing result from 301 and 501.
- First sampling result 302 is input, followed by second sampling result 502, and the 2nd sub-processing result is obtained from 302 and 502.
- First sampling result 303 is input, followed by second sampling result 503, and the 3rd sub-processing result is obtained from 303 and 503.
- Finally, first sampling result 304 is input, followed by second sampling result 504, and the 4th sub-processing result is obtained from 304 and 504.
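Steps S601 to S603 can be modelled in software as a loop over matched pairs. This is a numerical sketch, not a description of the hardware. Each pair is processed with a stride-1 "valid" cross-correlation to give a sub-processing result. The sub-processing results are then summed, which is this sketch's assumption for how step S603 combines them. The sum is checked against a direct stride-2 convolution, and all function names are hypothetical.

For the 17×17 data and 3×3 kernel of the examples, every sub-processing result computed by this model is 8×8.

```python
import numpy as np


def corr2d_valid(x, w):
    """Stride-1 'valid' 2-D cross-correlation of x with w (no kernel flip)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, w))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def strided_conv_direct(x, w, stride):
    """Reference: 'valid' 2-D cross-correlation with the given stride."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, w))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i * stride:i * stride + kh, j * stride:j * stride + kw] * w)
    return out


def strided_conv_via_samples(x, w, stride):
    """Loop over matched (first, second) sampling results, as in S601 to S603."""
    oh = (x.shape[0] - w.shape[0]) // stride + 1
    ow = (x.shape[1] - w.shape[1]) // stride + 1
    result = np.zeros((oh, ow), dtype=np.result_type(x, w))
    sub_shapes = []
    for r in range(stride):
        for c in range(stride):
            first = x[r::stride, c::stride]   # first sampling result (S601)
            second = w[r::stride, c::stride]  # matching second sampling result (S601)
            if second.size == 0:              # kernel smaller than stride at this offset
                continue
            sub = corr2d_valid(first, second)  # sub-processing result (S602)
            sub_shapes.append(sub.shape)
            result += sub[:oh, :ow]            # combine sub-results (S603, summation assumed)
    return result, sub_shapes


rng = np.random.default_rng(0)
x = rng.integers(-8, 8, size=(17, 17))
w = rng.integers(-3, 3, size=(3, 3))
out, shapes = strided_conv_via_samples(x, w, 2)
assert np.array_equal(out, strided_conv_direct(x, w, 2))
print(out.shape, shapes)  # (8, 8) [(8, 8), (8, 8), (8, 8), (8, 8)]
```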
Step S601 may be performed as follows. For each first sampling result, the values of the first sampling result are input into multiple units of the processing array so that the relative positions of the values among the units are the same as their relative positions in the first sampling result. A first sampling result comprises values in multiple rows and columns, and the processing array comprises units in multiple rows and columns, each of which stores one value. The values are therefore arranged in the processing array exactly as they are arranged in the first sampling result. Figuratively, the processing array is a layer of units in rows and columns, and the first sampling result is a layer of values in rows and columns. The two layers are parallel, with units and values in one-to-one correspondence, and when the first sampling result is input, its grid of values is mapped as a whole onto the grid of units.
Referring to FIG. 7, the processing array may include an effective array and at least one overflow row and at least one overflow column distributed around the effective array. The effective array includes multiple first units for storing and processing data (the circular processing engines (PEs) in FIG. 7). The overflow rows and overflow columns include multiple second units for storing data (the hexagonal PEs in FIG. 7).

The number of rows of first units is greater than or equal to the number of rows of a first sampling result, or one less than it. Likewise, the number of columns of first units is greater than or equal to the number of columns of a first sampling result, or one less than it. This relationship constrains how the data to be processed (described in detail below) and the first sampling results (described in detail above) are determined.

For example, the four first sampling results shown in FIG. 3 are input into the processing array shown in FIG. 7. That array includes 10×10 units: the central 8×8 units are first units, surrounded by two rows of second units above and below and two columns of second units on the left and right. The four first sampling results each have 8 or 9 rows and 8 or 9 columns, which satisfies the above relationship.
The connections between a first unit and its adjacent units are shown in FIG. 8. A first unit contains an internal register R0, an arithmetic and logic unit (ALU), and an associated data load/store circuit module M. Each first unit is also connected to a shift register file and a static random-access memory (SRAM), and the shift register file contains multiple shift registers such as R1, R2, R3, and R4. A second unit has the same structure as a first unit except that it has no ALU. Adjacent units are connected through their shift register files, and in the processing array each unit is connected to its adjacent units in every direction (up, down, left, and right).
Based on the above structure of the processing array, when a first sampling result is input, its values may be loaded into multiple units of the processing array. The value in the first row and first column of the first sampling result is placed in the first unit located in the first row and first column; that is, the first value goes into the first of the first units. The first sampling result is stored and moved in the processing array as a whole, in single instruction, multiple data (SIMD) mode. Therefore, once the first value is positioned, the whole first sampling result is positioned relative to the processing array.

The number of rows of the first sampling result may be less than or equal to the number of rows of first units, or one greater. At most one row of values is therefore stored in second units. That extra row does not need to be convolved with the second sampling result at its own position, so the second units holding it perform no computation. This preserves the convolution between the first sampling result and the second sampling result while avoiding wasted energy and reduced efficiency.

Likewise, the number of columns of the first sampling result may be less than or equal to the number of columns of first units, or one greater. At most one column of values is therefore stored in second units. That extra column does not need to be convolved at its own position either, with the same benefits.
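The Python sketch below illustrates the loading rule under the same assumed 10*10 layout, using 0-based indices with first units at indices 1 to 8. The first value goes to the top-left first unit, and every other value keeps its relative position. A 9-row or 9-column sample therefore spills exactly one row or column into the overflow ring. All names are hypothetical.

```python
# Loading sketch. ASSUMPTION: 8x8 first units plus one overflow ring, 0-based grid[row][col].
ACTIVE, OVERFLOW = 8, 1
SIZE = ACTIVE + 2 * OVERFLOW


def load(sample):
    rows, cols = len(sample), len(sample[0])
    # At most one extra row/column may spill into the overflow ring.
    if rows > ACTIVE + 1 or cols > ACTIVE + 1:
        raise ValueError("sample exceeds the active array by more than one row/column")
    grid = [[None] * SIZE for _ in range(SIZE)]
    for r in range(rows):
        for c in range(cols):
            # The first value lands on grid[1][1], the top-left first unit.
            grid[OVERFLOW + r][OVERFLOW + c] = sample[r][c]
    return grid


def in_first_unit(i, j):
    return OVERFLOW <= i < OVERFLOW + ACTIVE and OVERFLOW <= j < OVERFLOW + ACTIVE


# First sampling result 301, labelled with (row, col) positions in the assumed 17x17 data.
sample_301 = [[(2 * r + 1, 2 * c + 1) for c in range(9)] for r in range(9)]
grid = load(sample_301)
print(grid[1][1], grid[1][2], grid[2][1], grid[9][1], in_first_unit(9, 1))
# (1, 1) (1, 3) (3, 1) (17, 1) False
```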
Step S602 may be performed as shown in Figure 9, and includes steps S901 to S903.
In step S901, for each weight value in the corresponding second sampling result, the processing array is controlled to determine a partial sum. The partial sum is formed from that weight value and the value in the first sampling result that corresponds to it.
In step S902, the processing array is controlled to determine a partial result from the partial sums corresponding to the respective weight values in the corresponding second sampling result.
In step S903, the processing array is controlled to determine the sub-processing result corresponding to the first sampling result, according to at least one partial result.
Step S901 is repeated M times, where M is the number of weight values in the corresponding second sampling result. In other words, step S901 is performed once for each weight value of the second sampling result. This yields, in turn, the partial sum for each weight value (the 1st to the M-th partial sums). A partial sum is obtained by multiplying the weight value by the corresponding value.
Step S901 may be performed as follows.

For the first weight value in the corresponding second sampling result, the processing array is controlled to determine a partial sum. It uses the first weight value and the value of the first sampling result held in each unit at the initial position.

For each subsequent weight value, the movement of the first sampling result is determined from a positional relationship within the first sampling result. That relationship is between two values:

- the first value, which corresponds to the current weight value;
- the second value, which corresponds to the previous weight value.

The processing array is then controlled to move the first sampling result in the determined manner, so that the unit that held the second value now holds the first value. The array then determines a partial sum from the value now held in that unit and the current weight value.

For example, if the j-th weight value is to the right of the (j-1)-th weight value, the first sampling result is shifted one unit to the left relative to the processing array. The first sampling result is moved as a whole (SIMD mode). During a move, each unit therefore sends its stored value to the adjacent unit in the direction of movement. For example, if the first sampling result is shifted one unit to the left relative to the processing array, every unit sends its stored value to the unit adjacent to its left.
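The shift rule of step S901 can be sketched as follows. Weight positions are taken within the second sampling result; for 501, A and C are neighbours in the same row. This matches how the corresponding values sit within the first sampling result. The SIMD move is modelled by computing every cell's new value from a single neighbour at once. The helper names are illustrative and do not come from the source.

```python
# Sketch. ASSUMPTION: positions are (row, col) inside the second sampling result.
# STEP maps a direction to the source offset: after the move, cell (i, j) holds
# the value that was in cell (i + di, j + dj).
STEP = {"left": (0, 1), "right": (0, -1), "up": (1, 0), "down": (-1, 0)}


def shift_direction(prev_pos, cur_pos):
    """Direction the data moves so that the value for the current weight lands in
    the unit that held the value for the previous weight."""
    delta = (cur_pos[0] - prev_pos[0], cur_pos[1] - prev_pos[1])
    for direction, offset in STEP.items():
        if offset == delta:
            return direction
    raise ValueError("consecutive weights must be horizontally or vertically adjacent")


def shift(grid, direction):
    """Move every value one cell at once (SIMD); cells with no sender become None."""
    di, dj = STEP[direction]
    rows, cols = len(grid), len(grid[0])
    return [[grid[i + di][j + dj] if 0 <= i + di < rows and 0 <= j + dj < cols else None
             for j in range(cols)] for i in range(rows)]


# Second sampling result 501 laid out as A C / G I, visited in the order A, C, I, G.
pos_501 = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "I": (1, 1)}
order = ["A", "C", "I", "G"]
print([shift_direction(pos_501[p], pos_501[c]) for p, c in zip(order, order[1:])])
# ['left', 'up', 'right']
print(shift([[1, 2, 3], [4, 5, 6]], "left"))  # [[2, 3, None], [5, 6, None]]
```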
In step S902, the partial result is obtained by summing the partial sums. A unit stores the first partial sum it obtains. Each time it obtains a further partial sum, it adds it to the stored partial sum and stores the result as the new partial sum. The final stored value is the partial result.
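Seen from a single first unit, steps S901 and S902 reduce to a multiply followed by a running sum. The Python sketch below uses a hypothetical class name and made-up numbers.

```python
# Per-unit sketch of S901 (multiply) and S902 (running sum). Names are hypothetical.
class FirstUnitAccumulator:
    def __init__(self):
        self.stored = None                      # nothing stored before the first pass

    def step(self, weight, held_value):
        partial_sum = weight * held_value       # S901
        if self.stored is None:
            self.stored = partial_sum           # first partial sum: just store it
        else:
            self.stored += partial_sum          # later ones: add, then store the sum
        return self.stored


# Made-up numbers for one unit, with weights A, C, I, G applied in that order (M = 4).
unit = FirstUnitAccumulator()
for w, v in [(2, 5), (-1, 4), (3, 0), (1, 7)]:
    unit.step(w, v)
print(unit.stored)  # 13
```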
In step S903, the partial results obtained by the units that store and process data (the first units) may be arranged according to the positions of those units. This gives a sub-processing result with multiple rows and columns.
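The following Python sketch puts steps S901 to S903 together and simulates one sub-processing result on the assumed 10*10 array. It loads the sample, then for each weight optionally shifts the whole array, multiplies in every first unit, and accumulates. The schedule format (a weight paired with the shift applied before it is used) and all names are assumptions made for illustration. Only first units compute, mirroring the absence of an ALU in second units.

```python
# Sketch of S901-S903. ASSUMPTION: 10x10 cells, first units at indices 1..8, one overflow ring.
ACTIVE, SIZE = 8, 10


def load(sample):
    grid = [[0] * SIZE for _ in range(SIZE)]
    for r, row in enumerate(sample):
        for c, v in enumerate(row):
            grid[1 + r][1 + c] = v          # first value -> top-left first unit
    return grid


def shift(grid, direction):
    # Source offset: after the move, cell (i, j) holds the old value of (i + di, j + dj).
    di, dj = {"left": (0, 1), "right": (0, -1), "up": (1, 0), "down": (-1, 0)}[direction]
    return [[grid[i + di][j + dj] if 0 <= i + di < SIZE and 0 <= j + dj < SIZE else 0
             for j in range(SIZE)] for i in range(SIZE)]


def sub_processing_result(sample, schedule):
    grid = load(sample)
    stored = [[0] * ACTIVE for _ in range(ACTIVE)]   # running partial sums (S902)
    for weight, direction in schedule:
        if direction is not None:
            grid = shift(grid, direction)            # SIMD move of the whole sample
        for a in range(ACTIVE):                      # only first units have an ALU
            for b in range(ACTIVE):
                stored[a][b] += weight * grid[1 + a][1 + b]   # partial sum (S901)
    return stored   # S903: entries are already arranged by unit position


# Made-up 17x17 data; first sampling result 301 with weights A, C, I, G.
x = [[i * 17 + j for j in range(17)] for i in range(17)]
s301 = [row[0::2] for row in x[0::2]]
A, C, I, G = 1, 2, 3, 4
sub = sub_processing_result(s301, [(A, None), (C, "left"), (I, "up"), (G, "right")])
# Unit (1,1) should hold A*(1,1) + C*(1,3) + I*(3,3) + G*(3,1) in 1-based labels.
print(sub[0][0] == A * x[0][0] + C * x[0][2] + I * x[2][2] + G * x[2][0])  # True
```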
The process of obtaining the sub-processing results is described in further detail below. The example uses the four first sampling results shown in Figure 3 and the four second sampling results shown in Figure 5. Here, (i, j) denotes the value in row i, column j of the data shown in Figure 2.
First sampling result 301 and second sampling result 501: weight value A is first taken as the first weight value, and the first sampling result keeps the initial position it had when input into the processing array. That is, (1,1) is stored in the first unit in row 1, column 1; (1,3) in the first unit in row 1, column 2; (3,1) in the first unit in row 2, column 1; and (3,3) in the first unit in row 2, column 2. The values in row 9 (the last row) are stored in the row of second units directly below the 8*8 array of first units. The values in column 9 (the last column) are stored in the column of second units directly to its right.

Each first unit then multiplies its stored value by weight value A to obtain a partial sum and stores it. The first unit in row 1, column 1 obtains A*(1,1); the one in row 1, column 2 obtains A*(1,3); the one in row 2, column 1 obtains A*(3,1); and the one in row 2, column 2 obtains A*(3,3). The other first units proceed in the same way.

Note that the second units perform no operations and therefore produce no partial sums. The reason is that convolving the data shown in Figure 2 with the convolution kernel shown in Figure 4 at a stride of 2 yields an 8*8 data array. During that convolution, the last row of first sampling result 301 is multiplied only by weight values G and I of second sampling result 501, never by A and C. Likewise, the last column of 301 is multiplied only by weight values C and I, never by A and G.

Next comes weight value C. Because C is to the right of the first weight value A, the first sampling result is shifted one unit to the left relative to the whole processing array, as shown in Figure 10. Shift register R1 in each unit's shift register file sends its stored data to shift register R1 in the shift register file of the unit on its left. After the shift:

- (1,3) is stored in the first unit in row 1, column 1;
- (1,5) is stored in row 1, column 2;
- (3,3) is stored in row 2, column 1;
- (3,5) is stored in row 2, column 2;
- the values of the first column are stored in the column of second units to the left of the 8*8 array.

Each first unit then multiplies its stored value by C and adds the product to its stored partial sum. The unit in row 1, column 1 obtains C*(1,3) and stores the updated partial sum A*(1,1)+C*(1,3). The unit in row 1, column 2 stores A*(1,3)+C*(1,5). The unit in row 2, column 1 stores A*(3,1)+C*(3,3). The unit in row 2, column 2 stores A*(3,3)+C*(3,5). The other first units proceed in the same way. Again the second units perform no operations, because in the stride-2 convolution the first column of 301 is multiplied only by weight values A and G of 501, never by C and I.

Next comes weight value I. Because I is below C, the first sampling result is shifted one unit up relative to the whole processing array, as shown in Figure 11. Shift register R1 in each unit sends its data to shift register R1 of the unit above it. After the shift:

- (3,3) is stored in the first unit in row 1, column 1;
- (3,5) is stored in row 1, column 2;
- (5,3) is stored in row 2, column 1;
- (5,5) is stored in row 2, column 2;
- the values of the first row are stored in the row of second units above the 8*8 array.

Each first unit multiplies its stored value by I and accumulates the product. The unit in row 1, column 1 stores A*(1,1)+C*(1,3)+I*(3,3). The unit in row 1, column 2 stores A*(1,3)+C*(1,5)+I*(3,5). The unit in row 2, column 1 stores A*(3,1)+C*(3,3)+I*(5,3). The unit in row 2, column 2 stores A*(3,3)+C*(3,5)+I*(5,5). Again the second units perform no operations, because in the stride-2 convolution the first row of 301 is multiplied only by weight values A and C of 501, never by G and I.

Finally comes weight value G. Because G is to the left of I, the first sampling result is shifted one unit to the right relative to the whole processing array, as shown in Figure 12. Shift register R1 in each unit sends its data to shift register R1 of the unit on its right. After the shift:

- (3,1) is stored in the first unit in row 1, column 1;
- (3,3) is stored in row 1, column 2;
- (5,1) is stored in row 2, column 1;
- (5,3) is stored in row 2, column 2;
- the values of the last column are stored in the column of second units to the right of the 8*8 array.

Each first unit multiplies its stored value by G and accumulates the product, giving its partial result:

- A*(1,1)+C*(1,3)+I*(3,3)+G*(3,1) for row 1, column 1;
- A*(1,3)+C*(1,5)+I*(3,5)+G*(3,3) for row 1, column 2;
- A*(3,1)+C*(3,3)+I*(5,3)+G*(5,1) for row 2, column 1;
- A*(3,3)+C*(3,5)+I*(5,5)+G*(5,3) for row 2, column 2.

Again the second units perform no operations, because in the stride-2 convolution the last column of 301 is multiplied only by weight values C and I of 501, never by A and G. Finally, the partial results of all first units are arranged according to the positions of the units. This gives the sub-processing result corresponding to first sampling result 301.
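The expressions in this walk-through can be reproduced symbolically with the Python sketch below. It is a restatement of our own, not the patent's formulation. The three shifts (left, up, right) are expressed as cumulative offsets into 301. Element (r, c) of 301, in 1-based indexing, is mapped back to the Figure 2 label (2r-1, 2c-1).

```python
# Symbolic sketch. After 0, 1, 2 and 3 shifts, the first unit at 1-based (a, b)
# holds the element of 301 at offset (0,0), (0,1), (1,1) and (1,0) from (a, b).
schedule_501 = [("A", (0, 0)), ("C", (0, 1)), ("I", (1, 1)), ("G", (1, 0))]


def value_label_301(r, c):
    """1-based position in 301 -> (row, col) label of the value in Figure 2."""
    return (2 * r - 1, 2 * c - 1)


def partial_result_expr(a, b):
    terms = []
    for weight, (dr, dc) in schedule_501:
        i, j = value_label_301(a + dr, b + dc)
        terms.append(f"{weight}*({i},{j})")
    return "+".join(terms)


for a, b in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print((a, b), partial_result_expr(a, b))
# (1, 1) A*(1,1)+C*(1,3)+I*(3,3)+G*(3,1)
# (1, 2) A*(1,3)+C*(1,5)+I*(3,5)+G*(3,3)
# (2, 1) A*(3,1)+C*(3,3)+I*(5,3)+G*(5,1)
# (2, 2) A*(3,3)+C*(3,5)+I*(5,5)+G*(5,3)
```

The corner unit (8, 8) reaches (17,17). That value sits in the overflow corner of the assumed 9*9 sample, which is why one overflow row and one overflow column suffice.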
First sampling result 302 and second sampling result 502: weight value B is first taken as the first weight value, and the first sampling result keeps its initial position. That is, (1,2) is stored in the first unit in row 1, column 1; (1,4) in row 1, column 2; (3,2) in row 2, column 1; and (3,4) in row 2, column 2. The values in row 9 (the last row) are stored in the row of second units below the 8*8 array of first units.

Each first unit multiplies its stored value by B and stores the partial sum. The four units above obtain B*(1,2), B*(1,4), B*(3,2) and B*(3,4), and the other first units proceed in the same way. The second units perform no operations. The reason is that the stride-2 convolution of the data in Figure 2 with the kernel in Figure 4 yields an 8*8 data array, in which the last row of 302 is multiplied only by weight value H of 502, never by B.

Next comes weight value H. Because H is below B, the first sampling result is shifted one unit up relative to the whole processing array. Shift register R1 in each unit sends its data to shift register R1 of the unit above it. After the shift:

- (3,2) is stored in the first unit in row 1, column 1;
- (3,4) is stored in row 1, column 2;
- (5,2) is stored in row 2, column 1;
- (5,4) is stored in row 2, column 2;
- the values of the first row are stored in the row of second units above the 8*8 array.

Each first unit multiplies its stored value by H and adds the product to its stored partial sum. The four units above obtain the partial results B*(1,2)+H*(3,2), B*(1,4)+H*(3,4), B*(3,2)+H*(5,2) and B*(3,4)+H*(5,4). Again the second units perform no operations, because the first row of 302 is multiplied only by weight value B of 502, never by H. Finally, the partial results of all first units are arranged according to the positions of the units. This gives the sub-processing result corresponding to first sampling result 302.
First sampling result 303 and second sampling result 503: weight value D is first taken as the first weight value, and the first sampling result keeps its initial position. That is, (2,1) is stored in the first unit in row 1, column 1; (2,3) in row 1, column 2; (4,1) in row 2, column 1; and (4,3) in row 2, column 2. The values in column 9 (the last column) are stored in the column of second units to the right of the 8*8 array of first units.

Each first unit multiplies its stored value by D and stores the partial sum. The four units above obtain D*(2,1), D*(2,3), D*(4,1) and D*(4,3), and the other first units proceed in the same way. The second units perform no operations. The reason is that the stride-2 convolution yields an 8*8 data array, in which the last column of 303 is multiplied only by weight value F of 503, never by D.

Next comes weight value F. Because F is to the right of D, the first sampling result is shifted one unit to the left relative to the whole processing array. Shift register R1 in each unit sends its data to shift register R1 of the unit on its left. After the shift:

- (2,3) is stored in the first unit in row 1, column 1;
- (2,5) is stored in row 1, column 2;
- (4,3) is stored in row 2, column 1;
- (4,5) is stored in row 2, column 2;
- the values of the first column are stored in the column of second units to the left of the 8*8 array.

Each first unit multiplies its stored value by F and adds the product to its stored partial sum. The four units above obtain the partial results D*(2,1)+F*(2,3), D*(2,3)+F*(2,5), D*(4,1)+F*(4,3) and D*(4,3)+F*(4,5). Again the second units perform no operations, because the first column of 303 is multiplied only by weight value D of 503, never by F. Finally, the partial results of all first units are arranged according to the positions of the units. This gives the sub-processing result corresponding to first sampling result 303.
First sampling result 304 and second sampling result 504: weight value E is taken as the first weight value, and the first sampling result keeps its initial position. That is, (2,2) is stored in the first unit in row 1, column 1; (2,4) in row 1, column 2; (4,2) in row 2, column 1; and (4,4) in row 2, column 2.

Each first unit multiplies its stored value by E, and the product is stored directly as its partial result. The four units above obtain E*(2,2), E*(2,4), E*(4,2) and E*(4,4), and the other first units proceed in the same way. Finally, the partial results of all first units are arranged according to the positions of the units. This gives the sub-processing result corresponding to first sampling result 304.
Step S603 may be performed as follows: the multiple sub-processing results are summed to obtain the processing result. Each sub-processing result is a multi-row, multi-column array of partial results, and all sub-processing results have the same numbers of rows and columns (they all equal the numbers of rows and columns of first units). The partial results at corresponding positions are therefore added, and the sum at each position is taken as the processing result; in other words, the partial sums obtained by each first unit are summed to give that unit's value, and the values of all units form the processing result. For example, in the examples of Figures 3 and 5 above, each first unit obtains four partial results, which are added to give the value at that unit's position in the processing result. For the first unit in the first row and first column this is A*(1,1)+C*(1,3)+I*(3,3)+G*(3,1)+B*(1,2)+H*(3,2)+D*(2,1)+F*(2,3)+E*(2,2); for the first unit in the first row and second column it is A*(1,3)+C*(1,5)+I*(3,5)+G*(3,3)+B*(1,4)+H*(3,4)+D*(2,3)+F*(2,5)+E*(2,4); for the first unit in the second row and first column it is A*(3,1)+C*(3,3)+I*(5,3)+G*(5,1)+B*(3,2)+H*(5,2)+D*(4,1)+F*(4,3)+E*(4,2); and for the first unit in the second row and second column it is A*(3,3)+C*(3,5)+I*(5,5)+G*(5,3)+B*(3,4)+H*(5,4)+D*(4,3)+F*(4,5)+E*(4,4).
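As a check on the summed expressions above, the following sketch (hypothetical function names, toy values) splits the data and kernel into all S*S matching sampling-result pairs, accumulates each pair's sub-processing result, and adds them. Assuming square inputs and no padding, the total equals a direct stride-S convolution; the approach resembles what is often called a polyphase decomposition of a strided convolution.

```python
def strided_conv(x, w, s):
    """Direct stride-s convolution (cross-correlation, no padding) of square x by square w."""
    k = len(w)
    n = (len(x) - k) // s + 1
    return [[sum(w[u][v] * x[i * s + u][j * s + v] for u in range(k) for v in range(k))
             for j in range(n)] for i in range(n)]


def phase(m, s, r, c):
    """Rows r, r+s, ... intersected with columns c, c+s, ... of matrix m."""
    return [row[c::s] for row in m[r::s]]


def summed_sub_results(x, w, s):
    """Add the sub-processing results of all s*s sampling-result pairs."""
    k = len(w)
    n = (len(x) - k) // s + 1
    out = [[0] * n for _ in range(n)]
    for r in range(s):
        for c in range(s):
            xp, wp = phase(x, s, r, c), phase(w, s, r, c)  # one matching pair
            for p, w_row in enumerate(wp):
                for q, wt in enumerate(w_row):
                    for i in range(n):
                        for j in range(n):
                            out[i][j] += wt * xp[i + p][j + q]
    return out


x = [[10 * (i + 1) + (j + 1) for j in range(5)] for i in range(5)]  # 5 x 5 toy data
w = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]                               # stands in for A..I
assert summed_sub_results(x, w, 2) == strided_conv(x, w, 2)
```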
In some embodiments of the present disclosure, the data to be processed is obtained from the data of an image. So that the first sampling results obtained by sampling the data to be processed match the processing array, the data to be processed may be obtained as follows: first, determine the numbers of rows and columns of the data to be processed according to the processing array, the convolution kernel, and the step size; next, determine the numbers of overlapping rows and overlapping columns according to the convolution kernel and the step size; finally, sample the data of the image to be processed according to the numbers of rows and columns of the data to be processed and the numbers of overlapping rows and columns, to obtain multiple pieces of data to be processed.
The number of overlapping rows may be the difference between the number of rows of the convolution kernel and the step size, and the number of overlapping columns may be the difference between the number of columns of the convolution kernel and the step size. The numbers of overlapping rows and columns may also be equal, in which case both may be determined as P:
P = K - S,
where K is the number of rows of the convolution kernel (equal to its number of columns) and S is the step size of the convolution operation. In addition, K is greater than or equal to S.
The number of rows of the data to be processed may be determined as the product of the step size and the number of rows of first units, plus the number of overlapping rows. Likewise, the number of columns of the data to be processed may be determined as the product of the step size and the number of columns of first units, plus the number of overlapping columns. The numbers of rows and columns of the data to be processed may be equal, in which case both may be determined as L:
L = S*a + P,
where S is the step size of the convolution operation and a is the number of rows of first units (equal to the number of columns).
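The two formulas can be checked numerically. Since L = S*a + P = S*(a - 1) + K, a K x K kernel moved with step S over an L x L input produces exactly (L - K)/S + 1 = a outputs per row and per column, which fills the a x a first units. The sketch assumes square kernels and arrays as stated above; `tile_geometry` and the example values K = 3, S = 2, a = 8 are illustrative choices.

```python
def tile_geometry(k, s, a):
    """Return (P, L): overlap P = K - S and side length L = S*a + P."""
    if k < s:
        raise ValueError("the text assumes K >= S")
    p = k - s
    l = s * a + p
    return p, l


p, l = tile_geometry(k=3, s=2, a=8)
print(p, l)              # 1 17
print((l - 3) // 2 + 1)  # 8: one output per first unit along each dimension
```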
As shown in Figure 13, to sample the data of the image to be processed, an L*L sampling frame is placed at the upper-left corner of the image data, and the data inside the frame is taken as the first piece of data to be processed. The frame is then moved right by L-P and the data inside it is taken as the second piece; the frame keeps moving right by L-P and sampling until it can no longer move right by L-P. The frame is then moved down by L-P from its upper-left starting position and the sampling process of the first row is repeated. After the second row has been sampled, the frame continues to move down by L-P until it can no longer do so, and after each downward move the new row is sampled in the same way as the first row.
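The raster movement of the sampling frame can be sketched as follows, assuming the image is a row-major 2-D list. The function names are hypothetical, and the text does not say how a leftover border narrower than one frame step is handled, so this sketch simply stops when the frame no longer fits.

```python
def tile_origins(height, width, l, p):
    """Top-left corners of the L x L sampling frame, stepping L - P right, then down."""
    step = l - p
    origins = []
    for top in range(0, height - l + 1, step):      # move down while the frame fits
        for left in range(0, width - l + 1, step):  # move right while the frame fits
            origins.append((top, left))
    return origins


def cut_tiles(image, l, p):
    """Extract each L x L piece of data to be processed, row of frames by row of frames."""
    return [[row[left:left + l] for row in image[top:top + l]]
            for top, left in tile_origins(len(image), len(image[0]), l, p)]


print(tile_origins(33, 33, l=17, p=1))  # [(0, 0), (0, 16), (16, 0), (16, 16)]
```

With K = 3, S = 2 and a = 8 (so L = 17, P = 1), a 33 x 33 image yields four overlapping pieces whose 8 x 8 outputs tile the full 16 x 16 stride-2 output, because adjacent frames are offset by exactly S*a = 16 input positions.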
According to a second aspect of the embodiments of the present disclosure, a data processing apparatus is provided, the apparatus comprising: a controller, configured to sample data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1; sample a convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence; and input the at least one first sampling result and the at least one second sampling result correspondingly to a processing array; and the processing array, configured to process the at least one first sampling result and the at least one second sampling result and output a processing result.
In some embodiments of the present disclosure, the controller is configured to: perform row sampling on the data to be processed according to the step size to obtain at least one first row sampling result, wherein the union of the at least one first row sampling result is the data to be processed; perform column sampling on the data to be processed according to the step size to obtain at least one first column sampling result, wherein the union of the at least one first column sampling result is the data to be processed; and determine the intersection of each first row sampling result with each first column sampling result as a first sampling result.
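Row sampling, column sampling and their intersections can be sketched with sets of (row, column) index pairs. This is an illustrative sketch with a hypothetical function name; keys (r, c) are 0-based offsets, and the toy values encode each datum's 1-based position, so key (1, 1) yields the positions listed for first sampling result 304 in the worked example.

```python
def first_sampling_results(x, s):
    """Split x into s*s first sampling results: row sampling, column sampling, intersection."""
    n_rows, n_cols = len(x), len(x[0])
    # Row sampling result r: all cells in rows r, r+s, ...; their union is all of x.
    row_sets = [{(i, j) for i in range(r, n_rows, s) for j in range(n_cols)} for r in range(s)]
    # Column sampling result c: all cells in columns c, c+s, ...; their union is all of x.
    col_sets = [{(i, j) for i in range(n_rows) for j in range(c, n_cols, s)} for c in range(s)]
    results = {}
    for r, rows in enumerate(row_sets):
        for c, cols in enumerate(col_sets):
            cells = rows & cols  # intersection -> one first sampling result
            row_ids = sorted({i for i, _ in cells})
            col_ids = sorted({j for _, j in cells})
            # keep the values in their original relative positions
            results[(r, c)] = [[x[i][j] for j in col_ids] for i in row_ids]
    return results


x = [[10 * (i + 1) + (j + 1) for j in range(5)] for i in range(5)]  # 23 = datum (2, 3)
res = first_sampling_results(x, 2)
print(res[(1, 1)])  # [[22, 24], [42, 44]]
print(res[(1, 0)])  # [[21, 23, 25], [41, 43, 45]]
```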
In some embodiments of the present disclosure, the controller is configured to: perform row sampling on the convolution kernel according to the step size to obtain at least one second row sampling result, wherein the union of the at least one second row sampling result is the convolution kernel; perform column sampling on the convolution kernel according to the step size to obtain at least one second column sampling result, wherein the union of the at least one second column sampling result is the convolution kernel; and determine the intersection of each second row sampling result with each second column sampling result as a second sampling result.
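The kernel is split by the same step size. Written with Python slices (equivalent to intersecting one row sample with one column sample), a 3 x 3 kernel labeled A to I splits into the groups A, C, G, I; B, H; D, F; and E, which are the groups that appear in the worked example's summed expressions. Indexing data and kernel samples by the same (r, c) key is one way, assumed here, to realize the one-to-one correspondence; `split_by_step` is a hypothetical name.

```python
def split_by_step(m, s):
    """Key (r, c) -> rows r, r+s, ... intersected with columns c, c+s, ... of m."""
    return {(r, c): [row[c::s] for row in m[r::s]] for r in range(s) for c in range(s)}


kernel = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "I"]]
second = split_by_step(kernel, 2)
print(second[(0, 0)])  # [['A', 'C'], ['G', 'I']]
print(second[(1, 0)])  # [['D', 'F']]

# The same key on the data side selects the matching first sampling result.
data = [[10 * (i + 1) + (j + 1) for j in range(5)] for i in range(5)]
first = split_by_step(data, 2)
pairs = {key: (first[key], second[key]) for key in second}
```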
In some embodiments of the present disclosure, the controller is configured to, for each first sampling result, input the first sampling result to the processing array and input the second sampling result corresponding to that first sampling result to the processing array; and the processing array is configured to determine a corresponding sub-processing result according to the first sampling result and the corresponding second sampling result, and to output the processing result according to the sub-processing results corresponding to the respective first sampling results.
In some embodiments of the present disclosure, the controller is configured to, for each first sampling result, input the multiple values of the first sampling result into multiple units of the processing array, such that the relative positions of the values in those units are the same as their relative positions in the first sampling result.
In some embodiments of the present disclosure, the processing array includes an effective array and at least one overflow row and at least one overflow column distributed around the effective array, wherein the effective array includes multiple first units for storing and processing data, and the overflow rows and overflow columns include multiple second units for storing data. The controller is configured to input the multiple values of the first sampling result into multiple units of the processing array such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being the first unit located in the first row and first column among the multiple first units.
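A minimal model of loading one first sampling result into the processing array is shown below. The text says only that overflow rows and columns are distributed around the effective array; the ring of width `pad` on all four sides, and the names `load` and `is_first_unit`, are assumptions for illustration. The toy block reuses the positions of first sampling result 303 from the worked example; with a toy 2 x 2 effective array, its third column lands in an overflow column.

```python
def load(block, a, pad=1):
    """Place a first sampling result into an (a + 2*pad)-square processing array.

    Inner a x a region: effective array (first units). Surrounding ring of
    width `pad`: overflow rows/columns (second units). block[0][0] goes to the
    target unit, i.e. the first unit in the first row and first column.
    """
    size = a + 2 * pad
    if len(block) > a + pad or any(len(row) > a + pad for row in block):
        raise ValueError("block does not fit in the effective array plus overflow")
    grid = [[None] * size for _ in range(size)]
    for i, row in enumerate(block):
        for j, v in enumerate(row):
            grid[pad + i][pad + j] = v  # relative positions are preserved
    return grid


def is_first_unit(i, j, a, pad=1):
    """True if grid cell (i, j) belongs to the effective array."""
    return pad <= i < pad + a and pad <= j < pad + a


grid = load([[21, 23, 25], [41, 43, 45]], a=2)
print(grid[1][1], grid[1][3])    # 21 25
print(is_first_unit(1, 3, a=2))  # False: 25 sits in an overflow column
```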
In some embodiments of the present disclosure, the processing array is configured to: for each weight value in the corresponding second sampling result, determine a partial sum from that weight value and the value in the first sampling result corresponding to it; determine a partial result from the partial sums corresponding to the respective weight values in the corresponding second sampling result; and determine the sub-processing result corresponding to the first sampling result from at least one partial result.
In some embodiments of the present disclosure, the processing array is configured to, for the first weight value in the corresponding second sampling result, determine the partial sum from that first weight value and the values of the first sampling result held in the units corresponding to its initial position in the processing array.
In some embodiments of the present disclosure, the controller is configured to, for each non-first weight value in the corresponding second sampling result, determine a movement mode of the first sampling result according to the positional relationship, within the first sampling result, between a first value corresponding to that non-first weight value and a second value corresponding to the preceding weight value; and the processing array is configured to move the second value to the corresponding unit using the determined movement mode, and to determine the partial sum from the value in the corresponding unit after the move and the non-first weight value.
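One reading of this movement rule, sketched under stated assumptions: data moves opposite to the step between consecutive weight positions, so after each move every first unit holds the datum the next weight needs. The cell map below is unbounded, so it ignores the finite overflow capacity, and `sub_result_by_moving` is a hypothetical name. The visiting order A, C, I, G seen in the summed expressions is a snake order in which every move is a single one-row or one-column step.

```python
def sub_result_by_moving(x_phase, order, a):
    """Accumulate one a x a sub-processing result by moving data between weights.

    order: (p, q, weight) triples in visiting sequence; (p, q) is the weight's
    position inside the second sampling result. The first triple must be at
    (0, 0) so that it uses the initial position unchanged. Also returns the
    (rows, cols) move made before each non-first weight (negative = up/left).
    """
    if (order[0][0], order[0][1]) != (0, 0):
        raise ValueError("the first weight must use the initial position (0, 0)")
    # Unbounded cell map: values never fall off the edge (overflow capacity ignored).
    cells = {(i, j): v for i, row in enumerate(x_phase) for j, v in enumerate(row)}
    acc = [[0] * a for _ in range(a)]
    moves = []
    prev = (0, 0)
    for p, q, w in order:
        dr, dc = prev[0] - p, prev[1] - q  # data moves opposite to the weight step
        if (dr, dc) != (0, 0):
            cells = {(i + dr, j + dc): v for (i, j), v in cells.items()}
            moves.append((dr, dc))
        for i in range(a):
            for j in range(a):
                acc[i][j] += w * cells[(i, j)]  # partial sum in each first unit
        prev = (p, q)
    return acc, moves


x_303 = [[21, 23, 25], [41, 43, 45]]  # toy values for positions (2,1) ... (4,5)
acc, moves = sub_result_by_moving(x_303, [(0, 0, 1), (0, 1, 2)], a=2)  # D = 1, then F = 2
print(acc, moves)  # [[67, 73], [127, 133]] [(0, -1)]
```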
In some embodiments of the present disclosure, the controller is further configured to: determine the numbers of rows and columns of the data to be processed according to the processing array, the convolution kernel, and the step size; determine the numbers of overlapping rows and overlapping columns according to the convolution kernel and the step size; and sample the data of the image to be processed according to the numbers of rows and columns of the data to be processed and the numbers of overlapping rows and columns, to obtain multiple pieces of data to be processed.
In some embodiments of the present disclosure, the data to be processed is single-channel data or one channel of multi-channel data, and the convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
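For multi-channel input, a standard convolution adds the per-channel results at each output position. The text above only states that each channel is treated as single-channel data, so the cross-channel sum below follows that standard definition and is an assumption here; `strided_conv` is a direct reference computation standing in for the sampling-based pipeline, and all names are hypothetical.

```python
def strided_conv(x, w, s):
    """Direct stride-s convolution of one square channel (no padding)."""
    k = len(w)
    n = (len(x) - k) // s + 1
    return [[sum(w[u][v] * x[i * s + u][j * s + v] for u in range(k) for v in range(k))
             for j in range(n)] for i in range(n)]


def multi_channel_conv(xs, ws, s):
    """Treat each channel as single-channel data, then add the per-channel results."""
    if len(xs) != len(ws) or not xs:
        raise ValueError("need one kernel channel per data channel")
    per_channel = [strided_conv(x, w, s) for x, w in zip(xs, ws)]
    n = len(per_channel[0])
    return [[sum(ch[i][j] for ch in per_channel) for j in range(n)] for i in range(n)]


ones = [[1] * 3 for _ in range(3)]
twos = [[2] * 3 for _ in range(3)]
print(multi_channel_conv([ones, twos], [ones, ones], 2))  # [[27]]
```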
For the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments of the first aspect and is not elaborated here.
The data processing apparatus provided by the embodiments of the present disclosure may include a chip, an AI chip, or the like.
In a third aspect, at least one embodiment of the present disclosure provides an electronic device, whose structure is shown in Figure 14. The device includes a memory, a processor, and the data processing apparatus provided by the embodiments of the present disclosure. The memory is configured to store computer instructions executable on the processor, and the processor is configured to process data based on the method of the first aspect when executing the computer instructions.
In a fourth aspect, at least one embodiment of the present disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first aspect.
In the present disclosure, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance. The term "plurality" refers to two or more, unless expressly limited otherwise.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon considering the specification and practicing the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (24)

  1. A data processing method, characterized in that the method comprises:
    sampling data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1;
    sampling a convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence; and
    inputting the at least one first sampling result and the at least one second sampling result correspondingly to a processing array, so that the processing array outputs a processing result.
  2. The data processing method according to claim 1, wherein sampling the data to be processed according to the step size of the convolution operation to obtain at least one first sampling result comprises:
    performing row sampling on the data to be processed according to the step size to obtain at least one first row sampling result, wherein the union of the at least one first row sampling result is the data to be processed;
    performing column sampling on the data to be processed according to the step size to obtain at least one first column sampling result, wherein the union of the at least one first column sampling result is the data to be processed; and
    determining the intersection of each first row sampling result with each first column sampling result as a first sampling result.
  3. The data processing method according to claim 1, wherein sampling the convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result comprises:
    performing row sampling on the convolution kernel according to the step size to obtain at least one second row sampling result, wherein the union of the at least one second row sampling result is the convolution kernel;
    performing column sampling on the convolution kernel according to the step size to obtain at least one second column sampling result, wherein the union of the at least one second column sampling result is the convolution kernel; and
    determining the intersection of each second row sampling result with each second column sampling result as a second sampling result.
  4. The data processing method according to claim 1, wherein inputting the at least one first sampling result and the at least one second sampling result correspondingly to the processing array so that the processing array outputs the processing result comprises:
    for each first sampling result, inputting the first sampling result to the processing array, and inputting the second sampling result corresponding to the first sampling result to the processing array;
    controlling the processing array to determine a corresponding sub-processing result according to the first sampling result and the corresponding second sampling result; and
    controlling the processing array to output the processing result according to the sub-processing results corresponding to the respective first sampling results.
  5. The data processing method according to claim 4, wherein, for each first sampling result, inputting the first sampling result to the processing array comprises:
    for each first sampling result, inputting multiple values of the first sampling result into multiple units of the processing array, such that the relative positions of the multiple values in the multiple units are the same as the relative positions of the multiple values in the first sampling result.
  6. The data processing method according to claim 5, wherein the processing array comprises an effective array and at least one overflow row and at least one overflow column distributed around the effective array, wherein the effective array comprises multiple first units for storing and processing data, and the overflow row and the overflow column comprise multiple second units for storing data; and inputting the multiple values of the first sampling result into the multiple units of the processing array comprises:
    inputting the multiple values of the first sampling result into the multiple units of the processing array, such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being located in the first row and first column among the multiple first units.
  7. The data processing method according to any one of claims 4 to 6, wherein controlling the processing array to determine the corresponding sub-processing result according to the first sampling result and the corresponding second sampling result comprises:
    for each weight value in the corresponding second sampling result, controlling the processing array to determine a partial sum from the weight value and the value in the first sampling result corresponding to the weight value;
    controlling the processing array to determine a partial result according to the partial sums corresponding to the respective weight values in the corresponding second sampling result; and
    controlling the processing array to determine the sub-processing result corresponding to the first sampling result according to at least one partial result.
  8. The data processing method according to claim 7, wherein, for each weight value in the corresponding second sampling result, controlling the processing array to determine the partial sum from the weight value and the value in the first sampling result corresponding to the weight value comprises:
    for the first weight value in the corresponding second sampling result, controlling the processing array to determine the partial sum from the first weight value and the values of the first sampling result in the units corresponding to its initial position in the processing array.
  9. The data processing method according to claim 7, wherein, for each weight value in the corresponding second sampling result, controlling the processing array to determine the partial sum from the weight value and the value in the first sampling result corresponding to the weight value comprises:
    for each non-first weight value in the corresponding second sampling result, determining a movement mode of the first sampling result according to the positional relationship, within the first sampling result, between a first value corresponding to the non-first weight value and a second value corresponding to the weight value preceding the non-first weight value, and controlling the processing array to move the second value to the corresponding unit using the determined movement mode; and
    controlling the processing array to determine the partial sum from the value in the corresponding unit after the move and the non-first weight value.
  10. The data processing method according to any one of claims 1 to 9, wherein the data to be processed is obtained by:
    determining the numbers of rows and columns of the data to be processed according to the processing array, the convolution kernel, and the step size;
    determining the numbers of overlapping rows and overlapping columns according to the convolution kernel and the step size; and
    sampling data of an image to be processed according to the numbers of rows and columns of the data to be processed and the numbers of overlapping rows and overlapping columns, to obtain multiple pieces of data to be processed.
  11. The data processing method according to any one of claims 1 to 10, wherein the data to be processed is single-channel data or one channel of multi-channel data, and the convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
  12. A data processing apparatus, characterized in that the apparatus comprises:
    a controller, configured to sample data to be processed according to a step size of a convolution operation to obtain at least one first sampling result, wherein the step size is greater than 1; sample a convolution kernel according to the step size of the convolution operation to obtain at least one second sampling result, wherein the at least one first sampling result and the at least one second sampling result are in one-to-one correspondence; and input the at least one first sampling result and the at least one second sampling result correspondingly to a processing array; and
    the processing array, configured to process the at least one first sampling result and the at least one second sampling result and output a processing result.
  13. The data processing apparatus according to claim 12, wherein the controller is configured to: perform row sampling on the data to be processed according to the step size to obtain at least one first row sampling result, wherein the union of the at least one first row sampling result is the data to be processed; perform column sampling on the data to be processed according to the step size to obtain at least one first column sampling result, wherein the union of the at least one first column sampling result is the data to be processed; and determine the intersection of each first row sampling result with each first column sampling result as a first sampling result.
  14. The data processing apparatus according to claim 12, wherein the controller is configured to: perform row sampling on the convolution kernel according to the step size to obtain at least one second row sampling result, wherein the union of the at least one second row sampling result is the convolution kernel; perform column sampling on the convolution kernel according to the step size to obtain at least one second column sampling result, wherein the union of the at least one second column sampling result is the convolution kernel; and determine the intersection of each second row sampling result with each second column sampling result as a second sampling result.
  15. The data processing apparatus according to claim 12, wherein the controller is configured to, for each first sampling result, input the first sampling result to the processing array and input the second sampling result corresponding to the first sampling result to the processing array; and
    the processing array is configured to determine a corresponding sub-processing result according to the first sampling result and the corresponding second sampling result, and to output the processing result according to the sub-processing results corresponding to the respective first sampling results.
  16. The data processing apparatus according to claim 15, wherein the controller is configured to, for each first sampling result, input multiple values of the first sampling result into multiple units of the processing array, such that the relative positions of the multiple values in the multiple units are the same as the relative positions of the multiple values in the first sampling result.
  17. The data processing apparatus according to claim 16, wherein the processing array comprises an effective array and at least one overflow row and at least one overflow column distributed around the effective array, wherein the effective array comprises multiple first units for storing and processing data, and the overflow row and the overflow column comprise multiple second units for storing data; and
    the controller is configured to input the multiple values of the first sampling result into the multiple units of the processing array, such that the value in the first row and first column of the first sampling result is input into a target unit, the target unit being located in the first row and first column among the multiple first units.
  18. The data processing apparatus according to any one of claims 15 to 17, wherein the processing array is configured to: for each weight value in the corresponding second sampling result, determine a partial sum using the weight value and the value in the first sampling result that corresponds to that weight value;
    determine a partial result according to the partial sums respectively corresponding to the weight values in the corresponding second sampling result; and
    determine the sub-processing result corresponding to the first sampling result according to at least one partial result.
  19. The data processing apparatus according to claim 18, wherein the processing array is configured to, for the first weight value in the corresponding second sampling result, determine the partial sum using the first weight value and the values of the first sampling result held in the units corresponding to its initial position in the processing array.
  20. The data processing apparatus according to claim 18, wherein the controller is configured to, for each non-first weight value in the corresponding second sampling result, determine a movement mode of the first sampling result according to the positional relationship, within the first sampling result, between a first value corresponding to the non-first weight value and a second value corresponding to the weight value preceding the non-first weight value;
    the processing array is configured to move the second value to the corresponding unit in the determined movement mode, and to determine the partial sum using the non-first weight value and the value in the corresponding unit after the movement.
  21. The data processing apparatus according to any one of claims 12 to 20, wherein the controller is further configured to: determine the number of rows and the number of columns of the data to be processed according to the processing array, the convolution kernel and the step size; determine a number of overlapping rows and a number of overlapping columns according to the convolution kernel and the step size; and sample the data of an image to be processed according to the number of rows and columns of the data to be processed, the number of overlapping rows and the number of overlapping columns, to obtain a plurality of pieces of data to be processed.
  22. The data processing apparatus according to any one of claims 12 to 21, wherein the data to be processed is single-channel data or one channel of multi-channel data, and the convolution kernel is a single-channel convolution kernel or one channel of a multi-channel convolution kernel.
  23. An electronic device, comprising a memory, a processor, and the apparatus according to any one of claims 12 to 22.
  24. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 11.
PCT/CN2021/115555 2021-03-31 2021-08-31 Data processing method and apparatus, device, and storage medium WO2022205763A1 (en)
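Claim 14 can be illustrated with a short Python sketch. It assumes, since the claim does not spell this out, that row sampling by step size s keeps every s-th row starting from a row offset r (and column sampling likewise keeps every s-th column from an offset c), so that each intersection is the sub-kernel of weights whose row and column indices share a phase modulo s. The function names are hypothetical.

```python
from itertools import product


def row_samples(kernel, step):
    """Assumed row sampling: sample r keeps every row whose index is r modulo `step`.
    Each sample is returned as a set of (row, col) coordinates."""
    h, w = len(kernel), len(kernel[0])
    return [{(i, j) for i in range(r, h, step) for j in range(w)} for r in range(step)]


def col_samples(kernel, step):
    """Assumed column sampling: sample c keeps every column whose index is c modulo `step`."""
    h, w = len(kernel), len(kernel[0])
    return [{(i, j) for i in range(h) for j in range(c, w, step)} for c in range(step)]


def second_sampling_results(kernel, step):
    """Intersect each row sample with each column sample; return the resulting
    sub-kernels keyed by (row phase, column phase)."""
    results = {}
    rows_list = row_samples(kernel, step)
    cols_list = col_samples(kernel, step)
    for (r, rows), (c, cols) in product(enumerate(rows_list), enumerate(cols_list)):
        cells = rows & cols
        if not cells:
            continue  # empty when `step` exceeds the kernel height or width
        row_idx = sorted({i for i, _ in cells})
        col_idx = sorted({j for _, j in cells})
        # Keep the selected weights in their original relative order.
        results[(r, c)] = [[kernel[i][j] for j in col_idx] for i in row_idx]
    return results


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
for phase, sub_kernel in second_sampling_results(kernel, 2).items():
    print(phase, sub_kernel)
```

For the 3x3 kernel and a step size of 2, the four second sampling results are [[1, 3], [7, 9]], [[2], [8]], [[4, 6]] and [[5]]. Together they contain all nine weights exactly once, which is consistent with the union conditions stated in the claim.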
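For claim 15, the following sketch assumes that each first sampling result is the phase of the input taken at the same row and column offsets as its second sampling result. This appears similar in spirit to the stride-based decomposition in the Yang et al. paper listed under the non-patent citations, but that correspondence is an assumption. Under it, adding up the stride-1 sub-processing results reproduces a direct strided convolution. The helper names are hypothetical.

```python
def strided_conv(x, k, step):
    """Reference: direct 'valid' cross-correlation (no padding) with the given stride."""
    h, w, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    ho, wo = (h - kh) // step + 1, (w - kw) // step + 1
    return [[sum(x[step * i + u][step * j + v] * k[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(wo)] for i in range(ho)]


def phase_sample(m, r, c, step):
    """Keep rows r, r+step, ... and columns c, c+step, ... of matrix m."""
    return [row[c::step] for row in m[r::step]]


def decomposed_conv(x, k, step):
    """Strided convolution rebuilt from step*step stride-1 sub-convolutions."""
    h, w, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    ho, wo = (h - kh) // step + 1, (w - kw) // step + 1
    out = [[0] * wo for _ in range(ho)]
    for r in range(step):
        for c in range(step):
            sub_k = phase_sample(k, r, c, step)  # assumed second sampling result
            if not sub_k or not sub_k[0]:
                continue                         # this phase holds no weights
            sub_x = phase_sample(x, r, c, step)  # assumed matching first sampling result
            # Sub-processing result: stride-1 correlation, accumulated into the output.
            for i in range(ho):
                for j in range(wo):
                    out[i][j] += sum(sub_x[i + a][j + b] * sub_k[a][b]
                                     for a in range(len(sub_k))
                                     for b in range(len(sub_k[0])))
    return out


x = [[(3 * i + 5 * j) % 7 for j in range(7)] for i in range(6)]
k = [[1, -2, 3], [0, 4, -1], [2, 1, 1]]
print(decomposed_conv(x, k, 2) == strided_conv(x, k, 2))
```

Every sub-convolution runs with stride 1. This is what allows a fixed stride-1 processing array to serve kernels with larger step sizes in this model.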
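Claims 16 and 17 describe how a first sampling result is laid out in the processing array. The model below is a hypothetical sketch. It places the overflow rows below the valid array and the overflow columns to its right, although the claim only says they are distributed around it. It then copies the values so that the first-row, first-column value lands on the top-left first unit and every other value keeps its relative position.

```python
class ProcessingArray:
    """Hypothetical layout: a valid array of first units (store and compute) with
    `overflow_rows` extra rows below it and `overflow_cols` extra columns to its
    right, made of second units (store only)."""

    def __init__(self, valid_rows, valid_cols, overflow_rows, overflow_cols):
        self.valid_rows, self.valid_cols = valid_rows, valid_cols
        self.rows = valid_rows + overflow_rows
        self.cols = valid_cols + overflow_cols
        self.cells = [[0] * self.cols for _ in range(self.rows)]

    def is_first_unit(self, i, j):
        """True if unit (i, j) belongs to the valid array."""
        return i < self.valid_rows and j < self.valid_cols

    def load(self, sample):
        """Write a first sampling result so that sample[0][0] lands on the target unit
        (first row, first column of the valid array) and all other values keep their
        position relative to it."""
        if len(sample) > self.rows or max(len(row) for row in sample) > self.cols:
            raise ValueError("sampling result does not fit, even using the overflow units")
        self.cells = [[0] * self.cols for _ in range(self.rows)]
        for i, row in enumerate(sample):
            for j, value in enumerate(row):
                self.cells[i][j] = value


array = ProcessingArray(valid_rows=2, valid_cols=2, overflow_rows=1, overflow_cols=1)
array.load([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Values 3, 6, 7, 8 and 9 now sit in overflow (second) units.
```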
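The partial-sum scheme of claim 18 can be sketched as follows. The sketch assumes that each unit (i, j) of the valid array computes one output and pairs weight (a, b) with the value at offset (a, b) from that unit. Combining several partial results by element-wise addition is also an assumption, and the function names are hypothetical.

```python
def partial_result(sample, sub_kernel, out_rows, out_cols):
    """Accumulate one partial sum per weight of the second sampling result.
    Unit (i, j) pairs weight (a, b) with the value at (i + a, j + b) of the first sampling result."""
    result = [[0] * out_cols for _ in range(out_rows)]
    for a, weight_row in enumerate(sub_kernel):
        for b, weight in enumerate(weight_row):
            for i in range(out_rows):
                for j in range(out_cols):
                    result[i][j] += sample[i + a][j + b] * weight  # partial sum for this weight
    return result


def sub_processing_result(partial_results):
    """Assumed combination rule: element-wise sum of one or more partial results."""
    rows, cols = len(partial_results[0]), len(partial_results[0][0])
    return [[sum(p[i][j] for p in partial_results) for j in range(cols)] for i in range(rows)]


sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
p = partial_result(sample, [[1, 0], [0, 1]], 2, 2)
print(p)  # [[6, 8], [12, 14]]
```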
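Claims 19 and 20 replace indexed reads with data movement. This hypothetical sketch visits the weights in row-major order and uses the initially loaded values for the first weight. For every later weight, it derives a movement from the offset between that weight's value and the previous weight's value, then shifts the whole array accordingly. Two choices here are modelling assumptions: the shift is circular, and the overflow rows and columns must number at least the sub-kernel height and width minus one.

```python
def movement(prev_pos, cur_pos):
    """Movement mode derived from the positions of the values used by two consecutive
    weights: (rows to move up, columns to move left); negative means down / right."""
    return cur_pos[0] - prev_pos[0], cur_pos[1] - prev_pos[1]


def shift(cells, d_rows, d_cols):
    """Circular shift: each unit takes the value d_rows below and d_cols to the right.
    Wrapping is an assumption that keeps the shift lossless in this model."""
    n_rows, n_cols = len(cells), len(cells[0])
    return [[cells[(i + d_rows) % n_rows][(j + d_cols) % n_cols] for j in range(n_cols)]
            for i in range(n_rows)]


def run_with_moves(sample, sub_kernel, valid_rows, valid_cols, overflow_rows, overflow_cols):
    n_rows, n_cols = valid_rows + overflow_rows, valid_cols + overflow_cols
    # Load the first sampling result with sample[0][0] on the top-left valid unit.
    cells = [[0] * n_cols for _ in range(n_rows)]
    for i, row in enumerate(sample):
        for j, value in enumerate(row):
            cells[i][j] = value
    acc = [[0] * valid_cols for _ in range(valid_rows)]
    prev = None
    for a, weight_row in enumerate(sub_kernel):
        for b, weight in enumerate(weight_row):
            if prev is not None:  # non-first weight: move the data before multiplying
                cells = shift(cells, *movement(prev, (a, b)))
            # The first weight simply uses the values at their initial positions.
            for i in range(valid_rows):
                for j in range(valid_cols):
                    acc[i][j] += cells[i][j] * weight
            prev = (a, b)
    return acc


s = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
print(run_with_moves(s, [[1, 2], [3, 4]], 3, 3, 1, 1))
```

Moving the data instead of addressing it means each first unit only ever reads its own cell. The result matches a stride-1 correlation of the loaded values with the sub-kernel over the valid array.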
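Claim 21 does not give formulas, so the tiling sketch below assumes one plausible rule. If the processing array yields n outputs along an axis, a tile needs step * (n - 1) + kernel_len inputs along that axis. Neighbouring tiles then overlap by kernel_len - step inputs, so that no output is lost. The function names and the sizing rule are hypothetical.

```python
def tile_axis(length, outputs_per_tile, kernel_len, step):
    """Split one image axis into half-open [start, end) tiles (hypothetical sizing rule).
    A tile producing n outputs needs step * (n - 1) + kernel_len inputs."""
    total_out = (length - kernel_len) // step + 1
    tiles, first_out = [], 0
    while first_out < total_out:
        n = min(outputs_per_tile, total_out - first_out)
        start = first_out * step
        tiles.append((start, start + step * (n - 1) + kernel_len))
        first_out += n
    return tiles


def overlap(kernel_len, step):
    """Assumed overlap between neighbouring tiles; a negative value means unused gap rows."""
    return kernel_len - step


def tile_plan(h, w, array_rows, array_cols, kh, kw, step):
    """Row range x column range pairs; each pair is one piece of data to be processed."""
    return [(rows, cols)
            for rows in tile_axis(h, array_rows, kh, step)
            for cols in tile_axis(w, array_cols, kw, step)]


print(tile_axis(11, 2, 3, 2))  # [(0, 5), (4, 9), (8, 11)]
print(overlap(3, 2))           # 1
```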
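Claim 22 allows the apparatus to work on one channel at a time. A multi-channel convolution can then be assembled by processing each channel separately and summing the per-channel results, as in the standard definition of multi-channel convolution. The sketch below shows that reduction with hypothetical helper names.

```python
def conv_single(x, k, step):
    """Single-channel 'valid' cross-correlation with the given stride."""
    ho = (len(x) - len(k)) // step + 1
    wo = (len(x[0]) - len(k[0])) // step + 1
    return [[sum(x[step * i + u][step * j + v] * k[u][v]
                 for u in range(len(k)) for v in range(len(k[0])))
             for j in range(wo)] for i in range(ho)]


def conv_multi(channels, kernels, step):
    """Process each channel on its own, then sum element-wise across channels."""
    if len(channels) != len(kernels):
        raise ValueError("need one kernel channel per data channel")
    per_channel = [conv_single(x, k, step) for x, k in zip(channels, kernels)]
    return [[sum(out[i][j] for out in per_channel) for j in range(len(per_channel[0][0]))]
            for i in range(len(per_channel[0]))]


print(conv_multi([[[1, 2], [3, 4]], [[10, 20], [30, 40]]], [[[1]], [[2]]], 1))
# [[21, 42], [63, 84]]
```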

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110352221.5A CN112927124A (en) 2021-03-31 2021-03-31 Data processing method, device, equipment and storage medium
CN202110352221.5 2021-03-31

Publications (1)

Publication Number Publication Date
WO2022205763A1 true WO2022205763A1 (en) 2022-10-06

Family

ID=76173597

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115555 WO2022205763A1 (en) 2021-03-31 2021-08-31 Data processing method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN112927124A (en)
WO (1) WO2022205763A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927124A (en) * 2021-03-31 2021-06-08 成都商汤科技有限公司 Data processing method, device, equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US20180300880A1 (en) * 2017-04-12 2018-10-18 Here Global B.V. Small object detection from a large image
CN109885407A (en) * 2019-03-05 2019-06-14 上海商汤智能科技有限公司 Data processing method and device, electronic equipment, storage medium
CN110533164A (en) * 2019-08-05 2019-12-03 西安交通大学 Winograd convolution splitting method for convolutional neural network accelerators
CN111597029A (en) * 2020-05-20 2020-08-28 上海商汤智能科技有限公司 Data processing method and device, electronic equipment and storage medium
CN112395092A (en) * 2020-11-30 2021-02-23 清华大学 Data processing method and artificial intelligence processor
CN112927124A (en) * 2021-03-31 2021-06-08 成都商汤科技有限公司 Data processing method, device, equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN111428189B (en) * 2020-04-01 2023-09-22 南京大学 Data preprocessing method and device for deconvolution operation

Non-Patent Citations (2)

Title
DI HUANG; XISHAN ZHANG; RUI ZHANG; TIAN ZHI; DEYUAN HE; JIAMING GUO; CHANG LIU; QI GUO; ZIDONG DU; SHAOLI LIU; TIANSHI CHEN; YUNJI: "DWM: A Decomposable Winograd Method for Convolution Acceleration", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 3 February 2020 (2020-02-03), XP081590037 *
YANG CHEN; WANG YIZHOU; WANG XIAOLI; GENG LI: "A Stride-Based Convolution Decomposition Method to Stretch CNN Acceleration Algorithms for Efficient and Flexible Hardware Implementation", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, IEEE, US, vol. 67, no. 9, 1 September 2020 (2020-09-01), pages 3007 - 3020, XP055973499, ISSN: 1549-8328, DOI: 10.1109/TCSI.2020.2985727 *

Also Published As

Publication number Publication date
CN112927124A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
US20230325348A1 (en) Performing concurrent operations in a processing element
TWI639119B (en) Adaptive execution engine for convolution computing systems
US11960566B1 (en) Reducing computations for data including padding
EP3373210B1 (en) Transposing neural network matrices in hardware
US10445638B1 (en) Restructuring a multi-dimensional array
EP3726399A1 (en) Matrix multiplier
US20180174036A1 (en) Hardware Accelerator for Compressed LSTM
EP3093757B1 (en) Multi-dimensional sliding window operation for a vector processor
CN108170640B (en) Neural network operation device and operation method using same
US11915118B2 (en) Method and apparatus for processing computation of zero value in processing of layers in neural network
US20230259758A1 (en) Adaptive tensor compute kernel for sparse neural network
WO2022205763A1 (en) Data processing method and apparatus, device, and storage medium
CN112967172A (en) Data processing device, method, computer equipment and storage medium
WO2023065983A1 (en) Computing apparatus, neural network processing device, chip, and data processing method
US6675286B1 (en) Multimedia instruction set for wide data paths
JP7174831B2 (en) Video memory processing method, apparatus and recording medium based on convolutional neural network
JP2024028901A (en) Sparse matrix multiplication in hardware
WO2022160704A1 (en) Image processing method and apparatus, computer device and storage medium
CN113052299A (en) Neural network memory computing device based on lower communication bound and acceleration method
Yang et al. BSRA: Block-based super resolution accelerator with hardware efficient pixel attention
US20230376733A1 (en) Convolutional neural network accelerator hardware
WO2023115814A1 (en) Fpga hardware architecture, data processing method therefor and storage medium
CN114758209B (en) Convolution result obtaining method and device, computer equipment and storage medium
CN112927125B (en) Data processing method, device, computer equipment and storage medium
CN114220014A (en) Method, device, equipment and medium for determining saliency target detection model

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21934402

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.02.2024)