CN113487017A - Data convolution processing method and device and computer equipment

Publication number
CN113487017A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110848542.4A
Other languages
Chinese (zh)
Inventor
胡云鹏
王洪
胡华斌
曾纪国
阳昭衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Goke Microelectronics Co Ltd
Original Assignee
Hunan Goke Microelectronics Co Ltd
Application filed by Hunan Goke Microelectronics Co Ltd
Priority to CN202110848542.4A
Publication of CN113487017A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

An embodiment of the present application provides a data convolution processing method, a data convolution processing apparatus, and a computer device. The method comprises: acquiring a plurality of feature map data and a plurality of weight data; dividing the plurality of feature map data and the plurality of weight data into a plurality of grouped data; inputting each grouped data into a corresponding multiply-add unit of a hardware accelerator; controlling the multiply-add unit to preprocess the feature map data of the grouped data to obtain data to be convolved for performing the convolution operation; and controlling the multiply-add unit to skip the zero-value points of the data to be convolved and the zero-value points of the weight data of the grouped data, and to perform the convolution operation on the non-zero-value points of the data to be convolved and the non-zero-value points of the weight data of the grouped data. Multiplications involving a zero value are thereby eliminated from the convolution processing, which reduces the number of calculations, speeds up the operation, and improves performance.

Description

Data convolution processing method and device and computer equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a data convolution processing method and apparatus, and a computer device.
Background
With the rapid development of artificial intelligence technology, a wide variety of neural network models have been introduced, and the computing power they require keeps growing. To meet this demand, various hardware accelerators have been developed. A hardware accelerator is dedicated hardware for a specific kind of processing; it exploits circuit parallelism and high-speed processing to accelerate the operation of neural network models. Existing neural network models involve a large number of convolution operations, so improving the processing speed of convolution has become an urgent problem.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide a data convolution processing method, apparatus and computer device.
In a first aspect, an embodiment of the present application provides a data convolution processing method, where the method includes:
acquiring a plurality of feature map data and a plurality of weight data;
dividing the plurality of feature map data and the plurality of weight data into a plurality of grouped data, wherein the grouped data comprises one feature map data and one corresponding weight data;
inputting each grouped data into a multiplication and addition unit corresponding to the hardware accelerator;
controlling the multiplication and addition unit to preprocess the feature map data of the grouped data to obtain data to be convolved for executing the convolution operation;
and controlling the multiplication and addition unit to skip the zero value point of the data to be convolved and the zero value point of the weight data of the grouped data, and executing convolution operation on the non-zero value point of the data to be convolved and the non-zero value point of the weight data of the grouped data.
Optionally, the controlling the multiply-add unit to skip a zero value point of the data to be convolved and a zero value point of the weight data of the grouped data, and performing convolution operation on the non-zero value point of the data to be convolved and the non-zero value point of the weight data of the grouped data includes:
acquiring first zero-value point statistical information and second zero-value point statistical information, wherein each data bit of the first zero-value point statistical information holds a flag indicating whether the corresponding value point of the data to be convolved is zero, and each data bit of the second zero-value point statistical information holds a flag indicating whether the corresponding value point of the weight data of the grouped data is zero;
controlling the multiplication and addition unit to acquire the non-zero value points of the data to be convolved and the non-zero value points of the weight data according to the flags held in the data bits of the first zero-value point statistical information and the second zero-value point statistical information;
and controlling the multiplication and addition unit to carry out convolution operation on the obtained non-zero value points of the data to be convolved and the non-zero value points of the weight data.
Optionally, the flags include a first flag indicating a zero value point and a second flag indicating a non-zero value point, and the controlling the multiply-add unit to obtain the non-zero value points of the data to be convolved and the non-zero value points of the weight data according to the flags held in the data bits of the first zero-value point statistical information and the second zero-value point statistical information includes:
controlling the multiplication and addition unit to traverse the flag of each data bit in the data bit order of the first zero-value point statistical information and the second zero-value point statistical information, and, when at least one first flag is found on the current data bit at the same position in the first zero-value point statistical information and the second zero-value point statistical information, skipping the value point of the data to be convolved and the value point of the weight data corresponding to the current data bit;
and when the current data bits at the same position in the first zero-value point statistical information and the second zero-value point statistical information both hold the second flag, acquiring the non-zero value point of the data to be convolved and the non-zero value point of the weight data corresponding to the current data bit.
Optionally, the obtaining of the plurality of feature map data and the plurality of weight data includes:
determining the number of multiply-add units of the hardware accelerator;
and determining the plurality of feature map data and the plurality of weight data according to the number of the multiplication and addition units, wherein each feature map data is feature map matrix data, each weight data is weight matrix data, and the number of the grouped data is the same as the number of the multiplication and addition units of the hardware accelerator.
Optionally, the controlling the multiply-add unit to pre-process the feature map data of the grouped data to obtain the data to be convolved for performing a convolution operation includes:
and controlling the multiplication and addition unit to traverse the value points of the feature map matrix data and, for each traversed current value point, constructing matrix data to be convolved that has the same number of rows and columns as the weight matrix data.
Optionally, the constructing matrix data to be convolved, which is the same as the number of rows and columns of the weight matrix data, for the traversed current value point includes:
if, with the current value point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data can be intercepted from the feature map matrix data, using the intercepted complete matrix as the matrix data to be convolved;
and if, with the current value point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data cannot be intercepted from the feature map matrix data, padding the intercepted incomplete matrix to full size to obtain the matrix data to be convolved.
Optionally, the obtaining of the plurality of feature map data and the plurality of weight data includes:
acquiring a plurality of feature map data according to an input image;
the plurality of weight data is obtained from an output channel of the convolutional layer.
In a second aspect, an embodiment of the present application provides a data convolution processing apparatus, including:
the acquisition module is used for acquiring a plurality of feature map data and a plurality of weight data;
the dividing module is used for dividing the plurality of feature map data and the plurality of weight data into a plurality of grouped data, wherein the grouped data comprises one feature map data and one corresponding weight data;
the input module is used for inputting each grouped data into a multiplication and addition unit corresponding to the hardware accelerator;
the preprocessing module is used for controlling the multiplication and addition unit to preprocess the feature map data of the grouped data to obtain data to be convolved for executing the convolution operation;
and the control module is used for controlling the multiplication and addition unit to skip the zero value point of the data to be convolved and the zero value point of the weight data of the grouped data, and executing convolution operation on the non-zero value point of the data to be convolved and the non-zero value point of the weight data of the grouped data.
Optionally, the control module is further configured to obtain first zero-value point statistical information and second zero-value point statistical information, where each data bit of the first zero-value point statistical information is provided with a flag indicating whether each value point of the data to be convolved is zero, and each data bit of the second zero-value point statistical information is provided with a flag indicating whether each value point of the weight data of the grouped data is zero;
controlling the multiplication and addition unit to acquire the non-zero value points of the data to be convolved and the non-zero value points of the weight data according to the flags held in the data bits of the first zero-value point statistical information and the second zero-value point statistical information;
and controlling the multiplication and addition unit to carry out convolution operation on the obtained non-zero value points of the data to be convolved and the non-zero value points of the weight data.
Optionally, the flags include a first flag indicating a zero-value point, and a second flag indicating a non-zero-value point, and the control module is further configured to control the multiplication and addition unit to traverse the flags of each data bit according to the data bit ordering of the first zero-value point statistical information and the second zero-value point statistical information, and skip the value point of the data to be convolved corresponding to the current data bit and the value point of the weight data when traversing to at least one first flag on the current data bit having the same data bit ordering of the first zero-value point statistical information and the second zero-value point statistical information;
and when the current data bits at the same position in the first zero-value point statistical information and the second zero-value point statistical information both hold the second flag, acquiring the non-zero value point of the data to be convolved and the non-zero value point of the weight data corresponding to the current data bit.
Optionally, the obtaining module is configured to determine the number of multiply-add units of the hardware accelerator;
and determining the plurality of feature map data and the plurality of weight data according to the number of the multiplication and addition units, wherein each feature map data is feature map matrix data, each weight data is weight matrix data, and the number of the grouped data is the same as the number of the multiplication and addition units of the hardware accelerator.
Optionally, the data to be convolved is matrix data to be convolved, and the preprocessing module is configured to control the multiply-add unit to traverse the value points of the feature map matrix data and, for each traversed current value point, construct matrix data to be convolved that has the same number of rows and columns as the weight matrix data.
Optionally, the preprocessing module is further configured to: if, with the current value point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data can be intercepted from the feature map matrix data, use the intercepted complete matrix as the matrix data to be convolved;
and if, with the current value point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data cannot be intercepted from the feature map matrix data, pad the intercepted incomplete matrix to full size to obtain the matrix data to be convolved.
Optionally, the obtaining module is further configured to obtain the plurality of feature map data according to the input image;
the plurality of weight data is obtained from an output channel of the convolutional layer.
In a third aspect, an embodiment of the present application provides a computer device comprising a memory and a processor, where the memory stores a computer program and the computer program, when run by the processor, performs the data convolution processing method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program runs on a processor, the computer program performs the data convolution processing method provided in the first aspect.
With the data convolution processing method, apparatus, and computer device provided above, a plurality of feature map data and a plurality of weight data are acquired; the plurality of feature map data and the plurality of weight data are divided into a plurality of grouped data; each grouped data is input into a corresponding multiply-add unit of the hardware accelerator; the multiply-add unit is controlled to preprocess the feature map data of the grouped data to obtain data to be convolved for executing the convolution operation; and the multiply-add unit is controlled to skip the zero-value points of the data to be convolved and the zero-value points of the weight data of the grouped data, and to execute the convolution operation on the non-zero-value points of the data to be convolved and the non-zero-value points of the weight data of the grouped data. In this way, invalid calculations are removed from the convolution operation: whenever at least one of the two operands of a multiplication, taken from the weight data and the feature map data, is zero, that multiplication is skipped and the next multiplication is performed. Multiplications involving a zero value are thus eliminated from the convolution processing, which reduces the number of calculations, speeds up the operation, and improves performance.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed to be used in the embodiments are briefly introduced below, and it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like components are numbered similarly in the various figures.
Fig. 1 is a schematic flow chart illustrating a data convolution processing method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart illustrating step S105 of the data convolution processing method according to the embodiment of the present application;
Fig. 3 is a schematic structural diagram illustrating step S1052 of the data convolution processing method according to the embodiment of the present application;
Fig. 4 is a schematic flow chart illustrating step S101 of the data convolution processing method according to the embodiment of the present application;
Fig. 5 shows a schematic structural diagram of a data convolution processing apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having", and their derivatives, as used in various embodiments of the present application, are intended only to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence or possible addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
Example 1
The embodiment of the disclosure provides a data convolution processing method.
Specifically, referring to fig. 1, the data convolution processing method includes:
step S101, a plurality of feature map data and a plurality of weight data are acquired.
In the present embodiment, the feature map data may be feature map matrix data, and the weight data may be weight matrix data. The weight matrix data may be obtained from the output channels of the convolutional layer. The number of rows and columns of the feature map matrix data is the same as the number of rows and columns of the weight matrix data.
Optionally, step S101 includes:
acquiring a plurality of feature map data according to an input image;
the plurality of weight data is obtained from an output channel of the convolutional layer.
For example, 7 × 7 feature map matrix data corresponds to 7 × 7 weight matrix data, 5 × 5 feature map matrix data corresponds to 5 × 5 weight matrix data, 3 × 3 feature map matrix data corresponds to 3 × 3 weight matrix data, and 1 × 1 feature map matrix data corresponds to 1 × 1 weight matrix data. Note that 7 × 7, 5 × 5, 3 × 3, and 1 × 1 correspond to the size of the convolution kernel.
Step S102, dividing the plurality of feature map data and the plurality of weight data into a plurality of grouped data.
In this embodiment, each grouped data includes one feature map data and one corresponding weight data. In this way the feature map data and the weight data can be reused across groups, which improves the processing speed.
For example, if there are 64 feature map data and 8 weight data, each feature map data can be paired with each weight data to form one grouped data, giving 64 × 8 grouped data in total.
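The 64 × 8 pairing in this example can be sketched as a Cartesian product. This is only an illustration under stated assumptions: the function name `make_groups`, the placeholder labels, and the ordering of the groups (all feature maps for the first weight, then the second weight, and so on) are not taken from the application.

```python
from itertools import product

def make_groups(feature_maps, weights):
    """Pair every weight data with every feature map data.

    Each (feature_map, weight) pair is one grouped data.
    The ordering (weights outer, feature maps inner) is an assumption.
    """
    return [(fm, w) for w, fm in product(weights, feature_maps)]

# Placeholder labels stand in for the real matrices.
feature_maps = [f"fm{i}" for i in range(64)]
weights = [f"w{j}" for j in range(8)]

groups = make_groups(feature_maps, weights)
print(len(groups))  # 512
```

Each feature map appears in 8 groups and each weight in 64 groups, which is the reuse the embodiment refers to.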
Step S103, inputting each grouped data into a multiplication and addition unit corresponding to the hardware accelerator.
In this embodiment, the number of multiply-accumulate (MAC) units of the hardware accelerator may be chosen according to requirements; for example, the hardware accelerator may include 512 multiply-add units. If there are 512 grouped data, each grouped data can correspond to one of the 512 multiply-add units, and the 512 multiply-add units can be arranged in an array of 8 rows and 64 columns.
Step S104, controlling the multiplication and addition unit to preprocess the feature map data of the grouped data to obtain data to be convolved for executing the convolution operation.
For example, the feature map data may be 7 × 7 feature map matrix data, and the corresponding weight matrix data is 7 × 7 weight matrix data. When performing the convolution operation based on the 7 × 7 feature map matrix data and the 7 × 7 weight matrix data, a 7 × 7 matrix must be constructed in advance for each element of the 7 × 7 feature map matrix data; the 7 × 7 matrix constructed for each element can be understood as the data to be convolved for executing the convolution operation.
Step S105, controlling the multiply-add unit to skip the zero point of the data to be convolved and the zero point of the weight data of the grouped data, and performing convolution operation on the non-zero point of the data to be convolved and the non-zero point of the weight data of the grouped data.
In this embodiment, invalid calculations are removed from the convolution operation: whenever at least one of the two operands of a multiplication, taken from the weight data and the feature map data, is zero, that multiplication is skipped and the next multiplication is performed. Multiplications involving a zero value are thus eliminated from the convolution processing, which reduces the number of calculations, speeds up the operation, and improves performance.
Optionally, referring to fig. 2, step S105 includes:
step S1051, acquiring first zero-valued statistical information and second zero-valued statistical information.
In this embodiment, each data bit of the first zero-value point statistic information is provided with a flag indicating whether each value point of the data to be convolved is zero, and each data bit of the second zero-value point statistic information is provided with a flag indicating whether each value point of the weight data of the grouped data is zero.
For example, if the data to be convolved is a 7 × 7 matrix and the weight data is 7 × 7 weight matrix data, a first data block of 49 data bits is used to store the first zero-value point statistical information, and each data bit holds a flag that records whether the corresponding value point of the 7 × 7 data to be convolved is zero. A second data block of 49 data bits is used to store the second zero-value point statistical information, and each data bit holds a flag that records whether the corresponding value point of the 7 × 7 weight matrix data is zero.
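A minimal sketch of how such a 49-bit block could be built is shown below. It uses the encoding the embodiment gives for the flags (0 for a zero value point, 1 for a non-zero value point); packing the flags into a Python integer, the row-major bit order, and the name `zero_flags` are assumptions made for illustration.

```python
def zero_flags(matrix):
    """Pack one flag per value point into an integer.

    Bit i is 1 if the i-th value point (row-major order, an assumed
    ordering) is non-zero, and 0 if it is zero.
    """
    flags = 0
    bit = 0
    for row in matrix:
        for value in row:
            if value != 0:
                flags |= 1 << bit
            bit += 1
    return flags

# Illustrative 7 x 7 data to be convolved: every third value point is zero.
to_convolve = [[(r * 7 + c) % 3 for c in range(7)] for r in range(7)]
first_info = zero_flags(to_convolve)

print(bin(first_info).count("1"))  # number of non-zero value points
```

The same function applied to a 7 × 7 weight matrix would give the second data block.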
Step S1052, controlling the multiply-add unit to obtain the non-zero point of the data to be convolved and the non-zero point of the weight data according to the flag corresponding to the data bit of the first zero-value point statistical information and the second zero-value point statistical information.
It should be noted that, in the convolution operation, each multiplication is performed between two corresponding value points of the data to be convolved and the weight data. If either of the two corresponding value points is zero, the product is also zero, so that position can be skipped; values are fetched only for positions where both corresponding value points of the data to be convolved and the weight data are non-zero.
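This observation can be written compactly. The notation is introduced here for illustration and does not appear in the application: a_i denotes the i-th value point of the data to be convolved, w_i the corresponding value point of the weight data, and N the number of value points per matrix (N = 49 for a 7 × 7 matrix).

```latex
\[
  y \;=\; \sum_{i=1}^{N} a_i\, w_i
    \;=\; \sum_{\substack{1 \le i \le N \\ a_i \neq 0,\; w_i \neq 0}} a_i\, w_i ,
  \qquad\text{since } a_i w_i = 0 \text{ whenever } a_i = 0 \text{ or } w_i = 0 .
\]
```

The right-hand sum has at most N terms, and exactly as many terms as there are positions where both value points are non-zero.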
Step S1053, controlling the multiplication and addition unit to carry out the convolution operation on the obtained non-zero value points of the data to be convolved and the non-zero value points of the weight data.
For example, if the data to be convolved is a 7 × 7 matrix and the weight data is 7 × 7 weight matrix data, then without zero removal 49 clocks are needed to perform 49 multiplications. With zero removal, multiplication is performed only where both corresponding value points of the data to be convolved and the weight data are non-zero, so multiplications in which either operand is zero are omitted, which saves operation time and increases operation speed. It should be noted that when 8 × 64 grouped data are input to their corresponding multiply-add units, a multiply-add unit that finishes its grouped data waits until all 8 × 64 multiply-add units have finished processing the 8 × 64 grouped data. The convolution operation then proceeds to the next set of data to be convolved, which again includes a plurality of feature map data and a plurality of weight data; that is, steps S101 to S105 are performed again, and the details are not repeated here.
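The clock counts in this example can be illustrated with a small model. It assumes one multiplication per clock (as the 49-clock figure suggests), no extra cost for skipping, and flags packed into integers with 1 marking a non-zero value point; the function names are hypothetical. Because every unit waits for the slowest one, the time for a batch is modelled as the maximum over all units.

```python
def clocks_needed(data_flags, weight_flags):
    """Multiplications left after skipping: positions where both flags are 1."""
    return bin(data_flags & weight_flags).count("1")

def batch_clocks(flag_pairs):
    """All multiply-add units wait for the slowest one before the next batch."""
    return max(clocks_needed(d, w) for d, w in flag_pairs)

FULL = (1 << 49) - 1   # all 49 value points non-zero
SPARSE = 0b1011        # only 3 non-zero value points

print(clocks_needed(FULL, FULL))                     # 49
print(clocks_needed(FULL, SPARSE))                   # 3
print(batch_clocks([(FULL, SPARSE), (FULL, FULL)]))  # 49
```

Under this model, the benefit for a whole batch depends on the densest grouped data in it, not on the average sparsity.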
Optionally, the flags include a first flag indicating a zero value point and a second flag indicating a non-zero value point, please refer to fig. 3, and step S1052 includes:
step S10521 of controlling the multiplication and addition unit to traverse the flags of each data bit according to the data bit ordering of the first zero-value statistical information and the second zero-value statistical information, and when traversing at least one first flag on a current data bit having the same data bit ordering of the first zero-value statistical information and the second zero-value statistical information, skipping the value point of the data to be convolved and the value point of the weight data corresponding to the current data bit;
step S10522, when the current data bits with the same data bit order of the first zero-value point statistical information and the second zero-value point statistical information both traverse to the second flag, acquiring a non-zero value point of the data to be convolved and a non-zero value point of the weight data corresponding to the current data bit.
In this implementation, the first flag may be set to 0 and the second flag may be set to 1.
For example, if the data to be convolved is a 7 × 7 matrix and the weight data is 7 × 7 weight matrix data, a first data block of 49 data bits stores the first zero-value point statistical information, where each data bit records whether the corresponding value point of the 7 × 7 data to be convolved is zero: if the value point is 0, a 0 is recorded in the corresponding data bit of the first zero-value point statistical information, and if the value point is not 0, a 1 is recorded in that data bit. Likewise, a second data block of 49 data bits stores the second zero-value point statistical information, where each data bit records whether the corresponding value point of the 7 × 7 weight matrix data is zero: if the value point is 0, a 0 is recorded in the corresponding data bit of the second zero-value point statistical information, and if the value point is not 0, a 1 is recorded in that data bit.
If all 49 data bits of both the first zero-value point statistical information and the second zero-value point statistical information are 1, 49 clocks are needed. If any data bit of the first or the second zero-value point statistical information is 0, the multiplication at that data bit position is skipped and the number of clocks is reduced.
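The traversal of steps S10521 and S10522 can be sketched in software as follows. This is a behavioural illustration only, not the hardware design: the flat lists, the function name `zero_skip_mac`, and the returned multiplication count are assumptions, while the 0/1 flag encoding follows the embodiment.

```python
def zero_skip_mac(data, weights, data_flags, weight_flags):
    """Traverse the flags in order and multiply only where both flags are 1."""
    acc = 0
    multiplications = 0
    for value, weight, d_flag, w_flag in zip(data, weights, data_flags, weight_flags):
        if d_flag == 0 or w_flag == 0:
            continue  # at least one first flag (zero value point): skip
        acc += value * weight  # both second flags: fetch values and multiply
        multiplications += 1
    return acc, multiplications

# Flattened value points of a small illustrative example.
data = [2, 0, 3, 4, 0, 5]
weights = [1, 7, 0, 2, 0, 3]
data_flags = [1 if v != 0 else 0 for v in data]
weight_flags = [1 if v != 0 else 0 for v in weights]

result, count = zero_skip_mac(data, weights, data_flags, weight_flags)
print(result, count)  # 25 3
```

The result equals the ordinary sum of products over all six positions, but only three multiplications are performed instead of six.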
Optionally, referring to fig. 4, step S101 includes:
step S1011, determining the number of multiplication and addition units of the hardware accelerator;
step S1012, determining the plurality of feature map data and the plurality of weight data according to the number of multiply-add units, where each feature map data is feature map matrix data, each weight data is weight matrix data, and the number of grouped data is the same as the number of multiply-add units of the hardware accelerator.
In this embodiment, the hardware accelerator may include an Artificial Intelligence (AI) accelerator, a Graphics Processing Unit (GPU), and the like. For example, if the hardware accelerator comprises 8 × 64 multiply-add units, 64 feature map data and 8 weight data can be determined accordingly and divided into 8 × 64 grouped data, where each grouped data comprises one feature map data and one corresponding weight data, and each grouped data is input into its corresponding multiply-add unit for data processing.
Therefore, multiple grouped data can be convolved simultaneously by the multiple multiply-add units, which improves the processing speed and saves processing time.
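One possible reading of the 8 × 64 example is that every weight data is paired with every feature map data, giving one grouped data per multiply-add unit. The sketch below works under that assumption; the pairing rule, the `make_groups` helper and the string placeholders are illustrative and are not specified in the text.

```python
from itertools import product

NUM_WEIGHTS = 8        # weight data in the 8 x 64 example
NUM_FEATURE_MAPS = 64  # feature map data in the 8 x 64 example


def make_groups(feature_maps, weights):
    """Pair each weight data with each feature map data, one group per multiply-add unit."""
    return [
        {"mac_unit": idx, "weight": w, "feature_map": f}
        for idx, (w, f) in enumerate(product(weights, feature_maps))
    ]


feature_maps = [f"fmap_{i}" for i in range(NUM_FEATURE_MAPS)]
weights = [f"weight_{j}" for j in range(NUM_WEIGHTS)]
groups = make_groups(feature_maps, weights)  # 512 groups for 8 x 64 units
```

The number of groups equals the number of multiply-add units, so each unit receives exactly one grouped data.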
Optionally, the data to be convolved is matrix data to be convolved, and step S104 includes:
controlling the multiplication and addition unit to traverse the numerical points of the feature map matrix data, and constructing, for the traversed current numerical point, matrix data to be convolved having the same number of rows and columns as the weight matrix data.
For example, the feature map data may be 5 × 5 feature map matrix data and the corresponding weight data 5 × 5 weight matrix data. When a convolution operation is performed based on the 5 × 5 feature map matrix data and the 5 × 5 weight matrix data, a 5 × 5 matrix needs to be constructed in advance for each element of the 5 × 5 feature map matrix data; the 5 × 5 matrix constructed for each element can be understood as the data to be convolved on which the convolution operation is performed.
Optionally, the constructing, for the traversed current numerical point, of matrix data to be convolved having the same number of rows and columns as the weight matrix data includes:
if, with the current numerical point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data can be intercepted from the feature map matrix data, using the intercepted complete matrix as the matrix data to be convolved;
and if, with the current numerical point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data cannot be intercepted from the feature map matrix data, supplementing the intercepted incomplete matrix into a complete matrix to obtain the matrix data to be convolved.
Therefore, a corresponding matrix to be convolved can be constructed for each element of the feature map matrix data, which facilitates the subsequent convolution operation by the multiplication and addition unit.
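The construction of the matrix data to be convolved can be sketched as below. The text does not state which value is used to supplement an incomplete matrix; this sketch assumes a fill value of 0 and an odd number of rows and columns, and `window_at` is a hypothetical helper name.

```python
def window_at(fmap, row, col, k, fill=0):
    """Return the k x k matrix centred on fmap[row][col], filling out-of-range cells."""
    half = k // 2  # assumes k is odd so that a centre point exists
    height, width = len(fmap), len(fmap[0])
    window = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            line.append(fmap[r][c] if inside else fill)
        window.append(line)
    return window


fmap = [[r * 5 + c + 1 for c in range(5)] for r in range(5)]  # values 1..25
centre = window_at(fmap, 2, 2, 5)  # a complete 5 x 5 matrix can be intercepted
corner = window_at(fmap, 0, 0, 5)  # incomplete matrix, supplemented with the fill value
```

With a fill value of 0, the supplemented points are themselves zero-value points, so under this assumption they would be skipped by the zero-skipping step rather than multiplied.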
In the data convolution processing method provided by this embodiment, a plurality of feature map data and a plurality of weight data are obtained; the plurality of feature map data and the plurality of weight data are divided into a plurality of grouped data; each grouped data is input into the corresponding multiply-add unit of the hardware accelerator; the multiply-add unit is controlled to preprocess the feature map data of the grouped data to obtain data to be convolved for performing a convolution operation; and the multiply-add unit is controlled to skip the zero-value points of the data to be convolved and the zero-value points of the weight data of the grouped data, and to perform the convolution operation on the non-zero value points of the data to be convolved and the non-zero value points of the weight data of the grouped data. In this way, invalid calculations are removed from the convolution operation: whenever at least one zero value appears in the weight data or the feature map data for a multiplication, that multiplication is skipped and the next one is performed. Multiplications involving a zero value are thus eliminated during convolution processing, which reduces the amount of calculation, speeds up the operation, and improves performance.
Example 2
In addition, the embodiment of the disclosure provides a data convolution processing device, which is applied to computer equipment.
Specifically, as shown in fig. 5, the data convolution processing apparatus 500 includes:
an obtaining module 501, configured to obtain multiple feature map data and multiple weight data;
a dividing module 502, configured to divide the feature map data and the weight data into multiple grouped data, where the grouped data includes one feature map data and one corresponding weight data;
an input module 503, configured to input each grouped data into a multiply-add unit corresponding to the hardware accelerator;
a preprocessing module 504, configured to control the multiply-add unit to preprocess the feature map data of the grouped data, so as to obtain data to be convolved for performing a convolution operation;
and a control module 505, configured to control the multiply-add unit to skip a zero-value point of the data to be convolved and a zero-value point of the weight data of the grouped data, and perform the convolution operation on a non-zero value point of the data to be convolved and a non-zero value point of the weight data of the grouped data.
Optionally, the control module 505 is further configured to obtain first zero-value point statistical information and second zero-value point statistical information, where each data bit of the first zero-value point statistical information is provided with a flag indicating whether each value point of the data to be convolved is zero, and each data bit of the second zero-value point statistical information is provided with a flag indicating whether each value point of the weight data of the grouped data is zero;
controlling the multiplication and addition unit to acquire a non-zero value point of the data to be convolved and a non-zero value point of the weight data according to the flags corresponding to the data bits of the first zero-value point statistical information and the second zero-value point statistical information;
and controlling the multiplication and addition unit to carry out convolution operation on the obtained non-zero value points of the data to be convolved and the non-zero value points of the weight data.
Optionally, the flags include a first flag indicating a zero-value point and a second flag indicating a non-zero value point, and the control module 505 is further configured to control the multiply-add unit to traverse the flags of each data bit according to the data bit ordering of the first zero-value point statistical information and the second zero-value point statistical information, and to skip the value point of the data to be convolved and the value point of the weight data corresponding to the current data bit when at least one first flag is found on the current data bit at the same position in the first zero-value point statistical information and the second zero-value point statistical information;
and when the current data bits at the same position in the first zero-value point statistical information and the second zero-value point statistical information both carry the second flag, acquiring the non-zero value point of the data to be convolved and the non-zero value point of the weight data corresponding to the current data bit.
Optionally, the obtaining module 501 is configured to determine the number of multiply-add units of the hardware accelerator;
and determining the plurality of feature map data and the plurality of weight data according to the number of the multiplication and addition units, wherein each feature map data is feature map matrix data, each weight data is weight matrix data, and the number of the grouped data is the same as the number of the multiplication and addition units of the hardware accelerator.
Optionally, the data to be convolved is matrix data to be convolved, and the preprocessing module 504 is configured to control the multiply-add unit to traverse the numerical points of the feature map matrix data, and to construct, for the traversed current numerical point, matrix data to be convolved having the same number of rows and columns as the weight matrix data.
Optionally, the preprocessing module 504 is further configured to: if, with the current numerical point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data can be intercepted from the feature map matrix data, use the intercepted complete matrix as the matrix data to be convolved;
and if, with the current numerical point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data cannot be intercepted from the feature map matrix data, supplement the intercepted incomplete matrix into a complete matrix to obtain the matrix data to be convolved.
Optionally, the obtaining module 501 is further configured to obtain the plurality of feature map data according to the input image;
the plurality of weight data is obtained from an output channel of the convolutional layer.
The data convolution processing apparatus 500 provided in this embodiment can implement the corresponding process of the data convolution processing method shown in embodiment 1, and is not described herein again to avoid redundancy.
The data convolution processing device provided by this embodiment obtains a plurality of feature map data and a plurality of weight data; divides the plurality of feature map data and the plurality of weight data into a plurality of grouped data; inputs each grouped data into the corresponding multiply-add unit of the hardware accelerator; controls the multiply-add unit to preprocess the feature map data of the grouped data to obtain data to be convolved for performing a convolution operation; and controls the multiply-add unit to skip the zero-value points of the data to be convolved and the zero-value points of the weight data of the grouped data, and to perform the convolution operation on the non-zero value points of the data to be convolved and the non-zero value points of the weight data of the grouped data. In this way, invalid calculations are removed from the convolution operation: whenever at least one zero value appears in the weight data or the feature map data for a multiplication, that multiplication is skipped and the next one is performed. Multiplications involving a zero value are thus eliminated during convolution processing, which reduces the amount of calculation, speeds up the operation, and improves performance.
Example 3
Furthermore, an embodiment of the present disclosure provides a computer device, including a memory and a processor, where the memory stores a computer program, and the computer program, when running on the processor, executes the data convolution processing method provided in the above method embodiment 1.
The computer device provided in this embodiment may implement the corresponding flow of the data convolution processing method shown in embodiment 1, and is not described herein again to avoid repetition.
The computer device provided by this embodiment obtains a plurality of feature map data and a plurality of weight data; divides the plurality of feature map data and the plurality of weight data into a plurality of grouped data; inputs each grouped data into the corresponding multiply-add unit of the hardware accelerator; controls the multiply-add unit to preprocess the feature map data of the grouped data to obtain data to be convolved for performing a convolution operation; and controls the multiply-add unit to skip the zero-value points of the data to be convolved and the zero-value points of the weight data of the grouped data, and to perform the convolution operation on the non-zero value points of the data to be convolved and the non-zero value points of the weight data of the grouped data. In this way, invalid calculations are removed from the convolution operation: whenever at least one zero value appears in the weight data or the feature map data for a multiplication, that multiplication is skipped and the next one is performed. Multiplications involving a zero value are thus eliminated during convolution processing, which reduces the amount of calculation, speeds up the operation, and improves performance.
Example 4
The present application also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the data convolution processing method provided in embodiment 1.
The computer-readable storage medium provided in this embodiment may implement the corresponding flow of the data convolution processing method shown in embodiment 1, and is not described herein again to avoid repetition.
In this embodiment, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In this embodiment, the computer program stored on the computer-readable storage medium may carry out the data convolution processing method shown in embodiment 1, which is not described herein again to avoid repetition.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of data convolution processing, the method comprising:
acquiring a plurality of feature map data and a plurality of weight data;
dividing the plurality of feature map data and the plurality of weight data into a plurality of grouped data, wherein the grouped data comprises one feature map data and one corresponding weight data;
inputting each grouped data into a multiplication and addition unit corresponding to the hardware accelerator;
controlling the multiplication and addition unit to preprocess the feature map data of the grouped data to obtain data to be convolved for executing convolution operation;
and controlling the multiplication and addition unit to skip the zero value point of the data to be convolved and the zero value point of the weight data of the grouped data, and executing convolution operation on the non-zero value point of the data to be convolved and the non-zero value point of the weight data of the grouped data.
2. The method according to claim 1, wherein the controlling the multiply-add unit to skip zero-value points of the data to be convolved and zero-value points of the weight data of the grouped data and perform convolution operations on the non-zero value points of the data to be convolved and the non-zero value points of the weight data of the grouped data comprises:
acquiring first zero-value point statistical information and second zero-value point statistical information, wherein each data bit of the first zero-value point statistical information is provided with a flag indicating whether each numerical point of the data to be convolved is zero, and each data bit of the second zero-value point statistical information is provided with a flag indicating whether each numerical point of the weight data of the grouped data is zero;
controlling the multiplication and addition unit to acquire a non-zero value point of the data to be convolved and a non-zero value point of the weight data according to the flags corresponding to the data bits of the first zero-value point statistical information and the second zero-value point statistical information;
and controlling the multiplication and addition unit to carry out convolution operation on the obtained non-zero value points of the data to be convolved and the non-zero value points of the weight data.
3. The method according to claim 2, wherein the flags include a first flag indicating a zero-valued point and a second flag indicating a non-zero-valued point, and the controlling the multiply-add unit to obtain the non-zero-valued point of the data to be convolved and the non-zero-valued point of the weight data according to the flags corresponding to the data bits of the first zero-valued point statistical information and the second zero-valued point statistical information includes:
controlling the multiplication and addition unit to traverse the flags of each data bit according to the data bit ordering of the first zero-value point statistical information and the second zero-value point statistical information, and skipping the numerical point of the data to be convolved and the numerical point of the weight data corresponding to the current data bit when at least one first flag is found on the current data bit at the same position in the first zero-value point statistical information and the second zero-value point statistical information;
and when the current data bits at the same position in the first zero-value point statistical information and the second zero-value point statistical information both carry the second flag, acquiring the non-zero value point of the data to be convolved and the non-zero value point of the weight data corresponding to the current data bit.
4. The method of claim 1, wherein obtaining the plurality of feature map data and the plurality of weight data comprises:
determining the number of multiply-add units of the hardware accelerator;
and determining the plurality of feature map data and the plurality of weight data according to the number of the multiplication and addition units, wherein each feature map data is feature map matrix data, each weight data is weight matrix data, and the number of the grouped data is the same as the number of the multiplication and addition units of the hardware accelerator.
5. The method according to claim 4, wherein the data to be convolved is matrix data to be convolved, and the controlling the multiply-add unit to preprocess the feature map data of the grouped data to obtain the data to be convolved for performing a convolution operation comprises:
controlling the multiplication and addition unit to traverse the numerical points of the feature map matrix data, and constructing, for the traversed current numerical point, matrix data to be convolved having the same number of rows and columns as the weight matrix data.
6. The method of claim 5, wherein constructing the matrix data to be convolved for the traversed current value point, the matrix data to be convolved having the same number of rows and columns as the weight matrix data comprises:
if, with the current numerical point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data can be intercepted from the feature map matrix data, using the intercepted complete matrix as the matrix data to be convolved;
and if, with the current numerical point as the matrix center, a complete matrix with the same number of rows and columns as the weight matrix data cannot be intercepted from the feature map matrix data, supplementing the intercepted incomplete matrix into a complete matrix to obtain the matrix data to be convolved.
7. The method of claim 1, wherein obtaining the plurality of feature map data and the plurality of weight data comprises:
acquiring a plurality of feature map data according to an input image;
the plurality of weight data is obtained from an output channel of the convolutional layer.
8. A data convolution processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a plurality of feature map data and a plurality of weight data;
the dividing module is used for dividing the plurality of feature map data and the plurality of weight data into a plurality of grouped data, wherein the grouped data comprises one feature map data and one corresponding weight data;
the input module is used for inputting each grouped data into a multiplication and addition unit corresponding to the hardware accelerator;
the preprocessing module is used for controlling the multiplication and addition unit to preprocess the feature map data of the grouped data to obtain data to be convolved for executing convolution operation;
and the control module is used for controlling the multiplication and addition unit to skip the zero value point of the data to be convolved and the zero value point of the weight data of the grouped data, and executing convolution operation on the non-zero value point of the data to be convolved and the non-zero value point of the weight data of the grouped data.
9. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the data convolution processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the data convolution processing method of any one of claims 1 to 7.
CN202110848542.4A 2021-07-27 2021-07-27 Data convolution processing method and device and computer equipment Pending CN113487017A (en)

Publications (1)

Publication Number Publication Date
CN113487017A true CN113487017A (en) 2021-10-08

Family

ID=77943757

Country Status (1)

Country Link
CN (1) CN113487017A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211008