CN111291317B - Approximate matrix convolution neural network binary greedy recursion method - Google Patents


Info

Publication number
CN111291317B
Authority
CN
China
Prior art keywords
matrix
binarization
recursive
greedy
binary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010120117.9A
Other languages
Chinese (zh)
Other versions
CN111291317A (en)
Inventor
王怡清
史小宏
易典
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN202010120117.9A priority Critical patent/CN111291317B/en
Publication of CN111291317A publication Critical patent/CN111291317A/en
Application granted granted Critical
Publication of CN111291317B publication Critical patent/CN111291317B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G06F17/153 Multidimensional correlation or convolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a convolutional neural network binarization greedy recursion method of an approximate matrix, which comprises the following steps: presetting key parameters and the number of linearly combined binarization matrices; performing one recursive calculation on the key parameters with a greedy recursive algorithm to obtain and store a binarization matrix and its corresponding scaling factor, and calculating the error value to be used by the next recursive calculation; judging whether the number of recursive calculations performed is still smaller than the number of linearly combined binarization matrices; if so, performing a further recursive calculation with the greedy recursive algorithm according to the error value and the key parameters to obtain and store a binarization matrix and its corresponding scaling factor, calculating the error value to be used by the next recursive calculation, and repeating the judgment; if not, outputting the stored binarization matrix set comprising all the binarization matrices and the scaling factor set comprising all the scaling factors. The invention simplifies the calculation of the neural network binarization algorithm.

Description

Approximate matrix convolution neural network binary greedy recursion method
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a convolution neural network binarization greedy recursion method of an approximate matrix.
Background
In recent years, deep learning has attracted more and more attention, and its deep nonlinear structure has produced major breakthroughs in image, speech and natural language processing; however, training and storing such network models requires enormous amounts of time and memory. Model compression and training acceleration have therefore become a major challenge in deep learning. To address this problem, neural network binarization algorithms are regarded as a useful model compression solution.
A neural network binarization algorithm is a quantization-during-training algorithm: during training, the weights of the network are constrained to take only two values. Neural network binarization not only reduces the memory consumption of the model but also accelerates its inference. Compared with a single-precision weight, a binary weight needs only 1 bit to store one parameter, directly compressing the parameters by a factor of 32. If the activation values of some layers are also constrained to two values, the multiplication operations in the network can be replaced by exclusive-OR-style bit operations; for example, such binary convolution can be up to 58 times faster than single-precision floating-point convolution, greatly improving inference speed.
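To make the bit-operation claim above concrete, the following minimal Python sketch (illustrative only and not taken from the patent; the function name and the bit-packing convention are assumptions of this sketch) shows how the dot product of two {-1, +1} vectors reduces to a bitwise exclusive-OR plus a bit count:

    import numpy as np

    def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
        """Dot product of two length-n {-1, +1} vectors encoded as bit masks
        (bit = 1 for +1, bit = 0 for -1): dot = n - 2 * popcount(a XOR b)."""
        disagreements = (a_bits ^ b_bits) & ((1 << n) - 1)  # positions where the signs differ
        return n - 2 * bin(disagreements).count("1")

    # Check against the ordinary floating-point dot product on random signs.
    rng = np.random.default_rng(0)
    n = 16
    a = rng.choice([-1, 1], size=n)
    b = rng.choice([-1, 1], size=n)
    a_bits = sum(1 << i for i, v in enumerate(a) if v == 1)
    b_bits = sum(1 << i for i, v in enumerate(b) if v == 1)
    assert binary_dot(a_bits, b_bits, n) == int(np.dot(a, b))

Packing many weights into each machine word in this way is the usual source of the large practical speedups quoted above.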
However, compared with post-training quantization algorithms, the accuracy loss of binarized networks is very serious. In particular, once the activation values are binarized, the information contained in the resulting feature vectors is lost exponentially. Establishing a neural network binarization algorithm with high accuracy and a small amount of calculation is therefore urgent.
Disclosure of Invention
The invention aims to provide a convolutional neural network binarization greedy recursion method of an approximate matrix, which simplifies the calculation of the neural network binarization algorithm, reduces the amount of computation, and alleviates the loss of feature map information caused by binarization and convolution.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
a convolution neural network binarization greedy recursion method of an approximate matrix comprises the following steps:
S1, presetting key parameters and the number of linearly combined binarization matrices;
S2, performing one recursive calculation on the key parameters with a greedy recursive algorithm to obtain and store a binarization matrix and its corresponding scaling factor, and calculating the error value to be used by the next recursive calculation;
S3, judging whether the number of recursive calculations performed is still smaller than the number of linearly combined binarization matrices; if so, proceeding to step S4; if not, proceeding to step S5;
S4, performing a further recursive calculation with the greedy recursive algorithm according to the most recently calculated error value and the key parameters to obtain and store a binarization matrix and its corresponding scaling factor, calculating the error value to be used by the next recursive calculation, and returning to step S3;
and S5, outputting the stored binarization matrix set comprising all the binarization matrices and the scaling factor set comprising all the scaling factors.
Preferably, the key parameter includes a feature map or a network weight.
Preferably, the binarization matrix is calculated using the following formula:

B_i = sign(ε_{i-1})

The scaling factor corresponding to the binarization matrix is calculated using the following formula:

α_i = ||ε_{i-1}||_{l1} / n

where B_i denotes the binarization matrix obtained by the i-th recursive calculation; ε_{i-1} denotes the error value used by the i-th recursive calculation, i.e. the error remaining after the (i-1)-th recursion, with ε_0 equal to the key parameter; n, the number of elements of the matrix, is a constant; ||ε_{i-1}||_{l1} denotes the l1 norm, i.e. the sum of the absolute values of all elements of the vector; and m equals the number of linearly combined binarization matrices. Preferably, the error value used by the next recursive calculation is obtained by multiplying the obtained binarization matrix by its corresponding scaling factor and subtracting the product from the current error value, i.e. ε_i = ε_{i-1} - α_i B_i.
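As an illustration of the two formulas and the error update just described, a minimal Python sketch of a single recursion step might look as follows (the function and variable names are illustrative and not taken from the patent):

    import numpy as np

    def greedy_step(eps_prev: np.ndarray):
        """Given the current error eps_prev (with eps_0 = A), return B_i, alpha_i
        and the error value to be used by the next recursion."""
        B_i = np.where(eps_prev >= 0, 1.0, -1.0)           # B_i = sign(eps_{i-1})
        alpha_i = np.abs(eps_prev).sum() / eps_prev.size   # alpha_i = ||eps_{i-1}||_l1 / n
        eps_next = eps_prev - alpha_i * B_i                # error for the next recursion
        return B_i, alpha_i, eps_next

Each step touches every element of the current error only once (a sign and an l1 mean), which is where the saving over solving a least-squares system for the scaling factors comes from.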
In another aspect, an electronic device comprises a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the method described above.
In yet another aspect, a readable storage medium has stored therein a computer program which, when executed by a processor, implements the method described above.
The invention provides a convolutional neural network binarization greedy recursion method of an approximate matrix, which comprises the following steps: S1, presetting key parameters and the number of linearly combined binarization matrices; S2, performing one recursive calculation on the key parameters with a greedy recursive algorithm to obtain and store a binarization matrix and its corresponding scaling factor, and calculating the error value to be used by the next recursive calculation; S3, judging whether the number of recursive calculations performed is still smaller than the number of linearly combined binarization matrices; if so, proceeding to step S4; if not, proceeding to step S5; S4, performing a further recursive calculation with the greedy recursive algorithm according to the most recently calculated error value and the key parameters to obtain and store a binarization matrix and its corresponding scaling factor, calculating the error value to be used by the next recursive calculation, and returning to step S3; and S5, outputting the stored binarization matrix set comprising all the binarization matrices and the scaling factor set comprising all the scaling factors. Here, "approximate matrix" means that both the weights and the activation values of the convolutional neural network are approximated by binary matrices, so that the multiplication operations in the convolution can be omitted. Compared with binarizing the weights and activation values directly with a sign function, using an approximate matrix alleviates the loss of feature map information caused by binarization and convolution. However, calculating the approximate matrix with the least-squares method is computationally complex, so the greedy recursion idea is used to simplify the calculation. The method therefore reduces computation while alleviating the loss of feature map information caused by binarization and convolution.
The invention decomposes the real-valued matrix into a plurality of binary matrices for convolution, which effectively reduces the error after convolution. The effect of decomposing the real-valued matrix with the binary greedy recursion algorithm is far better than decomposition with the least-squares method. With the greedy recursion method (the binary greedy recursion algorithm), each additional binary matrix reduces the error by about 80%, and when the real-valued matrix is decomposed into 4 binary matrices the error falls to half of that of the least-squares method.
Although a plurality of binary matrices can greatly reduce the error introduced when approximating a real-valued matrix, the more binary matrices the convolution is decomposed into, the more the time overhead of the convolution calculation multiplies. During training, a network with a large model structure performs tens of thousands of convolutions in each layer, and the time overhead of solving the scaling factor coefficients of the binary matrices with the least-squares method is also quite large. Solving the binary matrices and scaling factors with the greedy recursion algorithm, however, reduces the amount of calculation while still keeping the error small.
Drawings
Fig. 1 is a flowchart of a greedy binary recursion method for a convolutional neural network with an approximate matrix according to an embodiment of the present invention;
fig. 2a shows the mean of the convolution error in the simulated convolution-error plots after weight matrix decomposition according to an embodiment of the present invention;
fig. 2b shows the standard deviation of the convolution error in the simulated convolution-error plots after weight matrix decomposition according to an embodiment of the present invention;
fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following describes the convolutional neural network binarization greedy recursion method of an approximate matrix in detail with reference to figs. 1 to 3 and a specific embodiment. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the drawings are in a greatly simplified form and are not drawn to precise scale; they are used only to conveniently and clearly assist in describing the embodiments of the present invention. To make the objects, features and advantages of the present invention comprehensible, reference is made to the accompanying drawings. It should be understood that the structures, proportions, sizes and the like shown in the drawings and described in the specification are only used to match the disclosure of the specification so that it can be understood and read by those skilled in the art; they are not intended to limit the conditions under which the present invention can be implemented and therefore have no technical significance on their own. Any structural modification, change of proportional relationship or adjustment of size that does not affect the effects and the objects achievable by the present invention should still fall within the scope of the present invention.
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
As shown in fig. 1, the greedy binary recursion method for a convolutional neural network of an approximate matrix provided in this embodiment includes:
S1, presetting key parameters and the number of linearly combined binarization matrices;
S2, performing one recursive calculation on the key parameters with a greedy recursive algorithm to obtain and store a binarization matrix and its corresponding scaling factor, and calculating the error value to be used by the next recursive calculation;
S3, judging whether the number of recursive calculations performed is still smaller than the number of linearly combined binarization matrices; if so, proceeding to step S4; if not, proceeding to step S5;
S4, performing a further recursive calculation with the greedy recursive algorithm according to the most recently calculated error value and the key parameters to obtain and store a binarization matrix and its corresponding scaling factor, calculating the error value to be used by the next recursive calculation, and returning to step S3;
and S5, outputting the stored binarization matrix set comprising all the binarization matrices and the scaling factor set comprising all the scaling factors.
Preferably, the key parameter includes a feature map or a network weight.
Preferably, the binarization matrix is calculated using the following formula:

B_i = sign(ε_{i-1})

The scaling factor corresponding to the binarization matrix is calculated using the following formula:

α_i = ||ε_{i-1}||_{l1} / n

where B_i denotes the binarization matrix obtained by the i-th recursive calculation; ε_{i-1} denotes the error value used by the i-th recursive calculation, i.e. the error remaining after the (i-1)-th recursion, with ε_0 equal to the key parameter; n, the number of elements of the matrix, is a constant; ||ε_{i-1}||_{l1} denotes the l1 norm, i.e. the sum of the absolute values of all elements of the vector; and m equals the number of linearly combined binarization matrices. Preferably, the error value used by the next recursive calculation is obtained by multiplying the obtained binarization matrix by its corresponding scaling factor and subtracting the product from the current error value, i.e. ε_i = ε_{i-1} - α_i B_i.
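As a small illustrative example (the numbers are chosen here for illustration and do not come from the patent): for A = [[0.6, -0.2], [1.0, -1.4]], the first recursion gives B_1 = [[+1, -1], [+1, -1]] and α_1 = (0.6 + 0.2 + 1.0 + 1.4)/4 = 0.8, and the error for the next recursion is ε_1 = A - α_1 B_1 = [[-0.2, 0.6], [0.2, -0.6]]. The second recursion then gives B_2 = [[-1, +1], [+1, -1]], α_2 = 1.6/4 = 0.4 and ε_2 = [[0.2, 0.2], [-0.2, -0.2]], so the l1 norm of the error shrinks from 3.2 to 1.6 to 0.8 as binarization matrices are added.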
Specifically, the calculation in this embodiment takes the following inputs: a feature map or network weight A and the number m of linearly combined binarization matrices. It produces the following outputs: the binarization matrix set B = {B_1, B_2, ..., B_m} comprising all the binarization matrices and the scaling factor set α = {α_1, α_2, ..., α_m} comprising all the scaling factors.
The following is a detailed explanation of the above implementation.
(1) Input a feature map or network weight A and the number m of linearly combined binarization matrices. The feature map or network weight A is approximated by a linear combination of m binary filters B = [B_1, B_2, ..., B_m] ∈ {-1, +1}^{w×h×cin×cout}:

A = α_1 B_1 + α_2 B_2 + ... + α_m B_m + ε,

where ε is the error between the approximate matrix and the real value. Here (w, h, cin, cout) denote, respectively, the width of the convolutional layer, the height of the convolutional layer, the number of input channels of the convolutional layer and the number of output channels of the convolutional layer. Without loss of generality, the convolutional layer is assumed to be a tensor of dimensions (w, h, cin, cout); the convolution kernel W ∈ R^{w×h×cin×cout}, i.e. the convolution kernel in the convolutional layer, is the real-valued matrix to be decomposed.
(2) Each pair α_i and B_i is solved for recursively:

ε_0 = A

A = α_1 B_1 + α_2 B_2 + ... + α_m B_m + ε

ε_{i-1} = A - (α_1 B_1 + ... + α_{i-1} B_{i-1})

ε_{i-1} ≈ α_i B_i
(3) The objective function is

J(B_i, α_i) = ||ε_{i-1} - α_i B_i||²,

which expands to

J(B_i, α_i) = α_i² B_iᵀ B_i - 2 α_i B_iᵀ ε_{i-1} + ε_{i-1}ᵀ ε_{i-1}.

Since B = [B_1, B_2, ..., B_m] ∈ {-1, +1}^{w*h*cin*cout}, the term B_iᵀ B_i = n (the number of elements) is a constant, and since ε_{i-1} is already a determined value, ε_{i-1}ᵀ ε_{i-1} is also a constant. Minimizing J with respect to B_i therefore amounts to maximizing B_iᵀ ε_{i-1}. Because every entry of B_i is either +1 or -1, the maximum is attained when B_i = +1 wherever ε_{i-1} ≥ 0 and B_i = -1 wherever ε_{i-1} < 0, that is

B_i* = sign(ε_{i-1}),

where the sign function means sign(x) = 1 when x ≥ 0 and sign(x) = -1 when x < 0. Taking the derivative of J with respect to α_i and setting it to zero,

∂J/∂α_i = 2 α_i B_iᵀ B_i - 2 B_iᵀ ε_{i-1} = 0,

gives α_i* = B_iᵀ ε_{i-1} / n. Substituting B_i = sign(ε_{i-1}) then yields

α_i* = sign(ε_{i-1})ᵀ ε_{i-1} / n = ||ε_{i-1}||_{l1} / n.
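Putting the input/output specification and the derived formulas together, the whole decomposition can be sketched in a few lines of Python (a sketch under the assumptions above; the function name, the example kernel shape and the error check are illustrative and not taken from the patent):

    import numpy as np

    def greedy_binarize(A: np.ndarray, m: int):
        """Approximate a real-valued tensor A by sum_i alpha_i * B_i with m binary
        matrices, returning the binarization matrix set and the scaling factor set."""
        Bs, alphas = [], []
        eps = A.astype(np.float64)                  # eps_0 = A
        for _ in range(m):                          # recurse until m matrices have been produced
            B_i = np.where(eps >= 0, 1.0, -1.0)     # B_i = sign(eps_{i-1})
            alpha_i = np.abs(eps).sum() / eps.size  # alpha_i = ||eps_{i-1}||_l1 / n
            Bs.append(B_i)
            alphas.append(alpha_i)
            eps = eps - alpha_i * B_i               # error value for the next recursion
        return Bs, alphas

    # Example: the relative approximation error shrinks as matrices are added.
    rng = np.random.default_rng(1)
    A = rng.normal(0.0, 1.0, size=(3, 3, 16, 32))   # a (w, h, cin, cout) kernel
    Bs, alphas = greedy_binarize(A, m=4)
    approx = sum(a * B for a, B in zip(alphas, Bs))
    print(np.abs(A - approx).sum() / np.abs(A).sum())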
As shown in figs. 2a and 2b, which show the simulated convolution error after weight matrix decomposition in this embodiment, M in the figures denotes the number of binary matrices into which the real-valued matrix is decomposed. The random convolution input data have mean 0 and variance 1. The abscissa is the variance of the weights and the ordinate is the mean of the error. IBNN is the model using the present invention.
The dashed lines show the scaling factor sets of the binary matrices solved by the least-squares method using the ABC-Net approximation; the solid lines show the binary matrix sets and scaling factor sets solved by the greedy recursion method described here. Observing the curves, it can be seen that the larger the variance of the real-valued convolution kernel, the larger the mean and the variance of the error between the binary convolution output and the real-valued convolution output. The experiment uses 4 different numbers of binary matrices, where m = 0 means mapping directly to binary with the sign function. The error decreases as the number of decomposed binary matrices increases, so decomposing the real-valued matrix into a plurality of binary matrices for convolution effectively reduces the error after convolution. It can also be observed that the effect of decomposing the real-valued matrix with the greedy recursion algorithm is far better than decomposition with the least-squares method of ABC-Net: with the greedy recursion method, each additional binary matrix reduces the error by about 80%, and at m = 4 the error falls to half of that of the least-squares method.
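The curves in figs. 2a and 2b come from the inventors' experiments; the sketch below only outlines how the greedy side of such a simulation could be set up in Python (the kernel sizes, trial counts and variance values are assumptions of this sketch, and the least-squares construction used by ABC-Net is not reproduced here):

    import numpy as np

    def greedy_approx(w: np.ndarray, m: int) -> np.ndarray:
        """Reconstruct w from its greedy m-term binary approximation."""
        eps, w_hat = w.copy(), np.zeros_like(w)
        for _ in range(m):
            B = np.where(eps >= 0, 1.0, -1.0)
            alpha = np.abs(eps).mean()               # ||eps||_l1 / n
            w_hat += alpha * B
            eps -= alpha * B
        return w_hat

    def conv_error(weight_std: float, m: int, trials: int = 200, n: int = 144):
        """Mean and standard deviation of the error between a real-valued dot
        product and its greedy binary approximation, for flattened kernels."""
        rng = np.random.default_rng(0)
        errs = []
        for _ in range(trials):
            w = rng.normal(0.0, weight_std, size=n)  # real-valued kernel; std is the x-axis
            x = rng.normal(0.0, 1.0, size=n)         # input data with mean 0 and variance 1
            errs.append(abs(np.dot(w, x) - np.dot(greedy_approx(w, m), x)))
        return float(np.mean(errs)), float(np.std(errs))

    for std in (0.5, 1.0, 2.0):
        print(std, [conv_error(std, m) for m in (1, 2, 4)])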
Specifically, the main background of the problem addressed by this embodiment is convolutional neural network model compression. Each additional layer of a (convolutional) neural network increases the required amounts of memory and computation, and training the network on a conventional single-machine CPU becomes impractical. On small embedded terminals, model compression and quantization is therefore an important problem. The neural network binarization algorithm is a quantization-during-training algorithm. In the prior art, a plurality of binary matrices are used in place of a real-valued matrix to binarize the neural network, and the least-squares method is used to solve for the plurality of binary matrices; however, the least-squares calculation is relatively complex, so this embodiment proposes solving for the plurality of binary matrices with a greedy recursion algorithm to simplify the calculation. The specific effect can be seen from the experiments: with continued reference to figs. 2a and 2b, IBNN (solid lines) is the model provided by this embodiment, and ABC-Net (dashed lines) is the prior-art model that uses the least-squares method. The line with m = 0 is obtained by binarizing directly with the sign function, and m denotes the number of binary matrices in the decomposition. It can be seen from figs. 2a and 2b that, for the same m, the solid line lies below the dashed line, so the IBNN model provided by this embodiment gives a smaller error. Overall, decomposing a real-valued matrix into a plurality of binary matrices gives a smaller error than turning the whole real-valued matrix into a binary matrix directly with the sign function.
This embodiment takes a feature map or network weight A and the number m of linearly combined binarization matrices as input and approximates A by a linear combination of m binary filters B_i ∈ {-1, +1}^{w×h×cin×cout}:

A = α_1 B_1 + α_2 B_2 + ... + α_m B_m + ε,

where ε is the error between the approximate matrix and the real value. The real-valued matrix is thus decomposed into a plurality of binary matrices B_i and coefficients α_i. At this point the specific values of each B_i and α_i are unknown, but the real-valued matrix (the feature map or network weight) A is known, so each B_i and α_i can be solved from A. The prior art solves for B_i and α_i with the least-squares method; this embodiment solves for them with greedy recursion.
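For contrast, a generic Python sketch of the least-squares route mentioned above: once a set of binary matrices has been fixed (by whatever construction the prior art uses, which is not reproduced here), the scaling factors come from a linear least-squares problem that has to be assembled and solved for every kernel:

    import numpy as np

    def least_squares_alphas(A: np.ndarray, Bs: list) -> np.ndarray:
        """Scaling factors that minimize ||A - sum_i alpha_i * B_i||^2 for a fixed
        set of binary bases Bs, found by solving an n x m least-squares problem."""
        design = np.stack([B.ravel() for B in Bs], axis=1)   # n x m design matrix
        alphas, *_ = np.linalg.lstsq(design, A.ravel(), rcond=None)
        return alphas

This per-kernel linear solve is the extra cost that the greedy recursion of this embodiment avoids: each greedy step needs only a sign operation and an l1 mean.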
From the above, the effect of decomposing the real-valued matrix by using the greedy binarization recursive algorithm is far better than that of least square decomposition.
In summary, the present invention provides a convolutional neural network binarization greedy recursion method of an approximate matrix, which comprises: S1, presetting key parameters and the number of linearly combined binarization matrices; S2, performing one recursive calculation on the key parameters with a greedy recursive algorithm to obtain and store a binarization matrix and its corresponding scaling factor, and calculating the error value to be used by the next recursive calculation; S3, judging whether the number of recursive calculations performed is still smaller than the number of linearly combined binarization matrices; if so, proceeding to step S4; if not, proceeding to step S5; S4, performing a further recursive calculation with the greedy recursive algorithm according to the most recently calculated error value and the key parameters to obtain and store a binarization matrix and its corresponding scaling factor, calculating the error value to be used by the next recursive calculation, and returning to step S3; and S5, outputting the stored binarization matrix set comprising all the binarization matrices and the scaling factor set comprising all the scaling factors. Here, "approximate matrix" means that both the weights and the activation values of the convolutional neural network are approximated by binary matrices, so that the multiplication operations in the convolution can be omitted. Compared with binarizing the weights and activation values directly with a sign function, using an approximate matrix alleviates the loss of feature map information caused by binarization and convolution. However, calculating the approximate matrix with the least-squares method is computationally complex, so the greedy recursion idea is used to simplify the calculation. The method therefore reduces computation while alleviating the loss of feature map information caused by binarization and convolution.
The invention decomposes the real-valued matrix into a plurality of binary matrices for convolution, which effectively reduces the error after convolution. The effect of decomposing the real-valued matrix with the binary greedy recursion algorithm is far better than decomposition with the least-squares method. With the greedy recursion method (the binary greedy recursion algorithm), each additional binary matrix reduces the error by about 80%, and when the real-valued matrix is decomposed into 4 binary matrices the error falls to half of that of the least-squares method.
Although a plurality of binary matrices can greatly reduce the error introduced when approximating a real-valued matrix, the more binary matrices the convolution is decomposed into, the more the time overhead of the convolution calculation multiplies. During training, a network with a large model structure performs tens of thousands of convolutions in each layer, and the time overhead of solving the scaling factor coefficients of the binary matrices with the least-squares method is also quite large. Solving the binary matrices and scaling factors with the greedy recursion algorithm, however, reduces the amount of calculation while still keeping the error small. In still another aspect, based on the same inventive concept, the present invention further provides an electronic device. As shown in fig. 3, the electronic device comprises a processor 301 and a memory 303; the memory 303 stores a computer program which, when executed by the processor 301, implements the convolutional neural network binarization greedy recursion method of an approximate matrix described above.
The electronic device provided by this embodiment can simplify the calculation of the neural network binarization algorithm, reduce the amount of computation, and alleviate the loss of feature map information caused by binarization and convolution.
With continued reference to fig. 3, the electronic device further comprises a communication interface 302 and a communication bus 304, wherein the processor 301, the communication interface 302 and the memory 303 are communicated with each other through the communication bus 304. The communication bus 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 304 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface 302 is used for communication between the electronic device and other devices.
The Processor 301 in this embodiment may be a Central Processing Unit (CPU), other general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and so on. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, and the processor 301 is the control center of the electronic device and is connected to various parts of the whole electronic device by various interfaces and lines.
The memory 303 may be used for storing the computer program, and the processor 301 implements various functions of the electronic device by running or executing the computer program stored in the memory 303 and calling data stored in the memory 303.
The memory 303 may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), rambus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
In other aspects, based on the same inventive concept, the present invention further provides a readable storage medium, in which a computer program is stored, and the computer program, when being executed by a processor, can implement the convolutional neural network binarization greedy recursion method for approximation matrix as described above.
The readable storage medium provided by this embodiment can likewise simplify the calculation of the neural network binarization algorithm, reduce the amount of computation, and alleviate the loss of feature map information caused by binarization and convolution.
The readable storage medium provided by this embodiment may take any combination of one or more computer-readable media. The readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this context, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
In this embodiment, computer program code for carrying out the operations of the embodiments may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that the apparatuses and methods disclosed in the embodiments herein can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments herein. In this regard, each block in the flowchart or block diagrams may represent a module, a program, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments herein may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (6)

1. A convolutional neural network binarization greedy recursion method of an approximate matrix, characterized by comprising the following steps:
S1, presetting key parameters and the number of linearly combined binarization matrices;
S2, performing one recursive calculation on the key parameters with a greedy recursive algorithm to obtain and store a binarization matrix and its corresponding scaling factor, and calculating the error value to be used by the next recursive calculation;
S3, judging whether the number of recursive calculations performed is still smaller than the number of linearly combined binarization matrices; if so, proceeding to step S4; if not, proceeding to step S5;
S4, performing a further recursive calculation with the greedy recursive algorithm according to the most recently calculated error value and the key parameters to obtain and store a binarization matrix and its corresponding scaling factor, calculating the error value to be used by the next recursive calculation, and returning to step S3;
and S5, outputting the stored binarization matrix set comprising all the binarization matrices and the scaling factor set comprising all the scaling factors.
2. The method of claim 1, wherein the key parameters comprise a feature map or network weights.
3. The convolutional neural network binarization greedy recursion method of an approximate matrix as claimed in claim 2, wherein the binarization matrix is calculated using the following formula:

B_i = sign(ε_{i-1})

and the scaling factor corresponding to the binarization matrix is calculated using the following formula:

α_i = ||ε_{i-1}||_{l1} / n

where B_i denotes the binarization matrix obtained by the i-th recursive calculation; ε_{i-1} denotes the error value used by the i-th recursive calculation, with ε_0 equal to the key parameter; n, the number of elements of the matrix, is a constant; ||ε_{i-1}||_{l1} denotes the l1 norm, i.e. the sum of the absolute values of all elements of the vector; and m equals the number of linearly combined binarization matrices.
4. The convolutional neural network binarization greedy recursion method of an approximate matrix as claimed in claim 3, wherein the error value used by the next recursive calculation is obtained by multiplying the obtained binarization matrix by its corresponding scaling factor and subtracting the product from the current error value.
5. An electronic device comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the method of any of claims 1 to 4.
6. A readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 4.
CN202010120117.9A 2020-02-26 2020-02-26 Approximate matrix convolution neural network binary greedy recursion method Active CN111291317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010120117.9A CN111291317B (en) 2020-02-26 2020-02-26 Approximate matrix convolution neural network binary greedy recursion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010120117.9A CN111291317B (en) 2020-02-26 2020-02-26 Approximate matrix convolution neural network binary greedy recursion method

Publications (2)

Publication Number Publication Date
CN111291317A CN111291317A (en) 2020-06-16
CN111291317B true CN111291317B (en) 2023-03-24

Family

ID=71017996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010120117.9A Active CN111291317B (en) 2020-02-26 2020-02-26 Approximate matrix convolution neural network binary greedy recursion method

Country Status (1)

Country Link
CN (1) CN111291317B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
WO2018107383A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution computation method and device, and computer-readable storage medium
CN108334945A (en) * 2018-01-30 2018-07-27 中国科学院自动化研究所 The acceleration of deep neural network and compression method and device
CN110313179A (en) * 2017-01-31 2019-10-08 夏普株式会社 System and method for bi-directional scaling transformation coefficient level value
CN110751265A (en) * 2019-09-24 2020-02-04 中国科学院深圳先进技术研究院 Lightweight neural network construction method and system and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3320683A4 (en) * 2015-07-30 2018-12-05 Zhejiang Dahua Technology Co., Ltd Methods and systems for image compression
US11556775B2 (en) * 2017-10-24 2023-01-17 Baidu Usa Llc Systems and methods for trace norm regularization and faster inference for embedded models

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
WO2018107383A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution computation method and device, and computer-readable storage medium
CN110313179A (en) * 2017-01-31 2019-10-08 夏普株式会社 System and method for bi-directional scaling transformation coefficient level value
CN108334945A (en) * 2018-01-30 2018-07-27 中国科学院自动化研究所 The acceleration of deep neural network and compression method and device
CN110751265A (en) * 2019-09-24 2020-02-04 中国科学院深圳先进技术研究院 Lightweight neural network construction method and system and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Structural optimization and accelerated computation of convolutional neural network acoustic models (卷积神经网络声学模型的结构优化和加速计算); 王智超 et al.; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); 2018-06-15 (No. 03); full text *
Network acceleration based on binarization (基于二值的网络加速); 谢佳砼; Electronic Production (电子制作); 2018-12-15 (No. 24); full text *

Also Published As

Publication number Publication date
CN111291317A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
Gholami et al. A survey of quantization methods for efficient neural network inference
Zhou et al. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients
US10713818B1 (en) Image compression with recurrent neural networks
CN110520870B (en) Flexible hardware for high throughput vector dequantization with dynamic vector length and codebook size
US20200097828A1 (en) Processing method and accelerating device
US20200117981A1 (en) Data representation for dynamic precision in neural network cores
CN107944545B (en) Computing method and computing device applied to neural network
US10783432B2 (en) Update management for RPU array
CN112508125A (en) Efficient full-integer quantization method of image detection model
CN110728350A (en) Quantification for machine learning models
Chervyakov et al. Residue number system-based solution for reducing the hardware cost of a convolutional neural network
CN115862751B (en) Quantum chemistry calculation method for updating aggregation attention mechanism based on edge features
CN112651485A (en) Method and apparatus for recognizing image and method and apparatus for training neural network
Ko et al. Design and analysis of a neural network inference engine based on adaptive weight compression
Gonugondla et al. Swipe: Enhancing robustness of reram crossbars for in-memory computing
Zhang et al. A practical highly paralleled ReRAM-based DNN accelerator by reusing weight pattern repetitions
US20100174859A1 (en) High capacity content addressable memory
CN111291317B (en) Approximate matrix convolution neural network binary greedy recursion method
Jiang et al. A low-latency LSTM accelerator using balanced sparsity based on FPGA
CN112561050B (en) Neural network model training method and device
US20220245433A1 (en) Sparse convolutional neural network
Kang et al. Weight partitioning for dynamic fixed-point neuromorphic computing systems
US11854558B2 (en) System and method for training a transformer-in-transformer-based neural network model for audio data
Kiyama et al. Deep learning framework with arbitrary numerical precision
CN113988279A (en) Output current reading method and system of storage array supporting negative value excitation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant