WO2017149722A1 - Computing device and calculation method - Google Patents

Computing device and calculation method

Info

Publication number
WO2017149722A1
WO2017149722A1 (PCT/JP2016/056583)
Authority
WO
WIPO (PCT)
Prior art keywords
scalar quantization
unit
scalar
value
layer
Prior art date
Application number
PCT/JP2016/056583
Other languages
English (en)
Japanese (ja)
Inventor
Tomohiro Narita (成田 知宏)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to PCT/JP2016/056583
Publication of WO2017149722A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Definitions

  • The present invention relates to an arithmetic device and an arithmetic method for calculating the output values of a multilayer neural network (DNN).
  • A DNN-HMM has been proposed in which the acoustic likelihood calculation of an HMM (Hidden Markov Model) is performed by a DNN.
  • This DNN-HMM offers significantly improved performance compared with a conventional GMM-HMM, which performs the acoustic likelihood calculation using a GMM (Gaussian Mixture Model) (see, for example, Non-Patent Document 1).
  • In the DNN-HMM, a vector obtained by concatenating filter-bank features and a plurality of their dynamic features is input to the DNN input layer, and the output values of the output layer are normalized by prior probabilities calculated from the amount of training data to calculate the acoustic likelihoods.
  • A DNN-HMM is generally expected to achieve higher recognition performance as the number of hidden-layer units increases. However, the calculation time is proportional to the square of the number of hidden-layer units. Therefore, when a DNN-HMM is run on an arithmetic device with limited computational resources, such as an embedded device, the number of units cannot be increased and high recognition performance cannot be obtained.
  • Non-Patent Document 2 describes applying singular value decomposition to a DNN to reduce the number of hidden-layer units by low-rank approximation. However, because the recognition performance of a DNN depends on the low-rank approximation method and on the retraining method applied after approximation, it is difficult to obtain stable recognition performance even when the number of units is reduced by the method of Non-Patent Document 2.
  • The present invention has been made to solve the above problems, and its object is to improve the recognition performance of a multilayer neural network implemented on an arithmetic device with limited computational resources, such as an embedded device.
  • An arithmetic device according to the present invention includes: a weight storage unit storing a weight matrix for each layer of a multilayer neural network; a matrix multiplication unit that multiplies the weight matrix stored in the weight storage unit by the vector input to the input layer of the multilayer neural network or by the vector output from the preceding layer; a scalar quantization table storage unit storing a scalar quantization table that represents the correspondence between the quantization ranges for scalar quantization and the quantization values; a scalar quantization unit that scalar-quantizes the value of each dimension of the vector output from the matrix multiplication unit with reference to the scalar quantization table; a likelihood calculation unit that, in the output layer of the multilayer neural network, calculates a likelihood vector using the vector output from the scalar quantization unit; and a scalar quantization control unit that controls, for each layer of the multilayer neural network, whether the scalar quantization unit performs scalar quantization.
  • According to the present invention, when scalar quantization is performed in a layer of the multilayer neural network, the next layer multiplies the scalar-quantized vector by the weight matrix, so the amount of computation can be reduced compared with multiplying a non-quantized vector by the weight matrix. Therefore, a multilayer neural network with a large number of units can be implemented even on an arithmetic device with limited computational resources, and recognition performance can be improved.
  • FIG. 1 is a block diagram showing a configuration example of the arithmetic device according to Embodiment 1 of the present invention.
  • FIGS. 2A and 2B are hardware configuration diagrams of the arithmetic device according to the first embodiment.
  • FIG. 3 is a flowchart showing the operation of the arithmetic device according to the first embodiment.
  • FIG. 4 is a flowchart showing the operation of the scalar quantization unit in the arithmetic device according to the first embodiment.
  • FIG. 5 is an example of the scalar quantization table stored in the scalar quantization table storage unit in the arithmetic device according to the first embodiment.
  • FIG. 1 is a block diagram showing a configuration example of an arithmetic device 1 according to Embodiment 1 of the present invention.
  • The arithmetic device 1 is a device that performs likelihood calculation on a feature vector using a multilayer neural network and outputs a likelihood vector.
  • The arithmetic device 1 according to the first embodiment includes a matrix multiplication unit 100, a weight storage unit 101, a scalar quantization control unit 102, a scalar quantization unit 103, a scalar quantization table storage unit 104, and a likelihood calculation unit 105.
  • FIG. 2A is a hardware configuration diagram of the arithmetic device 1 according to the first embodiment.
  • The arithmetic device 1 includes a processor 2 and a memory 3.
  • The functions of the matrix multiplication unit 100, the scalar quantization control unit 102, the scalar quantization unit 103, and the likelihood calculation unit 105 in the arithmetic device 1 are realized by the processor 2 executing a program stored in the memory 3.
  • The processor 2 is also referred to as a CPU (Central Processing Unit), a processing device, a microprocessor, a microcomputer, or a DSP (Digital Signal Processor).
  • The memory 3 may be, for example, a semiconductor memory such as a RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, or SSD (Solid State Drive); a magnetic disk such as a hard disk or a flexible disk; or an optical disc such as a CD (Compact Disc) or a DVD (Digital Versatile Disc). The weight storage unit 101 and the scalar quantization table storage unit 104 in the arithmetic device 1 are implemented by the memory 3.
  • In other words, the functions of the matrix multiplication unit 100, the scalar quantization control unit 102, the scalar quantization unit 103, and the likelihood calculation unit 105 are realized by software, firmware, or a combination of software and firmware. The software or firmware is written as a program and stored in the memory 3. The processor 2 implements the function of each unit by reading out and executing the program stored in the memory 3.
  • That is, the arithmetic device 1 includes the memory 3 for storing a program that, when executed by the processor 2, results in the execution of: a matrix multiplication step of multiplying a weight matrix by a vector; a scalar quantization step of scalar-quantizing the value of each dimension of the vector output in the matrix multiplication step; a likelihood calculation step of calculating, in the output layer of the multilayer neural network, a likelihood vector using the vector output in the scalar quantization step; and a control step of controlling, for each layer of the multilayer neural network, whether to perform scalar quantization. It can also be said that this program causes a computer to execute the procedures or methods of the matrix multiplication unit 100, the scalar quantization control unit 102, the scalar quantization unit 103, and the likelihood calculation unit 105.
  • Alternatively, the arithmetic device 1 may be realized by a dedicated processing circuit 4.
  • The processing circuit 4 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these.
  • The functions of the matrix multiplication unit 100, the scalar quantization control unit 102, the scalar quantization unit 103, and the likelihood calculation unit 105 may each be realized by a separate processing circuit 4, or these functions may be realized together by a single processing circuit 4.
  • Alternatively, some of the functions of the matrix multiplication unit 100, the scalar quantization control unit 102, the scalar quantization unit 103, and the likelihood calculation unit 105 may be realized by the processing circuit 4 and the rest by software or firmware.
  • FIG. 3 is a flowchart showing the operation of the arithmetic device 1 according to the first embodiment.
  • In the following description, the arithmetic device 1 is used for speech recognition. However, the present invention is not limited to speech recognition and is generally applicable to pattern recognition such as image recognition and object recognition from sensor information.
  • In step ST101, the matrix multiplication unit 100 initializes the layer number L of the multilayer neural network to 1.
  • In a DNN-HMM, a vector obtained by concatenating a plurality of filter-bank features and their dynamic features is generally used as the feature vector.
  • In step ST102, the matrix multiplication unit 100 reads the number of input dimensions DIN_L of layer L from the weight storage unit 101 and assigns it to I, and reads the number of output dimensions DOUT_L of layer L from the weight storage unit 101 and assigns it to J.
  • The weight storage unit 101 stores, for each layer, a DOUT_L × DIN_L weight matrix w_ji. Note that the number of units in layer L equals the number of output dimensions DOUT_L.
  • In step ST103, the matrix multiplication unit 100 acquires from the scalar quantization control unit 102 a determination result SQDo(L-1) indicating whether scalar quantization was performed in layer L-1 immediately preceding layer L. When SQDo(L-1) is 1 (step ST103 "YES"), that is, when scalar quantization was performed, the matrix multiplication unit 100 proceeds to step ST105 and performs matrix multiplication with a reduced amount of computation; otherwise (step ST103 "NO"), it proceeds to step ST104 and performs normal matrix multiplication.
  • In step ST104, the matrix multiplication unit 100 acquires the weight matrix w_ji necessary for the calculation of Equation (1) from the weight storage unit 101 and performs the normal matrix multiplication u_j = Σ_{i=1..I} w_ji x_i … (1).
  • In step ST105, the matrix multiplication unit 100 acquires the weight matrix w_ji necessary for the calculation of Equation (2) from the weight storage unit 101 and performs the reduced matrix multiplication u_j = Σ_{k=1..K} a_k Σ_{i∈I_k} w_ji … (2), where a_k is the quantized value of x_i and I_k is the set of indices i whose values x_i were scalar-quantized to a_k.
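  • As an illustration of the saving, the matrix multiplications of Equations (1) and (2) can be sketched in Python as follows. This is a minimal sketch; the names (matmul_normal, quant_values, index_sets, and so on) are chosen for this example and do not appear in the patent.

```python
# Minimal sketch of Equations (1) and (2); all names are illustrative.

def matmul_normal(weights, x):
    """Equation (1): u_j = sum_i w_ji * x_i, i.e. I multiplications per output."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in weights]

def matmul_quantized(weights, quant_values, index_sets):
    """Equation (2): u_j = sum_k a_k * sum_{i in I_k} w_ji.

    quant_values[k] is the quantization value a_k, and index_sets[k] is the
    set I_k of dimensions whose value was quantized to a_k. The inner sum
    needs only additions, so each output u_j needs K multiplications
    instead of I.
    """
    u = []
    for row in weights:
        u_j = 0.0
        for a_k, I_k in zip(quant_values, index_sets):
            u_j += a_k * sum(row[i] for i in I_k)
        u.append(u_j)
    return u
```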
  • In step ST106, the scalar quantization unit 103 acquires from the scalar quantization control unit 102 a determination result SQDo(L) indicating whether to perform scalar quantization in layer L. If SQDo(L) is 1 (step ST106 "YES"), the scalar quantization unit 103 proceeds to step ST107 and performs scalar quantization; otherwise (step ST106 "NO"), it proceeds to step ST110. Whether to perform scalar quantization in layer L is assumed to be set in the scalar quantization control unit 102 in advance, in consideration of the trade-off between the degradation in recognition performance caused by scalar quantization and the reduction in the amount of computation. The scalar quantization control unit 102 controls the scalar quantization unit 103 according to this setting.
  • In step ST107, the scalar quantization unit 103 scalar-quantizes the outputs u_j of the matrix multiplication unit 100.
  • FIG. 4 is a flowchart showing the operation of the scalar quantization unit 103 in the arithmetic device 1 according to the first embodiment.
  • In step ST107A, the scalar quantization unit 103 initializes the dimension index i of the input vector to 1 and clears the sets I_k of scalar-quantized indices.
  • In step ST107B, the scalar quantization unit 103 initializes the scalar quantization table index k to 1.
  • In step ST107C, the scalar quantization unit 103 compares x_i with the quantization-range lower limit L_k and upper limit H_k stored in the scalar quantization table storage unit 104. If x_i is greater than L_k and less than or equal to H_k (step ST107C "YES"), the process proceeds to step ST107D; otherwise (step ST107C "NO"), it proceeds to step ST107E.
  • FIG. 5 is an example of the scalar quantization table stored in the scalar quantization table storage unit 104.
  • The scalar quantization table index k mentioned above is the index of the quantization values a_k stored in the scalar quantization table of the scalar quantization table storage unit 104.
  • The scalar quantization table size K is the maximum value of the index k of the quantization values stored in the scalar quantization table.
  • The scalar quantization table is determined in advance for each layer, in consideration of the trade-off between the degradation in recognition performance and the reduction in the amount of computation.
  • In the first embodiment, values after application of the activation function are stored as the quantization values of the scalar quantization table; the example of FIG. 5 shows quantization values after application of the sigmoid function.
  • The scalar quantization table storage unit 104 may store a single scalar quantization table shared by all layers, or a separate scalar quantization table for each layer.
  • Because the quantization width and quantization range of the scalar quantization affect the final recognition performance, it is desirable, for example, to apply fine quantization to layers where scalar quantization has a large impact on recognition performance and coarse quantization to layers where it has a small impact. Therefore, compared with using a single scalar quantization table shared by all layers, using a scalar quantization table prepared for each layer can reduce the overall amount of computation without degrading the recognition performance.
  • In step ST107D, the scalar quantization unit 103 quantizes x_i to a_k using the following Equations (3) and (4):
    x_i ← a_k … (3)
    add(I_k, i) … (4)
    where add(I_k, i) is a function that adds i to the quantized index set I_k. After step ST107D, the scalar quantization unit 103 proceeds to step ST107G.
  • In step ST107E, the scalar quantization unit 103 increments the scalar quantization table index k.
  • In step ST107G, the scalar quantization unit 103 increments the dimension index i of the input vector.
  • In step ST107H, if the dimension index i of the input vector is less than or equal to the number of input dimensions I (step ST107H "YES"), the scalar quantization unit 103 returns to step ST107B. If i is greater than I (step ST107H "NO"), the scalar quantization unit 103 ends the scalar quantization process and proceeds to step ST108 in FIG. 3.
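  • The loop of steps ST107A to ST107H can be sketched as follows, assuming the table is held as lists of lower limits L_k, upper limits H_k, and quantization values a_k as in FIG. 5. Names are illustrative, and a value that falls in no range is left unquantized here, a case the flowchart leaves implicit.

```python
def scalar_quantize(x, lower, upper, values):
    """Table-lookup scalar quantization (steps ST107A to ST107H).

    lower[k], upper[k], and values[k] correspond to L_k, H_k, and a_k of
    the scalar quantization table. Returns the quantized vector and the
    index sets I_k of the dimensions mapped to each quantization value.
    """
    K = len(values)
    index_sets = [[] for _ in range(K)]    # the sets I_k, cleared at ST107A
    quantized = list(x)
    for i, x_i in enumerate(x):            # dimension loop (ST107G, ST107H)
        for k in range(K):                 # table loop (ST107B, ST107E)
            if lower[k] < x_i <= upper[k]: # range test (ST107C)
                quantized[i] = values[k]   # Equation (3) (ST107D)
                index_sets[k].append(i)    # Equation (4)
                break
    return quantized, index_sets
```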
  • In step ST108, the scalar quantization unit 103 determines whether the scalar-quantized values are values to which an activation function such as the sigmoid function has already been applied, that is, whether the value x_i of each dimension of the input vector was converted in step ST107 to a quantization value a_k of the scalar quantization table. If so (step ST108 "YES"), the process proceeds to step ST109; otherwise (step ST108 "NO"), the scalar quantization unit 103 proceeds to step ST110.
  • In step ST109, no activation function needs to be applied, so the scalar quantization unit 103 calculates the output by Equation (5): z_j = u_j … (5).
  • In step ST110, an activation function must be applied, so the scalar quantization unit 103 calculates the output by Equation (6): z_j = f(u_j) … (6), where f is an activation function.
  • Typically, the logistic sigmoid function of Equation (7) or the rectified linear function of Equation (8) is used in the hidden layers, and the softmax function of Equation (9) is used in the output layer.
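  • The bodies of Equations (7) to (9) are not reproduced in this text. The standard forms of the functions they refer to, consistent with the surrounding description, are:
    f(u) = 1 / (1 + exp(-u)) … (7)
    f(u) = max(0, u) … (8)
    z_j = exp(u_j) / Σ_{j'=1..J} exp(u_{j'}) … (9)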
  • In step ST111, the scalar quantization unit 103 determines whether layer L is the output layer of the multilayer neural network. If layer L is the output layer (step ST111 "YES"), the process proceeds to step ST113; otherwise (step ST111 "NO"), it proceeds to step ST112.
  • In step ST112, the scalar quantization unit 103 outputs the values z_j calculated in step ST109 or step ST110 to the matrix multiplication unit 100.
  • Upon receiving the values z_j, the matrix multiplication unit 100 increments the layer number L and, by Equation (10), substitutes the value z_j of each dimension of the output vector of layer L-1 into the value x_j of each dimension of the input vector of the incremented layer L: x_j = z_j … (10). Thereafter, the matrix multiplication unit 100 returns to step ST102.
  • In step ST113, the scalar quantization unit 103 outputs the values z_j calculated in step ST109 or step ST110 to the likelihood calculation unit 105.
  • Upon receiving the values z_j, the likelihood calculation unit 105 calculates the likelihood p(v|j) from the values z_j (j = 1 to J) of the output vector of the output layer using Equation (11): p(v|j) = z_j / p_0(j) … (11), where p_0(j) is the prior probability calculated from the amount of training data.
  • As described above, the arithmetic device 1 according to the first embodiment includes: the weight storage unit 101 storing a weight matrix for each layer of the multilayer neural network; the matrix multiplication unit 100 that multiplies the weight matrix stored in the weight storage unit 101 by the vector input to the input layer of the multilayer neural network or by the vector output from the preceding layer; the scalar quantization table storage unit 104 that stores a scalar quantization table representing the correspondence between the quantization ranges for scalar quantization and the quantization values; the scalar quantization unit 103 that scalar-quantizes the value of each dimension of the vector output from the matrix multiplication unit 100 with reference to the scalar quantization table; the likelihood calculation unit 105 that, in the output layer of the multilayer neural network, calculates a likelihood vector using the vector output from the scalar quantization unit 103; and the scalar quantization control unit 102 that controls, for each layer of the multilayer neural network, whether the scalar quantization unit 103 performs scalar quantization.
  • With this configuration, when scalar quantization is performed, the number of multiplications caused by the matrix-vector operations in each layer can be reduced to K/I. Therefore, a multilayer neural network with a large number of units can be implemented even on an arithmetic device 1 with limited computational resources, such as an embedded device, and recognition performance can be improved.
  • In addition, because the scalar quantization control unit 102 controls whether scalar quantization is performed for each layer of the multilayer neural network in consideration of the trade-off between the degradation in recognition performance caused by scalar quantization and the reduction in the amount of computation, it becomes easier to satisfy both the required recognition performance and the computational resources available to the arithmetic device 1.
  • Further, when scalar-quantizing the value of each dimension of the vector output from the matrix multiplication unit 100, the scalar quantization unit 103 quantizes to values to which the activation function has already been applied. With this configuration, the application of the activation function itself can be omitted, and a multilayer neural network with a large number of units can be implemented even on an arithmetic device 1 with limited computational resources, such as an embedded device.
  • Further, the scalar quantization table storage unit 104 stores a scalar quantization table for each layer of the multilayer neural network, and the scalar quantization unit 103 refers to the scalar quantization table corresponding to the layer in which it performs scalar quantization. With this configuration, the overall amount of computation can be reduced without degrading the recognition performance, compared with using a single scalar quantization table.
  • Embodiment 2. When a quantized value is 0 in the scalar quantization unit 103 of the first embodiment, the subsequent multiplication results in the matrix multiplication unit 100 are also 0, so those multiplications are unnecessary. Therefore, the second embodiment takes the case where the quantized value is 0 into account, with the aim of reducing the numbers of multiplications and additions compared with the first embodiment.
  • FIG. 6 is a graph showing the frequencies of the output-vector values of each layer of the multilayer neural network, counted in bins of 0.01. The result shows that values close to 0 occur frequently in all layers. Therefore, as described below, by appropriately setting the quantization range that maps to 0, the amount of computation can be reduced without degrading the recognition performance.
  • The configuration of the arithmetic device 1 according to the second embodiment is the same as that of the first embodiment shown in FIG. 1, so its description is omitted.
  • The second embodiment differs from the first embodiment in the matrix multiplication formula by which the matrix multiplication unit 100 reduces the amount of computation.
  • In the second embodiment, a quantization range corresponding to the quantization value 0 is stored for use by the scalar quantization unit 103.
  • The matrix multiplication unit 100 calculates, using the weight matrix stored in the weight storage unit 101, the sums of the weights corresponding to the dimensions that were scalar-quantized to quantization values other than 0 among the dimensions of the scalar-quantized vector output from the preceding layer, and multiplies each weight sum by the corresponding nonzero quantization value.
  • With this configuration, the number of multiplications in the matrix-vector operation Wx occurring in each layer of the multilayer neural network can be reduced to (K-1)/I, and the number of additions can be reduced to (I - cnt(I_0))/I, where cnt(·) is a function that counts the number of elements of a set and I_0 is the set of indices quantized to 0. Therefore, a multilayer neural network with a large number of units can be implemented even on an arithmetic device 1 with limited computational resources, such as an embedded device.
  • Embodiment 3.
  • The first and second embodiments are mainly intended to reduce the number of multiplications in the matrix-vector operation Wx occurring in each layer of the multilayer neural network. In contrast, the third embodiment aims to further reduce the number of additions compared with the first and second embodiments.
  • FIG. 7 is a block diagram illustrating the configuration of the arithmetic device 1 according to the third embodiment. In FIG. 7, parts that are the same as or correspond to those in FIG. 1 are given the same reference numerals, and their descriptions are omitted.
  • The difference between the third embodiment and the first and second embodiments is the weight sum storage unit 106, which stores sets of indices scalar-quantized to the same quantization value and the sums of the weights corresponding to the indices belonging to each set. The weight sum storage unit 106 is, for example, the memory 3 shown in FIG. 2A.
  • FIG. 8 shows an example of the index sets and weight sums stored in the weight sum storage unit 106.
  • The weight sum is expressed by Equation (14): s_j(R_n) = Σ_{i∈R_n} w_ji … (14).
  • For example, suppose that a set I_k = {1, 3, 5, ..., 511} of dimension indices i quantized to the same quantization value a_k appears frequently. Such a frequent set is stored as R_n, and the sum of the weights corresponding to it is stored in the weight sum storage unit 106 as s_j(R_n).
  • FIG. 8 shows the individual weights before summation for explanation, but the actual weight sum storage unit 106 stores the sums calculated from those weights.
  • As described above, the arithmetic device 1 according to the third embodiment includes, for each layer of the multilayer neural network, the weight sum storage unit 106 that stores sets of vector dimensions scalar-quantized to the same quantization value and the sums of the weights corresponding to the dimensions belonging to each set. With this configuration, the number of additions in the matrix-vector operation Wx occurring in each layer of the multilayer neural network can be reduced. Therefore, a multilayer neural network with a large number of units can be implemented even on an arithmetic device 1 with limited computational resources, such as an embedded device.
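  • The effect of the weight sum storage unit 106 can be sketched as follows: when an index set R_n has its per-row weight sums s_j(R_n) precomputed, the inner additions are replaced by a single lookup. The dictionary layout and names are assumptions made for this sketch.

```python
def matmul_with_weight_sums(weights, quant_values, index_sets, sum_store):
    """Embodiment 3 sketch: reuse precomputed weight sums s_j(R_n).

    sum_store maps a frozenset of indices R_n to the list of per-row sums
    [s_0(R_n), s_1(R_n), ...], following Equation (14):
    s_j(R_n) = sum_{i in R_n} w_ji.
    """
    u = [0.0] * len(weights)
    for a_k, I_k in zip(quant_values, index_sets):
        if a_k == 0.0:
            continue
        sums = sum_store.get(frozenset(I_k))
        if sums is None:             # no stored sum: fall back to additions
            sums = [sum(row[i] for i in I_k) for row in weights]
        for j, s_j in enumerate(sums):
            u[j] += a_k * s_j        # one multiplication and addition per row
    return u
```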
  • Embodiment 4.
  • The weight sum storage unit 106 of the third embodiment stores in advance the weight sums s_j(R_n) corresponding to frequent index sets R_n. In the fourth embodiment, by contrast, index sets R_n obtained from scalar quantization results calculated in the past and the corresponding weight sums s_j(R_n) are stored.
  • FIG. 9 is a block diagram illustrating the configuration of the arithmetic device 1 according to the fourth embodiment. In FIG. 9, parts that are the same as or correspond to those in FIG. 7 are given the same reference numerals, and their descriptions are omitted.
  • The difference between the fourth embodiment and the third embodiment is that the sets of indices scalar-quantized to the same quantization value by the scalar quantization unit 103 and the sums of the weights corresponding to the indices belonging to each set are stored in the weight sum storage unit 106 at run time.
  • In other words, R_n and s_j(R_n) are stored in the weight sum storage unit 106 as a cache.
  • For example, the arithmetic device 1 stores in the weight sum storage unit 106 the R_n and s_j(R_n) obtained while recognizing an utterance at some past time, and reuses them when recognizing subsequent utterances. The newest R_n and s_j(R_n) sequentially replace the least-referenced R_n and s_j(R_n) in the weight sum storage unit 106. When the values stored in the weight sum storage unit 106 can be reused, the number of additions is reduced, so the effect of reducing the amount of computation can be expected to grow.
  • As described above, in the fourth embodiment, the weight sum storage unit 106 stores sets of vector dimensions that were previously scalar-quantized to the same quantization value and the sums of the weights corresponding to the dimensions belonging to each set. With this configuration, the number of additions in the matrix-vector operation Wx occurring in each layer of the multilayer neural network can be reduced. Therefore, a multilayer neural network with a large number of units can be implemented even on an arithmetic device 1 with limited computational resources, such as an embedded device.
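  • A minimal sketch of the Embodiment 4 behavior. Replacement of the least-referenced entries is approximated here by least-recently-used eviction, and the capacity is an assumption made for this sketch.

```python
from collections import OrderedDict

class WeightSumCache:
    """Run-time cache of R_n -> [s_j(R_n)] pairs (Embodiment 4 sketch)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get_sums(self, index_set, weights):
        key = frozenset(index_set)
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as recently referenced
            return self.entries[key]
        # Not cached yet: compute s_j(R_n) by explicit additions and store
        # it so that later utterances can reuse it.
        sums = [sum(row[i] for i in index_set) for row in weights]
        self.entries[key] = sums
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the stalest entry
        return sums
```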
  • Embodiment 5. Embodiments 1 to 4 reduce the numbers of multiplications and additions caused by the matrix-vector operation Wx through scalar quantization. However, the recognition performance may be slightly reduced as a result.
  • In the fifth embodiment, the scalar quantization control unit 102 determines whether to perform scalar quantization for each layer of the multilayer neural network according to the load on the arithmetic device 1, and controls the scalar quantization unit 103 accordingly. Specifically, when the load on the arithmetic device 1 is high, the scalar quantization control unit 102 increases the number of layers in which the scalar quantization unit 103 performs scalar quantization; when the load is low, it decreases the number of such layers.
  • It is assumed here that the processor 2 shown in FIG. 2A executes not only the speech recognition function described in the first to fourth embodiments but also other functions, such as those of a car navigation system. For example, in a configuration in which the arithmetic device 1 is incorporated into a car navigation system, the processor 2 also executes applications such as route search and music playback. When a destination is set, the computing resources of the processor 2 may be temporarily occupied by the route search, increasing the system load.
  • In such a high-load situation, the scalar quantization control unit 102 increases the number of layers that perform scalar quantization, reducing the amount of computation of the multilayer neural network even at some cost in recognition performance, as sketched below. Conversely, when the load is low, the scalar quantization control unit 102 has normal matrix multiplication performed instead of scalar quantization, to secure maximum recognition performance.
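  • The control policy can be sketched as a simple threshold rule; the thresholds and the per-layer priority ordering are assumptions for this sketch, not values given in the patent.

```python
def layers_to_quantize(load, num_layers, priority, thresholds=(0.3, 0.7)):
    """Decide per-layer scalar quantization from the device load (Embodiment 5).

    priority lists the layer numbers in the order in which they should be
    quantized (layers whose quantization hurts recognition least first).
    Returns SQDo(L) flags: 1 = quantize layer L, 0 = normal multiplication.
    """
    low, high = thresholds
    if load >= high:            # heavy load: quantize many layers
        n = num_layers
    elif load >= low:           # moderate load: quantize the safest layers
        n = num_layers // 2
    else:                       # light load: full precision everywhere
        n = 0
    chosen = set(priority[:n])
    return {L: int(L in chosen) for L in range(1, num_layers + 1)}
```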
  • The configuration of the arithmetic device 1 according to the fifth embodiment is the same as the configurations of the first to fourth embodiments shown in the figures above, so its illustration and description are omitted.
  • As described above, in the fifth embodiment, the scalar quantization control unit 102 determines whether to perform scalar quantization for each layer of the multilayer neural network according to the load on the arithmetic device 1. With this configuration, when the load on the arithmetic device 1, such as an embedded device, is high, the amount of computation is reduced by scalar quantization, so that speech recognition remains possible even at the cost of a slight degradation in recognition performance; when the load is low, the number of scalar-quantized layers is minimized, so the degradation in recognition performance is minimized.
  • Because the arithmetic device according to the present invention reduces the amount of computation without degrading the recognition performance of the multilayer neural network, it is suitable for use in embedded devices with limited computational resources.
  • 1 arithmetic device, 2 processor, 3 memory, 4 processing circuit, 100 matrix multiplication unit, 101 weight storage unit, 102 scalar quantization control unit, 103 scalar quantization unit, 104 scalar quantization table storage unit, 105 likelihood calculation unit, 106 weight sum storage unit.

Abstract

The present invention relates to a computing device (1) that includes: a weight storage unit (101) for storing a weight matrix for each layer of a multilayer neural network; a matrix multiplication unit (100) for multiplying the weight matrix stored in the weight storage unit (101) by a vector input to the input layer of the multilayer neural network or a vector output from the immediately preceding layer; a scalar quantization table storage unit (104) for storing a scalar quantization table representing the correspondence between the quantization ranges in which scalar quantization is performed and the quantized values; a scalar quantization unit (103) for referring to the scalar quantization table and quantizing the value of each dimension of the vector output from the matrix multiplication unit (100); a likelihood calculation unit (105) for calculating a likelihood vector using the vector output from the scalar quantization unit (103) in the output layer of the multilayer neural network; and a scalar quantization control unit (102) for controlling whether or not to perform scalar quantization for each layer of the multilayer neural network.
PCT/JP2016/056583 2016-03-03 2016-03-03 Computing device and calculation method WO2017149722A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/056583 WO2017149722A1 (fr) 2016-03-03 2016-03-03 Computing device and calculation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/056583 WO2017149722A1 (fr) 2016-03-03 2016-03-03 Computing device and calculation method

Publications (1)

Publication Number Publication Date
WO2017149722A1 2017-09-08

Family

ID=59743717

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/056583 WO2017149722A1 (fr) 2016-03-03 2016-03-03 Computing device and calculation method

Country Status (1)

Country Link
WO (1) WO2017149722A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0660051A (ja) * 1991-09-18 1994-03-04 Matsushita Electric Ind Co Ltd Neural network circuit
JPH0854893A (ja) * 1994-08-09 1996-02-27 Matsushita Electric Ind Co Ltd Membership degree calculation device and HMM device
JPH08272759A (ja) * 1995-03-22 1996-10-18 Cselt Spa (Cent Stud E Lab Telecomun) Method for speeding up the execution of a neural network for correlated signal processing

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3518153A1 (fr) 2018-01-29 2019-07-31 Panasonic Intellectual Property Corporation of America Information processing system and method
EP3518152A1 (fr) 2018-01-29 2019-07-31 Panasonic Intellectual Property Corporation of America Information processing system and method
US11036980B2 (en) 2018-01-29 2021-06-15 Panasonic Intellectual Property Corporation Of America Information processing method and information processing system
US11100321B2 (en) 2018-01-29 2021-08-24 Panasonic Intellectual Property Corporation Of America Information processing method and information processing system
CN110874626A (zh) * 2018-09-03 2020-03-10 Huawei Technologies Co., Ltd. Quantization method and apparatus
CN110874626B (zh) * 2018-09-03 2023-07-18 Huawei Technologies Co., Ltd. Quantization method and apparatus
CN113168574A (zh) * 2018-12-12 2021-07-23 Hitachi Astemo, Ltd. Information processing device, in-vehicle control device, and vehicle control system
WO2020141587A1 (fr) 2019-01-02 2020-07-09 Panasonic Intellectual Property Corporation of America Information processing device, information processing method, and program
JP6795721B1 (ja) * 2019-08-29 2020-12-02 Rakuten, Inc. Learning system, learning method, and program
WO2021059791A1 (fr) * 2019-09-26 2021-04-01 Hitachi Astemo, Ltd. Internal combustion engine control device
US11655791B2 (en) 2019-09-26 2023-05-23 Hitachi Astemo, Ltd. Internal combustion engine control device

Similar Documents

Publication Publication Date Title
WO2017149722A1 (fr) 2017-09-08 Computing device and calculation method
Polino et al. Model compression via distillation and quantization
US11868867B1 (en) Decompression and compression of neural network data using different compression schemes
Sung et al. Resiliency of deep neural networks under quantization
US11847569B2 (en) Training and application method of a multi-layer neural network model, apparatus and storage medium
Alvarez et al. On the efficient representation and execution of deep acoustic models
US20170200446A1 (en) Data augmentation method based on stochastic feature mapping for automatic speech recognition
KR20180043154A (ko) Method and apparatus for neural network quantization
US20210287074A1 (en) Neural network weight encoding
TW202004658A (zh) Method for adaptive incremental model compression of a deep neural network
JP2019139338A (ja) Information processing device, information processing method, and program
Lee et al. Accelerating recurrent neural network language model based online speech recognition system
CN111723901A (zh) 神经网络模型的训练方法及装置
CN112651485A (zh) 识别图像的方法和设备以及训练神经网络的方法和设备
Li et al. On the quantization of recurrent neural networks
WO2022044465A1 (fr) Information processing method and information processing system
Fuketa et al. Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format
US20220067280A1 (en) Multi-token embedding and classifier for masked language models
Sethy et al. Unnormalized exponential and neural network language models
KR20230171312A (ko) Method and system for making a speech recognition model lightweight
Chin et al. A high-performance adaptive quantization approach for edge CNN applications
US7945061B1 (en) Scalable architecture for subspace signal tracking
Takeda et al. Acoustic model training based on node-wise weight boundary model for fast and small-footprint deep neural networks
JP7438544B2 (ja) Neural network processing device, computer program, neural network manufacturing method, neural network data manufacturing method, neural network utilization device, and neural network miniaturization method
US20210216867A1 (en) Information processing apparatus, neural network computation program, and neural network computation method

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16892562

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16892562

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP