WO2018112892A1 - Device and method for supporting fast artificial neural network operations - Google Patents

Device and method for supporting fast artificial neural network operations

Info

Publication number
WO2018112892A1
Authority
WO
WIPO (PCT)
Prior art keywords
input
neurons
neuron
output
unit
Prior art date
Application number
PCT/CN2016/111737
Other languages
English (en)
Chinese (zh)
Inventor
刘少礼
郝一帆
陈云霁
郭崎
陈天石
Original Assignee
北京中科寒武纪科技有限公司
上海寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京中科寒武纪科技有限公司, 上海寒武纪信息科技有限公司
Priority to PCT/CN2016/111737
Publication of WO2018112892A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • The present invention relates to the field of data processing technologies, and more particularly to an apparatus and method for fast artificial neural network operations.
  • Artificial neural networks, referred to simply as neural networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Such a network relies on the complexity of the system and processes information by adjusting the interconnection relationships among a large number of internal nodes.
  • The algorithm on which neural networks rely is vector multiplication, with sign functions and their various approximations widely used.
  • Neural networks are widely used in a variety of application scenarios: computer vision, speech recognition, natural language processing, and so on.
  • The scale of neural networks has been growing: Lecun's neural network for handwritten character recognition had a weight scale of less than 1M, while in 2012 Krizhevsky's network for the ImageNet competition had a scale of 60M weights.
  • The neural network is an application with heavy computation and heavy memory access.
  • The prior art generally uses a general-purpose processor to compute the artificial neural network: input neurons, output neurons, and weights are stored in three arrays, together with an index array that stores the connection relationship between each output neuron and its connected inputs.
  • The main operation is the multiplication of neurons by weights. Since weights and neurons are not in one-to-one correspondence, every operation must look up the weight corresponding to a neuron through the index array. Because the computing power and memory-access bandwidth of general-purpose processors are weak, the needs of neural networks cannot be met.
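  • As a minimal sketch (hypothetical names and values, not code from the patent), the indexed storage just described can be modeled as follows; note that every multiplication goes through an indirect lookup:

```python
# Prior-art layout: three arrays plus an index array. index[k] records which
# input neuron the k-th stored weight connects from (0-based indices here).
input_neurons = [0.5, 0.0, 1.2, 0.7]   # hypothetical values
weights       = [0.3, 0.4, 0.9]        # only the stored weights
index         = [0, 2, 3]              # weight k pairs with input_neurons[index[k]]

# One output neuron: each multiply needs a lookup through `index`,
# which is the per-operation overhead criticized above.
output = sum(w * input_neurons[i] for w, i in zip(weights, index))
```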
  • Another known method of supporting artificial neural network operations and their training algorithms is to use a graphics processing unit (GPU), which supports the above algorithm through SIMD instructions executed with a general-purpose register file and general-purpose stream processors.
  • However, the GPU is a device dedicated to graphics operations and scientific computing, with no special support for artificial neural network operations; a large amount of front-end decoding work is still required to perform artificial neural network operations, which brings a lot of additional overhead.
  • In addition, the GPU has only a small on-chip cache, so the model data (weights) of a multi-layer artificial neural network must be transferred repeatedly from off-chip; off-chip bandwidth becomes the main performance bottleneck and brings huge power-consumption overhead.
  • The present invention therefore proposes an apparatus and method for fast artificial neural network operation.
  • An apparatus for supporting fast artificial neural network operations comprises: a mapping unit that receives the input neurons, the weights, and the connection relationship between the input neurons and the output neurons, optimizes the connection relationship, and outputs the mapped input neurons and weights, the correspondence between the mapped input neurons and the weights being the input neuron-weight pair.
  • A method for supporting fast artificial neural network operations comprises: the mapping unit retrieves the input neurons, weights, and connection relationships from the storage unit and outputs the mapped input neurons and weights; the operation device retrieves the mapped input neurons and weights and performs operations to obtain the output neurons.
  • FIG. 1 is a schematic structural diagram of a mapping unit according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an artificial neural network according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the first connection method for the first output neuron after sparsification of the artificial neural network in FIG. 2.
  • FIG. 4 is a schematic diagram of the second connection method for the first output neuron after sparsification of the artificial neural network in FIG. 2.
  • FIG. 5 is a schematic structural diagram of an apparatus supporting fast artificial neural network operation according to an embodiment of the present invention.
  • FIG. 6 is a flow chart of an operation method of the apparatus supporting fast artificial neural network operation in FIG. 5.
  • FIG. 7 is a flow chart of the operation steps of the operation unit in FIG. 6.
  • FIG. 8 is a schematic structural diagram of an apparatus supporting fast artificial neural network operation according to another embodiment of the present invention.
  • FIG. 9 is a flow chart of an operation method of the apparatus supporting fast artificial neural network operation in FIG. 8.
  • FIG. 10 is a schematic structural diagram of a system supporting fast artificial neural network operation according to still another embodiment of the present invention.
  • Embodiments of the present invention provide an apparatus for supporting fast artificial neural network operations, including a mapping unit. The mapping unit optimizes the connection relationship and outputs the mapped input neurons and weights, whose correspondence is the input neuron-weight pair; this reduces the computational load of the artificial neural network operation and realizes fast artificial neural network operation.
  • Note that the input neuron-weight pair is not a true data storage structure, but merely represents the correspondence between input neurons and weights. For example, the input neurons are stored in a vector A and the weights in a vector B, the two vectors having the same length; the components at the same position of A and B together constitute an input neuron-weight pair. During the operation, the input neurons and weights can be placed separately in different caches and used by the operation unit.
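  • A minimal sketch of this convention (hypothetical values; the pairing is purely positional, and no extra structure is stored):

```python
# Vectors A and B have the same length; the components at the same
# position form one input neuron-weight pair.
A = [0.8, 0.2]            # mapped input neurons, e.g. corresponding to (I3, I4)
B = [0.3, 0.9]            # mapped weights, e.g. corresponding to (W31, W41)
pairs = list(zip(A, B))   # each (A[i], B[i]) is an input neuron-weight pair
```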
  • The input data includes the input neurons, weights, and connection relationships. The input data enters the mapping unit 1, which outputs the mapped input neurons and weights; the correspondence between the mapped input neurons and weights is the input neuron-weight pair.
  • The mapping unit includes a sparsification mapping unit 11 and/or a fast mapping unit 12. The sparsification mapping unit 11 performs a sparsification operation that removes connections whose weight is 0 or less than a preset threshold, and the fast mapping unit 12 performs a fast mapping operation that removes connections whose input neuron is 0 or less than a preset threshold; the two thresholds need not be equal.
  • The sparsification mapping unit 11 includes a sparsification determination unit and a sparsification execution unit. The sparsification determination unit decides whether to perform the sparsification operation. If it decides to perform it, the sparsification execution unit optimizes the connection relationship according to whether each weight between input and output neurons is 0 or less than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If it decides not to perform it, all weights are assumed by default to be non-zero and not less than the preset threshold; the connection relationship is not processed, and the input data is directly converted into input neuron-weight pairs.
  • The connection relationship in the sparsification mapping unit 11 can be expressed in either of the following two ways:
  • The first way: use 1 to indicate that the weight between an input neuron and an output neuron is non-zero or not less than the preset threshold, so that the connection between them is retained, and 0 to indicate that the weight is zero or less than the preset threshold, so that the connection is removed. Each output neuron then forms, with all the input neurons, a string of 0s and 1s representing its connection relationship, and the connection relationships of all output neurons are combined into one vector.
  • The second way: retain or remove each connection according to whether the weight is zero or less than the preset threshold, and represent the connection relationship of an output neuron by the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previously connected input neuron, the distance of its third connected input neuron from the previously connected input neuron, and so on, until all the inputs of that output neuron are exhausted.
  • The fast mapping unit 12 includes a fast mapping determination unit and a fast mapping execution unit. The fast mapping determination unit decides whether the neural network performs the operation of discriminating the input neurons. If it decides to perform it, the fast mapping execution unit optimizes the connection relationship according to whether the value of each input neuron is 0 or less than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If not, all input neurons are assumed by default to be non-zero and not less than the preset threshold; the connection relationship is not processed, and the input data is directly converted into input neuron-weight pairs.
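  • The two removal rules can be sketched as follows (a minimal illustration; the threshold values and the comparison by magnitude are assumptions, and the text notes only that the two thresholds need not be equal):

```python
def keep_by_weight(w, weight_threshold=1e-3):
    # Sparsification rule: remove a connection whose weight is 0
    # or less than the preset threshold.
    return w != 0 and abs(w) >= weight_threshold

def keep_by_input(x, input_threshold=1e-3):
    # Fast mapping rule: remove a connection whose input neuron is 0
    # or less than the preset threshold.
    return x != 0 and abs(x) >= input_threshold
```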
  • The connection relationship in the fast mapping unit 12 can likewise be expressed in either of the following two ways:
  • The first way: use 1 to indicate that the input neuron is non-zero or not less than the preset threshold, so that the connection between the input neuron and the output neuron is retained, and 0 to indicate that the input neuron is zero or less than the preset threshold, so that the connection is removed. Each output neuron forms, with all of its input neurons, a string of 0s and 1s representing its connection relationship, and the connection relationships of all output neurons are combined into one vector.
  • The second way: retain or remove each connection according to whether the input neuron is zero or less than the preset threshold, and represent the connection relationship of an output neuron by the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previously connected input neuron, the distance of its third connected input neuron from the previously connected input neuron, and so on, until all the inputs of that output neuron are exhausted.
  • In an artificial neural network, the Kth layer is called the input layer and the (K+1)th layer is called the output layer; that is, except for the topmost layer, each layer can serve as an input layer, the next layer is the corresponding output layer, and the number of neurons in each layer is known in advance.
  • Let the input layer be composed of N input neurons I1, I2, ..., IN, and the output layer be composed of M output neurons O1, O2, ..., OM.
  • The first connection method: for each output neuron Oj, obtain its corresponding connection relationship. Since the input layer has N nodes, the connection relationship has N bits, each taking the value 1 or 0: a 1 in the i-th bit indicates a connection between Ii and Oj, and a 0 indicates no connection. Initially all N bits are set to 1. If the value of the input neuron Ii is zero or less than the preset threshold, or if the weight between Ii and Oj is zero or less than the preset threshold, the i-th bit of the connection relationship is set to 0, that is, Ii and Oj are considered unconnected. Then the connection relationships of all the output neurons are combined into one vector, in which the (N x (j-1) + 1)-th through (N x j)-th components form the connection relationship of output neuron Oj.
  • The number of input-layer neurons equals the number of bits stored for the connection relationship of each output neuron, so even the simplest one-dimensional array taking only the values 0 and 1 clearly represents the connection relationship of every output neuron.
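  • As a sketch of this layout (0-based Python indexing, with j 1-based as in the text), the bits of output neuron Oj can be recovered from the combined vector by slicing:

```python
def bits_for_output(combined, N, j):
    # Components N*(j-1)+1 .. N*j in the text's 1-based numbering are
    # combined[N*(j-1) : N*j] in 0-based Python indexing.
    return combined[N * (j - 1): N * j]
```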
  • The second connection method: for each output neuron Oj, obtain its corresponding connection relationship. If the value of the input neuron Ii is zero or less than the preset threshold, or if the weight between Ii and Oj is zero or less than the preset threshold, there is no connection between Ii and Oj; otherwise there is a connection. Suppose Oj is connected to the input neurons Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. Then the connection relationship has n entries: the first entry equals i_1 - 1, and for n ≥ k > 1 the k-th entry equals i_k - i_(k-1).
  • The connection relationship can also be represented by a high-dimensional dynamic array, a linked list, or the like.
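  • Both representations, for either mapping unit, can be sketched as follows (a minimal illustration assuming 0-based indices for the retained input neurons; not an implementation from the patent):

```python
def encode_direct(connected, n_inputs):
    # First method: one bit per input neuron; 1 = connection retained,
    # 0 = connection removed. `connected` lists the 0-based indices of
    # the retained input neurons of one output neuron.
    bits = ['0'] * n_inputs
    for i in connected:
        bits[i] = '1'
    return ''.join(bits)

def encode_distance(connected):
    # Second method: distance of the first retained input neuron from the
    # first input neuron, then each retained neuron's distance from the
    # previously retained one (i_1 - 1, then i_k - i_(k-1) in 1-based terms).
    out, prev = [], 0
    for k, i in enumerate(connected):
        out.append(i if k == 0 else i - prev)
        prev = i
    return out
```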
  • After the processed connection relationship is obtained, the mapping unit outputs the mapped input neurons and weights according to it; the correspondence between the mapped input neurons and weights is the input neuron-weight pair, and the mapped input neurons and weights can be used directly during the operation.
  • In short, the sparsification mapping unit 11 and the fast mapping unit 12 optimize the connection relationship of the input data and output the mapped input neurons and weights, and the connection relationship can adopt either of two representations: one uses a single bit between each input and output neuron to indicate whether a connection exists; the other uses the distance between connections to indicate the position of each connection.
  • As shown in FIG. 2, the artificial neural network has 4 input neurons I1, I2, I3, I4 and 2 output neurons O1, O2, with the connection weights denoted W11, W21, W31, W41, W12, W22, W32, W42. Let the value of I1 be 0 and the values of I2, I3, I4 be non-zero; let W21, W12, W42 be 0 and the remaining weights be non-zero.
  • The sparsification mapping unit and the fast mapping unit can process the data simultaneously, or in turn with their order interchangeable; the following describes the processing with the sparsification mapping unit first.
  • With the first connection method, the connection relationship is expressed as follows:
  • If the sparsification operation is not performed, the connection relationships of O1 and O2 each default to 1111, placed together as 11111111. If the sparsification operation is performed, as shown in FIG. 3, the connection relationship of output neuron O1 is 1011, where each bit indicates whether there is a connection with the corresponding input (1 for a connection, 0 for none), and the connection relationship of output neuron O2 is 0110. During the operation, input neurons and weights whose connection-relationship bit is 0 are not operated on. The connection relationships can be stored in the order of the output neurons, concatenating the bits of each output neuron in turn into one vector; for the above example the placement is 10110110.
  • If the operation of discriminating the input-neuron values is not performed, the connection relationships and placement order of O1 and O2 are unchanged. If it is performed on top of the sparsification operation, then in the map of FIG. 3 the connection relationship of output neuron O1 becomes 0011: the first bit changes from 1 to 0 because the value of the first input neuron I1 is 0, so the connection from I1 is removed. The connection relationship of output neuron O2 remains 0110, and the final placement is 00110110. If the sparsification operation is not performed but the input-neuron discrimination is, the connection relationship of output neuron O1 is 0111 and that of output neuron O2 is 0111, finally placed as 01110111.
  • With the second connection method, the connection relationship is expressed as follows:
  • If the sparsification operation is not performed, the connection relationships of O1 and O2 each default to 0, 1, 1, 1. If the sparsification operation is performed, as shown in FIG. 4, output neuron O1 is connected to the input neurons I1, I3, I4, so its connection relationship is 0, 2, 1: the 0 indicates that the first connection is at distance 0 from the first input neuron, i.e., at the first input neuron; the 2 indicates that the second connected input neuron is at distance 2 from the previously connected input neuron, i.e., it is the third input neuron; the 1 indicates that the third connected input neuron is at distance 1 from the previous one, i.e., it is the fourth input neuron. Similarly, the connection relationship of O2 is 1, 1.
  • If the operation of discriminating the input-neuron values is not performed, the connection relationships of O1 and O2 are unchanged. If it is performed on top of the sparsification operation shown in FIG. 4, the connection from I1 is removed because the value of the first input neuron I1 is 0, so the connection relationship of output neuron O1 becomes 2, 1 while that of output neuron O2 remains 1, 1. If the sparsification operation is not performed, the connection relationships of O1 and O2 are each 1, 1, 1.
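  • With the hypothetical encoders sketched above, the connection relationships of this example can be reproduced:

```python
N = 4  # four input neurons I1..I4 (0-based indices 0..3)

# After sparsification only (FIG. 3): O1 keeps I1, I3, I4; O2 keeps I2, I3.
print(encode_direct([0, 2, 3], N))   # '1011'
print(encode_direct([1, 2], N))      # '0110'  -> placed together: 10110110
print(encode_distance([0, 2, 3]))    # [0, 2, 1]
print(encode_distance([1, 2]))       # [1, 1]

# After the input-neuron discrimination as well (I1 = 0 removes its connections):
print(encode_direct([2, 3], N))      # '0011'  -> placed together: 00110110
print(encode_distance([2, 3]))       # [2, 1]
```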
  • The connection relationships used by the sparsification mapping unit 11 and the fast mapping unit 12 in the present invention include, but are not limited to, the representations above.
  • The sparsification mapping unit 11 and the fast mapping unit 12 output the mapped neurons and weights according to the connection relationship obtained above; the correspondence between the mapped neurons and weights is the input neuron-weight pair, which can be used directly in the operation. Take the specific process of mapping output neuron O1 of the artificial neural network shown in FIG. 2 as an example:
  • The input neurons are I1, I2, I3, I4 and the input weights are W11, W21, W31, W41, where I1 and W21 are 0 and the rest are non-zero.
  • After sparsification, the connection relationship of O1 is 1011, or 0, 2, 1; after the input-neuron discrimination operation as well, the connection relationship is 0011, or 2, 1.
  • The two mapping units remove the input neurons whose value is 0 and the connections whose weight is 0, and output the mapped input neurons together with the weights of the connections issuing from them. The mapped input neurons are I3 and I4, the mapped weights are W31 and W41, and the input neuron-weight pairs are I3-W31 and I4-W41. Stored as vectors, the resulting input neuron vector is (I3, I4) and the resulting weight vector is (W31, W41).
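  • A minimal sketch of this mapping step for O1 (hypothetical non-zero values; the rule drops every connection whose input neuron or weight is 0):

```python
inputs  = {'I1': 0.0, 'I2': 0.6, 'I3': 0.8, 'I4': 0.2}
weights = {'W11': 0.5, 'W21': 0.0, 'W31': 0.3, 'W41': 0.9}
links   = [('I1', 'W11'), ('I2', 'W21'), ('I3', 'W31'), ('I4', 'W41')]

kept = [(i, w) for i, w in links if inputs[i] != 0.0 and weights[w] != 0.0]
neuron_vec = [inputs[i]  for i, _ in kept]   # values of (I3, I4)
weight_vec = [weights[w] for _, w in kept]   # values of (W31, W41)
# kept == [('I3', 'W31'), ('I4', 'W41')] -- the input neuron-weight pairs
```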
  • Likewise, the fast mapping unit performs the fast mapping, finally yielding the mapped input neurons and weights. Preferably, the two mapping units operate on the data simultaneously, in no particular order.
  • As shown in FIG. 5, the apparatus supporting fast artificial neural network operation in this embodiment of the present invention comprises, in addition to the mapping unit 1: a storage unit 2, a direct memory access (DMA) 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9.
  • The storage unit 2 is configured to store data and instructions; it receives and stores externally input data and instructions, the data including the input neurons, weights, and connection relationships.
  • The mapping unit 1 retrieves the input neurons, weights, and connection relationships from the storage unit 2; the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping, whereby the mapping unit 1 obtains the mapped input neurons and weights and stores them back in the storage unit 2.
  • The DMA 3 fetches the instructions and the mapped input neurons and weights from the storage unit 2 and distributes them to the instruction cache 4, the input neuron cache 6, and the weight cache 7, respectively.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into operation-unit instructions, and inputs them to the operation unit 8.
  • The operation unit 8 is configured to execute the specific operations: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them.
  • The operation unit 8 includes a multiplication unit, which multiplies the mapped neurons by the weight data in a first stage; an addition-tree operation unit, which in a second stage sums the products of the first stage step by step through an addition tree to complete the vector inner product; and a nonlinear transform unit, which applies a nonlinear transform to the result of the second stage to obtain the output neuron. The nonlinear transform is an activation function operation, and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
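  • The three stages can be modeled in software as follows (a hedged sketch, not the hardware; sigmoid is used here, and tanh, ReLU, or softmax would occupy the same slot):

```python
import math

def operate(neuron_vec, weight_vec, activation=lambda x: 1 / (1 + math.exp(-x))):
    # Stage 1: multiply each mapped input neuron by its weight.
    products = [n * w for n, w in zip(neuron_vec, weight_vec)]
    # Stage 2: addition tree -- sum adjacent pairs level by level until one
    # value remains, completing the vector inner product (assumes a
    # non-empty vector).
    while len(products) > 1:
        products = [sum(products[i:i + 2]) for i in range(0, len(products), 2)]
    # Stage 3: the nonlinear transform (activation function) yields the output neuron.
    return activation(products[0])

# e.g. operate([0.8, 0.2], [0.3, 0.9]) computes sigmoid(0.8*0.3 + 0.2*0.9)
```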
  • The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored in the storage unit 2 via the DMA 3, from which they can be retrieved externally.
  • This embodiment also provides a method for supporting fast artificial neural network operation. As shown in FIG. 6, the method includes the following steps:
  • S101: Read an artificial neural network SIMD instruction to start the fast artificial neural network operation.
  • S102: The mapping unit fetches all the input neurons, weights, and connection relationships from the storage unit, processes them, obtains the mapped input neurons and weights, and stores them back in the storage unit.
  • Specifically, the sparsification mapping unit performs sparsification on the input neurons, weights, and connection relationships, while the fast mapping unit performs fast mapping on them. Both mapping units can use either of the two connection-relationship representations, and screen the input neurons and weights against the processed connection relationship to output the mapped neurons and weights. The processing of the connection relationship and of the mapped neurons and weights has been described in detail above and is not repeated here.
  • S103: The input neuron cache 6 and the weight cache 7 read part of the mapped neurons and weights through the DMA 3.
  • S104: The operation unit retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons. Specifically:
  • S1041: Perform the multiplication operation, multiplying the mapped neurons by the weight data;
  • S1042: Perform the addition-tree operation, summing the results of the first stage step by step through an addition tree to complete the vector inner product;
  • S1043: Apply a nonlinear transform to the result of the second stage to obtain the output neurons; the nonlinear transform is an activation function operation, and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
  • S105: The operation unit stores the obtained output neurons into the output neuron cache 9, and they are stored in the storage unit 2 via the DMA 3.
  • S106: Determine whether all the mapped neurons and weights have been operated on. If the result is N, return to step S103; if the result is Y, perform step S107.
  • As shown in FIG. 8, another embodiment of the present invention provides an apparatus for supporting fast artificial neural network operations, including a mapping unit 1, a storage unit 2, a DMA 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9.
  • The storage unit 2 is configured to store data and instructions; it receives and stores externally input data and instructions, the data including the input neurons, weights, and connection relationships.
  • The DMA 3 fetches the instructions from the storage unit 2 and distributes them to the instruction cache 4, and fetches the input neurons, weights, and connection relationships from the storage unit 2 and passes them directly to the mapping unit 1 for mapping.
  • In the mapping unit 1, the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping; the mapping unit 1 thus obtains the mapped input neurons and weights and transmits them to the input neuron cache 6 and the weight cache 7, respectively.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into operation-unit instructions, and inputs them to the operation unit 8.
  • The operation unit 8 is configured to execute the specific operations: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them.
  • The operation unit 8 includes a multiplication unit, which multiplies the mapped neurons by the weight data in a first stage; an addition-tree operation unit, which in a second stage sums the products of the first stage step by step through an addition tree to complete the vector inner product; and a nonlinear transform unit, which applies a nonlinear transform to the result of the second stage to obtain the output neuron. The nonlinear transform is an activation function operation, and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
  • The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored in the storage unit 2 via the DMA 3, from which they can be retrieved externally.
  • This embodiment also provides a method for supporting fast artificial neural network operations, as shown in FIG. 9, including the following steps:
  • S201: Read an artificial neural network SIMD instruction to start the fast artificial neural network operation.
  • S202: The mapping unit fetches part of the input neurons, weights, and connection relationships from the storage unit through the DMA 3, processes them, and sends the mapped input neurons and weights directly to the input neuron cache 6 and the weight cache 7.
  • Specifically, the sparsification mapping unit performs sparsification on the input neurons, weights, and connection relationships, while the fast mapping unit performs fast mapping on them. Both mapping units can use either of the two connection-relationship representations, and screen the input neurons and weights against the processed connection relationship to output the mapped neurons and weights. The processing of the connection relationship and of the mapped neurons and weights has been described in detail above and is not repeated here.
  • S203: The operation unit retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.
  • S204: The operation unit stores the obtained output neurons into the output neuron cache 9, and they are stored in the storage unit 2 via the DMA 3.
  • S205: Determine whether all the input neurons and weights have been mapped and operated on. If the result is N, return to step S202; if the result is Y, perform step S206.
  • The difference from the previous embodiment is that here the sparsification mapping unit and the fast mapping unit of the mapping unit perform the mapping during the calculation, with the mapped data operated on directly by the operation unit, whereas in the previous embodiment the data mapped by the sparsification mapping unit and the fast mapping unit is stored in the storage unit before the operation unit's calculation. In this embodiment the operation speed is faster.
  • Still another embodiment of the present invention provides a system for fast artificial neural network operation, as shown in FIG. 10, which includes: an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and a device 10 supporting fast artificial neural network operation.
  • The I/O interface 20 is used for I/O data, which is sent by the CPU 40 to the device 10 supporting fast artificial neural network operation and then written into the storage device 30 by the device 10; the dedicated instructions required by the device 10 are likewise transmitted by the CPU 40 to the device 10.
  • The storage device 30 temporarily stores the artificial neural network models and neuron data, particularly when not all of the models can be held in the cache on the device 10 supporting fast artificial neural network operation.
  • The central processing unit (CPU) 40 performs data transfer and basic control of the device 10 supporting fast artificial neural network operation, such as starting and stopping it, and serves as the interface between the device 10 and external control.
  • The device 10 supporting fast artificial neural network operation accepts data and programs from the CPU 40 and executes the fast artificial neural network operation algorithm; the execution results of the device 10 are transmitted back to the CPU 40.
  • The device 10 supporting fast artificial neural network operation can serve as a coprocessor of the CPU 40 or of a GPU to execute the fast artificial neural network operation algorithm.
  • Multiple devices supporting fast artificial neural network operation can be interconnected to form a system: they can be interconnected through a PCIe bus to support still larger-scale fast artificial neural network operations, can share the same host CPU or each have its own host CPU, and can share memory or each accelerator can have its own memory. The interconnection method can be any interconnection topology.
  • For a device or method using the techniques of the present invention, when a portion of the input neurons and weights of a given neural network have values equal to or near zero, there is an improvement in operation speed over devices or methods that do not employ these techniques. Moreover, the larger the proportion of input neurons equal to or near 0 among all input neurons in the network, and the larger the proportion of weights equal to or near 0 among all weights, the greater the speedup.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a device and method for supporting fast artificial neural network operations. The device comprises: a mapping unit for receiving an input neuron, a weight, and a connection relationship between the input neuron and an output neuron, optimizing the connection relationship, and outputting the mapped input neuron and the mapped weight, the correspondence between the mapped input neuron and the mapped weight being an input neuron-weight pair. By means of the device and method, the connection relationship between the input neuron and the weight is optimized by a sparsification mapping unit and/or a fast mapping unit, thereby reducing the amount of computation, solving the problems of insufficient operational performance of CPUs and GPUs and of high front-end decoding overhead, and effectively improving support for multi-layer artificial neural network operation algorithms.
PCT/CN2016/111737 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operations WO2018112892A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/111737 WO2018112892A1 (fr) 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/111737 WO2018112892A1 (fr) 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operations

Publications (1)

Publication Number Publication Date
WO2018112892A1 (fr) 2018-06-28

Family

ID=62624217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111737 WO2018112892A1 (fr) 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operations

Country Status (1)

Country Link
WO (1) WO2018112892A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740739A (zh) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural network computing device, neural network computing method, and related products
CN111222632A (zh) * 2018-11-27 2020-06-02 中科寒武纪科技股份有限公司 Computing device, computing method, and related products
CN111523653A (zh) * 2019-02-03 2020-08-11 上海寒武纪信息科技有限公司 Operation device and method


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529281A (zh) * 2003-10-21 2004-09-15 上海交通大学 Neural network modeling method
CN105701540A (zh) * 2016-01-11 2016-06-22 清华大学 Self-generating neural network construction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUNFEI ET AL.: "Dynamic Optimization Structure Design for Neural Networks: Review and Perspective", CONTROL THEORY & APPLICATIONS, vol. 27, no. 3, 31 March 2010 (2010-03-31), ISSN: 1000-8152 *
SUN, HUANLONG ET AL.: "A New Pruning Algorithm for Feedforward Neural Network", JOURNAL OF GUANGXI TEACHERS, vol. 30, no. 4, 31 December 2013 (2013-12-31), ISSN: 1002-8743 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222632A (zh) * 2018-11-27 2020-06-02 中科寒武纪科技股份有限公司 Computing device, computing method, and related products
CN109740739A (zh) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural network computing device, neural network computing method, and related products
CN109740739B (zh) * 2018-12-29 2020-04-24 中科寒武纪科技股份有限公司 Neural network computing device, neural network computing method, and related products
CN111523653A (zh) * 2019-02-03 2020-08-11 上海寒武纪信息科技有限公司 Operation device and method
CN111523653B (zh) * 2019-02-03 2024-03-29 上海寒武纪信息科技有限公司 Operation device and method

Similar Documents

Publication Publication Date Title
CN107545303B Computing device and operation method for sparse artificial neural networks
US11568258B2 Operation method
CN108427990B Neural network computing system and method
WO2021208612A1 Data processing method and device
WO2017124642A1 Device and method for executing forward computation of an artificial neural network
WO2018113790A1 Operation apparatus and method for an artificial neural network
WO2019127838A1 Method and apparatus for implementing a convolutional neural network, terminal, and storage medium
CN111105023B Data stream reconstruction method and reconfigurable data stream processor
CN108171328B Neural network processor and convolution operation method executed thereby
CN106846235A Convolution optimization method and system accelerated by NVIDIA Kepler GPU assembly instructions
WO2018112892A1 Device and method for supporting fast artificial neural network operations
CN111860773A Processing device and method for information processing
WO2017181336A1 Maxout layer operation apparatus and method
CN112348182A Neural network maxout layer computing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16924610

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16924610

Country of ref document: EP

Kind code of ref document: A1