CN109685203B - Data processing method, device, computer system and storage medium - Google Patents

Data processing method, device, computer system and storage medium Download PDF

Info

Publication number
CN109685203B
Authority
CN
China
Prior art keywords
neural network
node
equivalent
recurrent neural
data
Legal status: Active
Application number
CN201811568921.2A
Other languages
Chinese (zh)
Other versions
CN109685203A (en)
Inventor
Inventor not disclosed
Current Assignee
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Application filed by Cambricon Technologies Corp Ltd
Priority to CN201811568921.2A
Publication of CN109685203A
Application granted
Publication of CN109685203B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The application relates to a data processing method, a data processing apparatus, a computer system and a storage medium. They can greatly shorten the time needed to generate the offline model of a recurrent neural network node, and thereby improve the processing speed and efficiency of the processor.

Description

Data processing method, device, computer system and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, a computer system, and a storage medium.
Background
With the development of artificial intelligence technology, deep learning has become ubiquitous and indispensable, and accordingly many scalable deep learning systems such as TensorFlow, MXNet, Caffe and PyTorch have emerged, which can be used to provide various neural network models capable of running on a processor such as a CPU or GPU. Generally, neural networks may include recurrent neural networks, non-recurrent (acyclic) neural networks, and the like.
However, the time needed to generate an offline model for a recurrent neural network generally grows in proportion to the number of recurrence iterations and exponentially with the number of layers. Even for a one-layer recurrent neural network, if the number of iterations is on the order of 10^2, directly generating the offline model takes more than 12 hours; such long generation times lead to low processing efficiency.
Disclosure of Invention
In view of the above, it is desirable to provide a data processing method, an apparatus, a computer system, and a storage medium capable of improving processing efficiency.
A method of data processing, the method comprising:
acquiring a recurrent neural network node, wherein the recurrent neural network node comprises at least one recurrent neural network unit;
circularly calling a first offline model corresponding to the recurrent neural network unit, and operating the recurrent neural network node according to the first offline model;
wherein the first offline model includes weight data and instruction data of a single recurrent neural network unit.
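As an illustration of this idea, the following sketch (in Python, with `execute` and `FirstOfflineModel` as hypothetical stand-ins for whatever compiler and runtime are actually used) compiles nothing per step: the single unit's offline model is built once and then called cyclically for every recurrence step.

```python
# Minimal sketch, assuming a hypothetical compile/execute API.

def execute(instructions, weights, x):
    # Stand-in for the processor running compiled instruction data; here it
    # just tags the input so the loop below is runnable.
    return ("out", x)

class FirstOfflineModel:
    def __init__(self, weights, instructions):
        self.weights = weights            # weight data of the single RNN unit
        self.instructions = instructions  # instruction data from one compilation

def run_rnn_node(model, inputs):
    """Run the whole RNN node by cyclically calling the one compiled unit."""
    y = None
    for x_t in inputs:                    # T steps -> T calls, but only 1 compilation
        y = execute(model.instructions, model.weights, x_t)
    return y

model = FirstOfflineModel(weights={"W1": ..., "W2": ..., "W3": ...},
                          instructions=b"compiled-once")
result = run_rnn_node(model, inputs=[1, 2, 3])
```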
In one embodiment, an original network including the recurrent neural network nodes is obtained;
if the current node in the original network is an acyclic neural network node, acquiring weight data and instruction data of the current node from a second offline model corresponding to the original network, and directly operating the current node according to the weight data and the instruction data of the current node;
and the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
In one embodiment, according to the original network, determining an equivalent network corresponding to the original network, wherein the equivalent network comprises at least one equivalent recurrent neural network node and at least one equivalent acyclic neural network node;
determining an execution sequence of each equivalent node in the equivalent network according to the dependency relationship of each equivalent node in the equivalent network corresponding to the original network;
if the current equivalent node is an equivalent non-cyclic neural network node, acquiring weight data and instruction data of the current equivalent node from a second offline model corresponding to the original network, and directly operating the current equivalent node according to the weight data and the instruction data of the current equivalent node.
In one embodiment, if there is no dependency relationship between equivalent nodes in the equivalent network, executing the equivalent nodes in parallel;
and if the dependency relationship exists among all equivalent nodes in the equivalent network, executing the equivalent nodes according to the dependency relationship.
In one embodiment, at least one recurrent neural network node in the original network and at least one connected component of the acyclic neural network are obtained, wherein each connected component of the acyclic neural network comprises at least one acyclic neural network node;
updating the connection relations of the acyclic neural network nodes in each connected component of the acyclic neural network according to the dependency relations of the nodes in the original network, to obtain updated connected components of the acyclic neural network;
respectively reducing each updated connected component of the acyclic neural network to an equivalent acyclic neural network node;
and determining the dependency relations between each equivalent acyclic neural network node and the equivalent recurrent neural network nodes according to the dependency relations of the nodes in the original network, to obtain the equivalent network corresponding to the original network.
In one embodiment, the input node and the output node of each recurrent neural network node are determined according to the dependency relationship of each node in the original network;
and disconnecting the input nodes and output nodes of each recurrent neural network node from that recurrent neural network node, to obtain at least one recurrent neural network node and at least one connected component of the acyclic neural network, wherein each connected component of the acyclic neural network comprises at least one acyclic neural network node.
In one embodiment, according to the dependency relations of the nodes in the original network, it is respectively judged whether each acyclic neural network node in a connected component of the acyclic neural network depends on the output result of a recurrent neural network node;
if an acyclic neural network node in the connected component depends on the output result of a recurrent neural network node, the input node of that acyclic neural network node is disconnected from it, to obtain the updated connected component of the acyclic neural network.
In one embodiment, if the first offline model is stateful, the first offline model further includes state input data; the step of circularly calling a first off-line model corresponding to the recurrent neural network node and operating the recurrent neural network node according to the first off-line model further comprises the following steps:
acquiring the weight data, instruction data and state input data of the current recurrent neural network unit from the first offline model;
operating the recurrent neural network unit according to the weight data, instruction data and state input data of the single recurrent neural network unit;
and storing the current output result of the recurrent neural network unit into the first offline model as state input data, and then returning to the step of acquiring the weight data, instruction data and state input data of the current recurrent neural network unit from the first offline model, until the operation of the recurrent neural network node is completed.
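A minimal sketch of this stateful loop, assuming purely for illustration that the first offline model is held as a plain dict and that `run_unit` stands in for executing the compiled unit once:

```python
def run_unit(weights, instructions, x, state):
    # Stand-in for (output, new state) = g(input, history).
    out = x + state
    return out, out

def run_stateful_rnn_node(first_offline_model, inputs):
    out = None
    for x_t in inputs:
        w = first_offline_model["weights"]
        ins = first_offline_model["instructions"]
        state = first_offline_model["state"]       # state input data
        out, new_state = run_unit(w, ins, x_t, state)
        first_offline_model["state"] = new_state   # store output back as state input
    return out

model = {"weights": {}, "instructions": b"", "state": 0}  # first input defaults to 0
print(run_stateful_rnn_node(model, [1, 2, 3]))            # steps give 1, 3, 6; prints 6
```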
A data processing apparatus, the apparatus comprising:
the second acquisition module is used for acquiring a recurrent neural network node, and the recurrent neural network node comprises at least one recurrent neural network unit;
the second execution module is used for cyclically calling the first offline model corresponding to the recurrent neural network unit, and operating the recurrent neural network node according to the first offline model;
wherein the first offline model includes weight data and instruction data of a single recurrent neural network unit.
In one embodiment, the second obtaining module is further configured to obtain an original network including the recurrent neural network node;
the second execution module is further configured to, if a current node in the original network is an acyclic neural network node, obtain weight data and instruction data of the current node from a second offline model corresponding to the original network, and directly operate the current node according to the weight data and the instruction data of the current node;
and the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
In one embodiment, the apparatus further comprises an equivalence module and a determination module;
the equivalent module is used for determining an equivalent network corresponding to the original network according to the original network, wherein the equivalent network comprises at least one equivalent cyclic neural network node and at least one equivalent acyclic neural network node;
the determining module is configured to determine an execution sequence of each equivalent node in the equivalent network according to a dependency relationship of each equivalent node in the equivalent network corresponding to the original network;
the second execution module is further configured to, if the current equivalent node is an equivalent acyclic neural network node, obtain weight data and instruction data of the current equivalent node from a second offline model corresponding to the original network, and directly operate the current equivalent node according to the weight data and the instruction data of the current equivalent node.
In one embodiment, the second execution module is further configured to: if the equivalent nodes in the equivalent network have no dependency relationship, execute the equivalent nodes in parallel;
and if the dependency relationship exists among all equivalent nodes in the equivalent network, executing the equivalent nodes according to the dependency relationship.
In one embodiment, the equivalent module includes a second obtaining unit, an updating unit and an equivalent unit; wherein,
the second obtaining unit is configured to obtain at least one recurrent neural network node in the original network and at least one connected component of the acyclic neural network, where each connected component of the acyclic neural network includes at least one acyclic neural network node;
the updating unit is configured to update the connection relations of the acyclic neural network nodes in each connected component of the acyclic neural network according to the dependency relations of the nodes in the original network, to obtain updated connected components of the acyclic neural network;
the equivalence unit is configured to respectively reduce each updated connected component of the acyclic neural network to an equivalent acyclic neural network node, and is further configured to determine, according to the dependency relations of the nodes in the original network, the dependency relations between each equivalent acyclic neural network node and the equivalent recurrent neural network nodes, to obtain the equivalent network corresponding to the original network.
A computer system comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, carries out the method of any one of the above.
In one embodiment, the processor includes an arithmetic unit and a controller unit; the arithmetic unit includes a master processing circuit and a plurality of slave processing circuits;
the controller unit is used for acquiring input data and instructions;
the controller unit is further configured to analyze the instruction to obtain a plurality of instruction data, and send the plurality of instruction data and the input data to the main processing circuit;
the main processing circuit is used for performing preliminary processing on the input data, and for exchanging data and instruction data with the plurality of slave processing circuits;
the plurality of slave processing circuits are used for executing intermediate operation in parallel according to the data and the instruction data transmitted from the master processing circuit to obtain a plurality of intermediate results and transmitting the plurality of intermediate results to the master processing circuit;
and the main processing circuit is used for executing subsequent processing on the plurality of intermediate results to obtain the result of the instruction.
A computer storage medium having stored thereon a computer program which, when executed by one or more first processors, performs the method of any one of the above.
According to the data processing method, apparatus, computer system and storage medium, the recurrent neural network unit is operated according to the model data set and the model structure parameters of a single recurrent neural network unit to obtain the instruction data of that single unit, and thereby the first offline model corresponding to the single recurrent neural network unit, where the first offline model includes the weight data and instruction data of the single recurrent neural network unit. With this data processing method, only the first offline model of the single recurrent neural network unit needs to be obtained; there is no need to compile and run every recurrent neural network unit in the recurrent neural network node, so the time to generate the offline model of the recurrent neural network node can be greatly shortened, and the processing speed and efficiency of the processor improved.
Drawings
FIG. 1 is a system block diagram of a computer system of an embodiment;
FIG. 2 is a system block diagram of a computer system of another embodiment;
FIG. 3 is a system block diagram of a processor of an embodiment;
FIG. 4 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 5 is a schematic diagram of a recurrent neural network in one embodiment;
FIG. 6 is a flowchart illustrating step S310;
FIG. 7 is a flowchart illustrating the step S100;
FIG. 8 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 9 is a network architecture diagram of a neural network of an embodiment;
FIG. 10 is a block diagram showing the structure of a data processing apparatus according to an embodiment;
FIG. 11 is a schematic flow chart diagram illustrating a data processing method according to another embodiment;
FIG. 12 is a flow diagram that illustrates a data processing method according to one embodiment;
FIG. 13 is a schematic flow chart diagram illustrating operation of an equivalent network according to one embodiment;
FIG. 14 is a schematic flow chart illustrating an exemplary method for obtaining an equivalent network according to another embodiment;
FIG. 15 is a flowchart illustrating step S7012;
FIG. 16 is a flowchart illustrating step S900;
FIG. 17 is a block diagram showing the structure of a data processing apparatus according to an embodiment.
Detailed Description
In order to make the technical solution of the present invention clearer, the neural network processing method, the computer system, and the storage medium of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a block diagram of a computer system 1000 according to an embodiment, where the computer system 1000 may include a processor 110 and a memory 120 coupled to the processor 110. Referring to fig. 2, the processor 110 is used for providing computing and control capabilities, and may include an obtaining module 111, an operation module 113, a control module 112, and the like, wherein the obtaining module 111 may be a hardware module such as an IO (Input/Output) interface, and the operation module 113 and the control module 112 are both hardware modules. For example, the operation module 113 and the control module 112 may be digital circuits, analog circuits, or the like. The physical implementation of the hardware circuit includes but is not limited to physical devices including but not limited to transistors, memristors, and the like.
Alternatively, the processor 110 may be a general-purpose processor, such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit) or a DSP (Digital Signal Processor), and the processor 110 may also be a dedicated neural network processor such as an IPU (Intelligence Processing Unit). Of course, the processor 110 may also be an instruction set processor, an associated chipset or a special-purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and may further include onboard memory for caching purposes, or the like.
Optionally, referring to fig. 3, the processor 110 is configured to perform machine learning calculation, and the processor 110 includes: a controller unit 20 and an arithmetic unit 12, wherein the controller unit 20 is connected with the arithmetic unit 12, and the arithmetic unit 12 comprises: a master processing circuit and a plurality of slave processing circuits;
a controller unit 20 for acquiring input data and computing instructions; in an alternative, the input data and the calculation instruction may be obtained through a data input/output unit, and the data input/output unit may be one or more data I/O interfaces or I/O pins.
The above computation instructions include, but are not limited to, a convolution operation instruction, a forward training instruction, or other neural network operation instructions; the specific expression of the computation instruction is not limited by the present invention.
The controller unit 20 is further configured to analyze the calculation instruction to obtain a plurality of operation instructions, and send the plurality of operation instructions and the input data to the main processing circuit;
a master processing circuit 101, configured to perform preliminary processing on the input data, and to exchange data and operation instructions with the plurality of slave processing circuits;
a plurality of slave processing circuits 102 configured to perform an intermediate operation in parallel according to the data and the operation instruction transmitted from the master processing circuit to obtain a plurality of intermediate results, and transmit the plurality of intermediate results to the master processing circuit;
and the main processing circuit 101 is configured to perform subsequent processing on the plurality of intermediate results to obtain a calculation result of the calculation instruction.
In the technical solution provided by this application, the arithmetic unit is arranged in a one-master, multiple-slave structure. For the computation instructions of a forward operation, the data can be split according to those instructions, so that the computationally intensive part can be operated on in parallel by the plurality of slave processing circuits, thereby increasing operation speed, saving operating time and, in turn, reducing power consumption.
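The following toy sketch illustrates the split; it models only the data flow, not the hardware. The weight matrix of a forward operation is divided by rows, each slice playing the role of one slave processing circuit's intermediate operation, and the master assembles the intermediate results (the slaves run sequentially here; on the described hardware they would run in parallel).

```python
import numpy as np

def forward_master_slave(x, w, num_slaves=4):
    chunks = np.array_split(w, num_slaves, axis=0)   # master: split the heavy part
    intermediates = [chunk @ x for chunk in chunks]  # slaves: intermediate results
    return np.concatenate(intermediates)             # master: subsequent processing

x = np.random.randn(8)
w = np.random.randn(16, 8)
assert np.allclose(forward_master_slave(x, w), w @ x)  # same result as one circuit
```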
Optionally, the machine learning calculation specifically includes an artificial neural network operation, where the input data specifically includes input neuron data and weight data, and the calculation result specifically may be the result of the artificial neural network operation, namely output neuron data.
In the forward operation, after the artificial neural network operation of the previous layer is completed, the operation instruction of the next layer takes the output neurons calculated in the operation unit as the input neurons of the next layer (or performs some operation on those output neurons first and then uses them as the input neurons of the next layer), and at the same time the weights are replaced by the weights of the next layer. In the reverse operation, after the reverse operation of the previous layer is completed, the operation instruction of the next layer takes the input neuron gradients calculated in the operation unit as the output neuron gradients of the next layer (or performs some operation on those gradients first and then uses them as the output neuron gradients of the next layer), and likewise replaces the weights with the weights of the next layer.
The above-described machine learning calculations may also include support vector machine operations, k-nearest neighbor (k-nn) operations, k-means (k-means) operations, principal component analysis operations, and the like. For convenience of description, the following takes artificial neural network operation as an example to illustrate a specific scheme of machine learning calculation.
For an artificial neural network operation with multiple layers of operation, the input neurons and output neurons do not refer to the neurons in the input layer and output layer of the whole network; for any two adjacent layers, the neurons in the lower layer of the forward operation are the input neurons, and the neurons in the upper layer are the output neurons. Taking a convolutional neural network as an example, suppose the network has L layers; for any layer K and layer K + 1 (K = 1, 2, ..., L - 1), layer K is called the input layer, in which the neurons are the input neurons, and layer K + 1 is called the output layer, in which the neurons are the output neurons. That is, every layer except the topmost can serve as an input layer, and the next layer is the corresponding output layer.
Optionally, the computing device may further include: the storage unit 10 and the direct memory access unit 50, the storage unit 10 may include: one or any combination of a register and a cache, specifically, the cache is used for storing the calculation instruction; the register is used for storing the input data and a scalar; the cache is a scratch pad cache. The direct memory access unit 50 is used to read or store data from the storage unit 10.
Optionally, the controller unit includes: an instruction storage unit 210, an instruction processing unit 211, and a store queue unit 212;
an instruction storage unit 210, configured to store a calculation instruction associated with the artificial neural network operation;
the instruction processing unit 211 is configured to analyze the computation instruction to obtain a plurality of operation instructions;
a store queue unit 212, configured to store an instruction queue, the instruction queue comprising a plurality of operation instructions or computation instructions to be executed in the front-to-back order of the queue.
For example, in an alternative embodiment, the main operation processing circuit may also include a controller unit, and the controller unit may include a main instruction processing unit, specifically configured to decode instructions into microinstructions. Of course, in another alternative, the slave arithmetic processing circuit may also include another controller unit that includes a slave instruction processing unit, specifically for receiving and processing microinstructions. The micro instruction may be a next-stage instruction of the instruction, and the micro instruction may be obtained by splitting or decoding the instruction, and may be further decoded into control signals of each component, each unit, or each processing circuit.
The memory 120 may also store a computer program for implementing the data processing method provided in the embodiments of the present application. Specifically, the data processing method is used for generating a first offline model corresponding to a recurrent neural network node in an original network received by the processor 110, where the first offline model may include the weight data and instruction data of the single recurrent neural network unit, and the instruction data may indicate what calculation function the node performs. Thus, when the processor 110 operates the recurrent neural network node again, the first offline model corresponding to the recurrent neural network unit may be called cyclically, without repeating operations such as compiling for each network unit in the recurrent neural network node. This greatly shortens the time for generating the offline model of the recurrent neural network node, shortens the running time when the processor 110 runs the network, and further improves the processing speed and efficiency of the processor 110.
Optionally, with continuing reference to fig. 2, the memory 120 may include a first storage unit 121, a second storage unit 122, and a third storage unit 123, where the first storage unit 121 may be used to store a computer program, and the computer program is used to implement the data processing method provided in the embodiment of the present application. The second storage unit 122 can be used for storing relevant data during the operation of the neural network, and the third storage unit 123 is used for storing an offline model. Optionally, the number of the storage units included in the memory may also be greater than three, and is not specifically limited herein. The memory 120 may be an internal memory, such as a volatile memory such as a cache, which may be used for storing relevant data during the operation of the neural network, such as input data, output data, weight values, instructions, and so on. The memory 120 may also be a nonvolatile memory such as an external memory, and may be used to store an offline model corresponding to the neural network. Therefore, when the computer system 1000 needs to compile the same neural network again to run the network, the offline model corresponding to the network can be directly obtained from the memory, thereby improving the processing speed and efficiency of the processor.
Alternatively, the number of the memories 120 may be three or more. One of the memories 120 is used to store a computer program for implementing the data processing method provided in the embodiments of the present application. One of the memories 120 is used to store data related to the operation of the neural network, and optionally, the memory used to store data related to the operation of the neural network may be a volatile memory. Another memory 120 may be used to store the offline model corresponding to the neural network, and optionally, the memory used to store the offline model may be a non-volatile memory.
It should be clear that running the original network in this embodiment means that the processor runs some kind of machine learning algorithm (e.g. neural network algorithm) using the artificial neural network model data, and implements the target application of the algorithm (e.g. artificial intelligence application such as speech recognition) by performing forward operation. In this embodiment, directly operating the offline model corresponding to the original network means that the offline model is used to operate a machine learning algorithm (e.g., a neural network algorithm) corresponding to the original network, and a forward operation is performed to implement a target application of the algorithm (e.g., an artificial intelligence application such as speech recognition). The original network may include a recurrent neural network or an acyclic neural network.
In an embodiment, as shown in fig. 4, the present application provides a data processing method for generating and storing a first offline model according to a recurrent neural network unit, so that compiling and calculating are not required to be performed on all recurrent neural network units in the recurrent neural network node, the time for generating the offline model of the recurrent neural network node is shortened, and the processing speed and efficiency of a processor are improved. Specifically, the method comprises the following steps:
and S100, acquiring a recurrent neural network node.
A Recurrent Neural Network (RNN) node is formed by cyclically connecting single recurrent neural network units together with their gates; typical RNNs include the gated recurrent unit network (GRU) and the long short-term memory network (LSTM). A layer of computational units within an RNN is often referred to as an RNN unit (RNN cell). As shown in fig. 5, the recurrent neural network node includes at least one recurrent neural network unit; specifically, an RNN unit may include an input layer, a hidden layer and an output layer, where the number of hidden layers may be more than one.
Specifically, the processor acquires a recurrent neural network node, which is used for acquiring a recurrent neural network unit in a subsequent step. Further, the processor may obtain the model data set and the model structure parameters of the recurrent neural network node, and thereby determine the recurrent neural network node. The model data set corresponding to the recurrent neural network node includes the weight data corresponding to each layer in the node; W1 to W3 in the recurrent neural network unit shown in fig. 5 represent the weight data corresponding to a single recurrent neural network unit. The model structure parameters corresponding to the recurrent neural network node include the dependency relations among the layers within a single recurrent neural network unit, or the dependency relations among the recurrent neural network units.
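For concreteness, a minimal numpy sketch of one RNN cell follows. The assignment of W1, W2 and W3 to input-to-hidden, hidden-to-hidden and hidden-to-output weights is an assumption made for illustration and may not match fig. 5 exactly.

```python
import numpy as np

def rnn_cell(x_t, h_prev, W1, W2, W3):
    h_t = np.tanh(W1 @ x_t + W2 @ h_prev)  # hidden layer: mixes input and history
    y_t = W3 @ h_t                          # output layer
    return h_t, y_t

n_in, n_hid, n_out = 4, 8, 2
W1, W2, W3 = (np.random.randn(n_hid, n_in),
              np.random.randn(n_hid, n_hid),
              np.random.randn(n_out, n_hid))
h = np.zeros(n_hid)                         # first unit's state defaults to 0
for x in np.random.randn(3, n_in):          # the same cell is reused each step
    h, y = rnn_cell(x, h, W1, W2, W3)
```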
Alternatively, the recurrent neural network node may be an independent recurrent neural network node, or it may be located in an original network, which may include at least one recurrent neural network node and at least one acyclic neural network node.
Alternatively, as shown in fig. 5, the system may determine individual recurrent neural network elements from the recurrent neural network nodes. Specifically, after the processor acquires the recurrent neural network node, the processor may determine a single recurrent neural network unit according to the structure of the recurrent neural network node.
And S200, operating the single recurrent neural network unit according to the model data set and the model structure parameters of the single recurrent neural network unit in the recurrent neural network node to obtain instruction data corresponding to the single recurrent neural network unit.
Specifically, the processor acquires a model data set and model structure parameters of a single recurrent neural network unit, then operates the single recurrent neural network unit, and then acquires instruction data corresponding to the single recurrent neural network unit. It should be clear that the operation of the recurrent neural network unit in the embodiment of the present application means that the processor uses the artificial neural network model data to operate some machine learning algorithm (e.g., neural network algorithm), and implements the target application of the algorithm (e.g., artificial intelligence application such as speech recognition) by performing forward operation.
S300, obtaining a first off-line model corresponding to the single recurrent neural network unit according to the instruction data corresponding to the single recurrent neural network unit.
Wherein the first offline model includes weight data and instruction data for individual recurrent neural network elements.
Specifically, the processor can obtain the first offline model corresponding to the single recurrent neural network unit according to the instruction data and the weight data corresponding to the single recurrent neural network unit, so that compiling and operation of all recurrent neural network units in the recurrent neural network node are not required, the offline model generation time of the recurrent neural network node can be greatly shortened, and the processing speed and the processing efficiency of the processor are improved.
Furthermore, when the cyclic neural network node needs to be operated repeatedly, the first offline model can be called circularly to realize the operation of the cyclic neural network node, so that the operations of compiling each node in the neural network and the like are reduced, and the operation efficiency is improved.
According to the data processing method, the cyclic neural network unit is operated according to the model data set and the model structure parameters of the single cyclic neural network unit to obtain the instruction data of the single cyclic neural network unit, and further obtain a first offline model corresponding to the single cyclic neural network unit, wherein the first offline model comprises the weight data and the instruction data of the single cyclic neural network unit. According to the data processing method, only the first offline model of the single recurrent neural network unit needs to be obtained, and compiling and operation on all recurrent neural network units in the recurrent neural network node are not needed, so that the offline model generation time of the recurrent neural network node can be greatly shortened, and the processing speed and efficiency of the processor are improved.
In one embodiment, the step S300 may include:
s310, correspondingly storing the weight data and the instruction data of the single recurrent neural network unit to obtain a first offline model corresponding to the single recurrent neural network unit.
Specifically, the processor may store the weight data and instruction data of the single recurrent neural network unit in the memory to enable generation and storage of the first offline model. For a single recurrent neural network unit, its weight data and instruction data are stored in one-to-one correspondence. In this way, when the recurrent neural network node is run again, the first offline model may be retrieved directly from the memory, and the recurrent neural network node may be run by cyclically calling the first offline model.
Optionally, the processor may store the weight data and the instruction data corresponding to the single recurrent neural network unit in a nonvolatile memory, so as to generate and store the first offline model. When the recurrent neural network unit is operated again, the offline model corresponding to the recurrent neural network unit can be obtained directly from the nonvolatile memory, and the recurrent neural network unit can be operated according to its corresponding offline model.
In the embodiment, compiling and operation of all the recurrent neural network units in the recurrent neural network node are not required, so that the time for generating the offline model by the recurrent neural network node is shortened, and the running speed and the running efficiency of the system are improved.
Alternatively, as shown in fig. 6, the step S310 may include the following steps:
s311, determining a memory allocation mode corresponding to the recurrent neural network unit according to the model data set and the model structure parameters of the recurrent neural network unit.
Specifically, the processor may obtain an execution order of each layer in the recurrent neural network unit according to the model structure parameter of the recurrent neural network unit, and determine the memory allocation manner of the current recurrent neural network unit according to the execution order of each layer in the recurrent neural network unit. For example, the relevant data of each layer in the recurrent neural network unit is saved in a stack in the execution order. The memory allocation method is to determine the storage location of data (including input data, output data, weight data, intermediate result data, and the like) related to each layer in the recurrent neural network unit in a memory space (such as a memory). For example, the data table may be used to store the mapping relationship between the data (input data, output data, weight data, intermediate result data, and the like) related to each layer and the memory space.
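A minimal sketch of such an allocation table, with made-up tensor names and sizes: tensors are laid out at consecutive offsets in the order the layers execute.

```python
def plan_memory(tensors):
    """tensors: list of (name, size_in_bytes) in execution order."""
    table, offset = {}, 0
    for name, size in tensors:
        table[name] = (offset, size)  # name -> (start offset, length)
        offset += size
    return table, offset              # offset doubles as the total footprint

table, total = plan_memory([
    ("input X", 4096), ("W1", 16384), ("hidden", 8192),
    ("W2", 16384), ("W3", 16384), ("output Y", 1024),
])
```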
S312, storing the relevant data generated during the operation of the recurrent neural network unit into one of the memories, or into one storage unit of the memory, according to the memory allocation manner corresponding to the recurrent neural network unit.
The relevant data generated during the operation of the recurrent neural network unit includes the weight data, instruction data, input data, intermediate calculation results, output data and the like corresponding to each layer of the unit. For example, as shown in fig. 5, X represents the input data of the recurrent neural network unit and Y represents its output data, which the processor may convert into a control command for controlling a robot or a different digital interface. W1 to W3 represent the weight data. The processor may store the relevant data generated during the operation of the recurrent neural network unit into one of the memories, or into one storage unit of the memory, such as a volatile memory like an internal memory or a cache, according to the determined memory allocation manner.
S313, obtaining the weight data and the instruction data of the recurrent neural network unit from the memory or the storage unit, obtaining a first offline model, and storing the first offline model in a nonvolatile memory or a nonvolatile storage unit of the memory.
Specifically, the weight data and instruction data of the recurrent neural network unit are obtained from the memory or the storage unit to obtain the first offline model, and the first offline model is stored in a nonvolatile memory or in a nonvolatile storage unit of the memory; what is stored in the corresponding storage space is the first offline model corresponding to that recurrent neural network unit.
In one embodiment, as shown in fig. 7, step S100 in the data processing method may include:
and S110, acquiring an original network containing the recurrent neural network nodes.
The original network may include a recurrent neural network node and an acyclic neural network node.
Specifically, the processor may obtain the model data set and the model structure parameters of the original network, and thereby obtain the network structure diagram of the original network. The model data set includes data such as the weight data corresponding to each node in the original network; W1 to W6 in the neural network shown in fig. 9 represent the weight data of the nodes. The model structure parameters include the dependency relations of the nodes in the original network and the calculation attribute of each node, where the dependency relation between nodes indicates whether data is transferred between them; for example, when a data stream is passed between nodes, a dependency relation exists between them. Further, the dependency relation of each node may include an input relation, an output relation, and the like.
And S120, determining the dependency relationship of each node in the original network according to the model structure parameters of the original network.
Specifically, the processor acquires the model structure parameters of the original network, which may include the dependency relationship of each node in the original network, so that after acquiring the model structure parameters of the original network, the processor can determine the dependency relationship of each node in the original network according to the model structure parameters of the original network.
And S130, determining the input node and the output node of each recurrent neural network node according to the dependency relationship of each node in the original network.
The input node is a node that inputs data to the recurrent neural network node, and the output node is a node that receives the data output by the recurrent neural network node.
Specifically, the processor may sequence the nodes according to the dependency relationship of each node in the original network to obtain a linear sequence between the nodes, and further determine the input node and the output node of each recurrent neural network node.
For example, the processor may determine the input nodes and output nodes of each recurrent neural network node according to the original network in fig. 9(a). It can be obtained that the output nodes of non-recurrent neural network node (non-RNN) 1 are non-RNN node 2 and the recurrent neural network node (RNN); the input node of non-RNN node 2 is non-RNN node 1, and its output nodes are non-RNN node 3 and non-RNN node 4; the input node of the RNN node is non-RNN node 1, and its output node is non-RNN node 3.
And S140, disconnecting the connection relation between the input node and the output node of the recurrent neural network node and the recurrent neural network node to obtain at least one recurrent neural network node.
Specifically, after the processor determines the input node and the output node of each recurrent neural network node, the connection relationship between the input node of each recurrent neural network node and each recurrent neural network node is disconnected, and meanwhile, the connection relationship between the output node of each recurrent neural network node and each recurrent neural network node is also disconnected, so that at least one independent recurrent neural network node is obtained.
For example, as shown in fig. 9, after the recurrent neural network nodes and their input and output nodes are determined, the connections between the recurrent neural network nodes and their input nodes in graph (a) are disconnected, as are the connections between the recurrent neural network nodes and their output nodes; each recurrent neural network node is thus separated out, yielding at least one independent recurrent neural network node, as shown in graph (b).
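A minimal sketch of this disconnection step on the fig. 9 topology, holding the network as a set of directed edges; the node names are shorthand for the figure's labels.

```python
def detach_rnn_nodes(edges, is_rnn):
    """Drop every edge touching an RNN node, isolating each RNN node."""
    return {(s, d) for (s, d) in edges if not (is_rnn(s) or is_rnn(d))}

edges = {("nonRNN1", "nonRNN2"), ("nonRNN1", "RNN"),   # RNN's input node
         ("RNN", "nonRNN3"),                            # RNN's output node
         ("nonRNN2", "nonRNN3"), ("nonRNN2", "nonRNN4")}
remaining = detach_rnn_nodes(edges, lambda n: n == "RNN")
# remaining: {("nonRNN1","nonRNN2"), ("nonRNN2","nonRNN3"), ("nonRNN2","nonRNN4")}
```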
In one embodiment, as shown in fig. 8, the method may further include the steps of:
s150, determining the execution sequence of each node in the original network according to the dependency relationship of each node in the original network.
Wherein the original network may include recurrent neural network nodes and non-recurrent neural network nodes.
Specifically, the processor may obtain the model data set and the dependency relations of the original network, and thereby obtain the network structure diagram of the original network. The model data set comprises data such as the weight data corresponding to each node in the original network. The dependency relation between nodes indicates whether data is transferred between them; for example, when a data stream is passed between several nodes, a dependency relation exists among those nodes. Further, the dependency relation of each node may include an input relation, an output relation, and the like. The execution sequence of each node in the original network is then determined according to the obtained dependency relations. Optionally, the processor may determine the execution mode between nodes according to their dependency relations: nodes with no dependency relation are executed in parallel, and nodes with a dependency relation are executed in sequence.
For example, after the dependency relations of the nodes in the neural network shown in fig. 9 are determined, a linear sequence of the nodes can be obtained, such as non-recurrent neural network node (non-RNN) 1 - non-RNN node 2 - recurrent neural network node (RNN) - non-RNN node 3 - non-RNN node 4; the execution sequence may also be, for example, non-RNN node 1 - RNN - non-RNN node 2 - non-RNN node 3 - non-RNN node 4.
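One conventional way to obtain such an execution sequence is Kahn-style topological layering, sketched below on the fig. 9 topology. Nodes within the same level have no dependency on each other and may therefore run in parallel, matching the parallel/sequential rule above; this sketch is an illustration, not the patent's own algorithm.

```python
from collections import defaultdict

def execution_levels(nodes, edges):
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for s, d in edges:
        succ[s].append(d)
        indeg[d] += 1
    level, levels = [n for n in nodes if indeg[n] == 0], []
    while level:                       # peel off one dependency-free layer at a time
        levels.append(level)
        nxt = []
        for n in level:
            for m in succ[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    nxt.append(m)
        level = nxt
    return levels

nodes = ["nonRNN1", "nonRNN2", "RNN", "nonRNN3", "nonRNN4"]
edges = [("nonRNN1", "nonRNN2"), ("nonRNN1", "RNN"),
         ("RNN", "nonRNN3"), ("nonRNN2", "nonRNN3"), ("nonRNN2", "nonRNN4")]
print(execution_levels(nodes, edges))
# [['nonRNN1'], ['nonRNN2', 'RNN'], ['nonRNN4', 'nonRNN3']]
```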
And S160, operating the original network according to the execution sequence of each node to obtain the instruction data of each non-cyclic neural network node in the original network.
Specifically, after the processor determines the execution sequence of each node, the original network is operated according to the execution sequence of each node, and then instruction data of each acyclic neural network node in the original network is obtained respectively.
S170, correspondingly storing the weight data and the instruction data corresponding to each acyclic neural network node to obtain a second offline model.
And the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
Specifically, after the processor runs the original network, the weight data and instruction data corresponding to each acyclic neural network node may be stored correspondingly, so as to obtain the second offline model. Optionally, the processor may store the weight data and instruction data corresponding to each acyclic neural network node in a nonvolatile memory, so as to generate and store the second offline model. For each acyclic neural network node, its weight data and instruction data are stored in one-to-one correspondence. Therefore, when the acyclic neural network node is run again, the corresponding second offline model can be obtained directly from the nonvolatile memory and the node can be run according to it, without compiling the acyclic neural network node online again to obtain instructions, which improves the running speed and efficiency of the system.
Optionally, the step S170 may include the following steps:
and S171, determining a memory allocation mode corresponding to each acyclic neural network node in the original network according to the model data set and the model structure parameters of the original network.
Specifically, the processor may obtain an execution sequence of each node in the original network according to the model structure parameter of the original network, determine a memory allocation manner of the current network according to the execution sequence of each computing node in the original network, and further obtain a memory allocation manner corresponding to each acyclic neural network node in the original network. For example, the related data of the nodes of the acyclic neural network in the running process is saved in a stack according to the execution sequence of each node. The memory allocation method is to determine a storage location of relevant data (including input data, output data, weight data, intermediate result data, and the like) of the nodes of the acyclic neural network in a memory space (such as a memory). For example, the data table may be used to store the mapping relationship between the relevant data (input data, output data, weight data, intermediate result data, and the like) of the acyclic neural network node and the memory space.
S172, storing the relevant data generated during the operation of the acyclic neural network node into one of the memories, or into one storage unit of the memory, according to the memory allocation manner corresponding to the acyclic neural network node.
The relevant data in the operation process of the nodes of the non-cyclic neural network comprise weight data, instruction data, input data, intermediate calculation results, output data and the like corresponding to the nodes of the non-cyclic neural network. The processor may store the relevant data of the acyclic neural network node during operation into one of the memories or one of the storage units of the memory, such as a volatile memory, such as an internal memory or a cache, according to the determined memory allocation manner.
S173, obtaining the weight data and instruction data of the acyclic neural network node from the memory or the storage unit, obtaining a second offline model, and storing the second offline model in a nonvolatile memory or a nonvolatile storage unit of the memory.
Specifically, the weight data and instruction data of the acyclic neural network nodes are obtained from the memory or the storage unit to obtain the second offline model, and the second offline model is stored in a nonvolatile memory or in a nonvolatile storage unit of the memory; what is stored in the corresponding storage space is the second offline model corresponding to the acyclic neural network nodes.
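A minimal sketch of this persistence step, using pickle and a file name purely as illustrative stand-ins for the actual nonvolatile storage format: the second offline model maps each acyclic node to its (weight data, instruction data) pair, so a rerun loads the pair instead of recompiling.

```python
import pickle

second_offline_model = {
    "nonRNN1": ({"W4": b"..."}, b"compiled-instructions-1"),
    "nonRNN2": ({"W5": b"..."}, b"compiled-instructions-2"),
}

with open("second_offline_model.bin", "wb") as f:   # nonvolatile storage
    pickle.dump(second_offline_model, f)

with open("second_offline_model.bin", "rb") as f:   # on rerun: load, don't compile
    cached = pickle.load(f)
weights, instructions = cached["nonRNN1"]           # stored one-to-one per node
```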
In one embodiment, the method may further include the steps of:
s300, judging whether the first off-line model is in a state or not.
Wherein, the input of the first recurrent neural network unit is generally taken as 0 by default. If the first offline model is stateless, its output is a function of the input only, i.e., output = f(input). If the first offline model is stateful, its output is a function of both the input and the history, i.e., (output, history) = g(input, history).
Specifically, it is judged whether the first offline model is stateful; when the first offline model is determined to be stateful, step S400 is executed, in which case the first offline model further includes state input data, and the state input data may be the output data of the previous recurrent neural network unit before the hidden layer.
In this embodiment, the state of the first offline model is determined, so that the first offline model is generated more accurately.
In one embodiment, the method may further include the steps of:
s600, acquiring a new original network containing new recurrent neural network nodes.
Wherein the new original network may include new recurrent neural network nodes and non-recurrent neural network nodes.
Specifically, the processor acquires a new original network, acquires a model data set and model structure parameters of the new original network, and can acquire a network structure diagram of the new original network through the model data set and the model structure parameters of the new original network.
S700, if the corresponding offline model exists in the new original network, acquiring the offline model corresponding to the new original network, and operating the new original network according to the offline model corresponding to the new original network.
The offline model corresponding to the new original network comprises a first offline model and a second offline model.
Specifically, when the acquired new original network has a corresponding offline model, the offline model corresponding to the new original network is acquired, and the new original network is operated according to the acquired offline model corresponding to the new original network.
Optionally, when a new original network is operated, if a current node in the new original network is a recurrent neural network node, the first offline model is called in a cyclic manner to realize operation of the recurrent neural network node.
Optionally, when a new original network is operated, if a current node in the new original network is an acyclic neural network node, the weight data and instruction data of the current node are obtained from the second offline model corresponding to the new original network, and the current node is operated directly according to its weight data and instruction data.
In this embodiment, when the neural network is operated, the offline model corresponding to the neural network may be directly obtained, and the neural network is operated according to the offline model corresponding to the neural network, and it is not necessary to compile each node of the neural network on line to obtain an instruction, thereby improving the operation speed and efficiency of the system.
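Putting the pieces together, the dispatch described in this embodiment might be sketched as follows, with hypothetical helper stubs for the two run paths: an RNN node is run by cyclically calling the first offline model, while an acyclic node fetches its cached weights and instructions from the second offline model and runs directly.

```python
def run_first_offline_model(model, x):
    return ("rnn-out", x)            # stand-in for the cyclic single-unit calls

def execute(instructions, weights, x):
    return ("out", x)                # stand-in for running compiled instructions

def run_network(order, node_kind, first_model, second_model, node_inputs):
    outputs = {}
    for node in order:                           # order from the dependency relations
        x = node_inputs[node]                    # simplified: inputs prestaged per node
        if node_kind[node] == "rnn":
            outputs[node] = run_first_offline_model(first_model, x)
        else:
            weights, instructions = second_model[node]   # no online compiling
            outputs[node] = execute(instructions, weights, x)
    return outputs

order = ["nonRNN1", "RNN", "nonRNN2", "nonRNN3", "nonRNN4"]
kinds = {n: ("rnn" if n == "RNN" else "acyclic") for n in order}
second = {n: ({}, b"ins") for n in order if n != "RNN"}
outs = run_network(order, kinds, first_model={}, second_model=second,
                   node_inputs={n: 0 for n in order})
```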
In one embodiment, as shown in fig. 10, there is provided a data processing apparatus including: a first obtaining module 100, an operation module 200 and a generating module 300, wherein:
a first obtaining module 100, configured to obtain a recurrent neural network node, where the recurrent neural network node includes at least one recurrent neural network unit.
an operation module 200, configured to run a single recurrent neural network unit in the recurrent neural network node according to the model data set and model structure parameters of that unit, so as to obtain the instruction data corresponding to the single recurrent neural network unit.
A generating module 300, configured to obtain a first offline model corresponding to the single recurrent neural network unit according to the instruction data corresponding to the single recurrent neural network unit.
In one embodiment, the generating module 300 is further configured to correspondingly store the weight data and the instruction data of the single recurrent neural network unit, and obtain the first offline model corresponding to the single recurrent neural network unit.
In one embodiment, the data processing apparatus further includes a determining module and a first executing module. The determining module is used for determining whether the first offline model is stateful. The first execution module is configured so that, if the first offline model is stateful, the first offline model further includes state input data, where the state input data is the output data of the previous recurrent neural network unit that is fed to the hidden layer.
In one embodiment, the first obtaining module 100 includes: a first obtaining unit, configured to obtain an original network including the recurrent neural network node; the first determining unit is used for determining the dependency relationship of each node in the original network according to the model structure parameters of the original network; the first determining unit is further configured to determine an input node and an output node of each of the recurrent neural network nodes in the original network according to the dependency relationship of each of the nodes in the original network; the first execution unit is used for disconnecting the input node and the output node of the recurrent neural network node from the recurrent neural network node to obtain at least one recurrent neural network node.
In one embodiment, the first obtaining module 100 further includes a second determining unit and a second executing unit: the second determining unit is configured to determine an execution sequence of each node in the original network according to a dependency relationship of each node in the original network; the second execution unit is configured to operate the original network according to the execution sequence of each node, and obtain instruction data of each acyclic neural network node in the original network; the generating module is used for correspondingly storing the weight data and the instruction data corresponding to each acyclic neural network node to obtain a second offline model; and the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
For the specific limitations of the data processing apparatus, reference may be made to the limitations of the data processing method above, which are not repeated here. The modules in the data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in hardware in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, the present application further provides a computer system comprising a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to perform the method according to any one of the above embodiments. Specifically, when the processor executes the computer program, the following steps are specifically executed:
Acquiring the recurrent neural network node. Specifically, the processor acquires a recurrent neural network node in order to obtain its recurrent neural network units in a subsequent step. Further, the processor may obtain the model data set and the model structure parameters of the recurrent neural network node, thereby determining the recurrent neural network node according to that model data set and those model structure parameters.
Running the single recurrent neural network unit according to the model data set and model structure parameters of the single recurrent neural network unit in the recurrent neural network node, to obtain the instruction data corresponding to that unit. Specifically, the processor acquires the model data set and model structure parameters of a single recurrent neural network unit, runs the unit, and then obtains the corresponding instruction data. It should be clear that running the recurrent neural network unit here means that the processor runs a machine learning algorithm (e.g., a neural network algorithm) using the artificial neural network model data, and implements the algorithm's target application (e.g., an artificial intelligence application such as speech recognition) by performing a forward operation.
Obtaining the first offline model corresponding to the single recurrent neural network unit according to its instruction data. Specifically, the processor can obtain the first offline model corresponding to the single recurrent neural network unit from that unit's instruction data and weight data; compiling and running every recurrent neural network unit in the recurrent neural network node is therefore unnecessary, which greatly shortens the offline model generation time of the recurrent neural network node and improves the processing speed and efficiency of the processor.
In an embodiment, there is also provided a computer storage medium having a computer program stored therein, which when executed by one or more processors performs the method of any of the above embodiments. The computer storage media may include, among other things, non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
In an embodiment, as shown in fig. 11, the present application further provides a data processing method, which may include the following steps:
S800, acquiring a recurrent neural network node.
Specifically, the processor obtains a recurrent neural network node and operates the recurrent neural network node.
And S900, circularly calling a first offline model corresponding to the recurrent neural network node, and operating the recurrent neural network node according to the first offline model.
Wherein the first offline model includes weight data and instruction data for individual recurrent neural network elements.
Specifically, after the processor acquires the recurrent neural network node, the processor circularly calls a first offline model corresponding to the recurrent neural network node, and operates the recurrent neural network node according to the first offline model.
In this embodiment, the recurrent neural network node is operated by cyclically calling the first offline model, so that the processing efficiency and speed of the computer system are improved.
Optionally, the processor may determine the total number of recurrent neural network units contained in the recurrent neural network node, and cyclically invoke the first offline model using that total number of RNN units as the number of invocations. Specifically, step S900 may include:
Each time the first offline model is called to complete the operation of one RNN unit, the remaining invocation count is decremented by one to obtain the current execution count, until the current execution count equals the initial value. Here, the initial value may be 0.
Alternatively, starting from the initial value, the count is incremented by one each time the first offline model is called to complete the operation of one RNN unit, until the execution count equals the total number of RNN units in the recurrent neural network node.
In one embodiment, as shown in fig. 12, the method further comprises the steps of:
S1000, acquiring an original network containing the recurrent neural network nodes.
Wherein the original network may include recurrent neural network nodes and non-recurrent neural network nodes.
Specifically, the processor acquires an original network, acquires a model data set and model structure parameters of the original network, and can acquire a network structure diagram of the original network through the model data set and the model structure parameters of the original network.
S1200, if the current node in the original network is an acyclic neural network node, acquiring the weight data and instruction data of the current node from a second offline model corresponding to the original network, and directly running the current node according to its weight data and instruction data.
And the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
Specifically, when the original network is run, if the current node in the original network is an acyclic neural network node, the weight data and instruction data of the current node are obtained from the second offline model corresponding to the original network, and the current node is run directly according to them. The processor determines whether the current node in the original network is an acyclic neural network node; if so, it obtains the node's weight data and instruction data from the second offline model and runs the current node directly according to that data.
In one embodiment, as shown in fig. 13, the method may include the steps of:
S701, determining an equivalent network corresponding to the original network according to the original network.
Wherein the equivalent network comprises at least one equivalent recurrent neural network node and at least one equivalent acyclic neural network node.
Specifically, the processor processes the obtained original network and thereby obtains the equivalent network corresponding to the original network.
S702, determining the execution sequence of each equivalent node in the equivalent network according to the dependency relationship of each equivalent node in the equivalent network corresponding to the original network.
Specifically, the processor may order the equivalent nodes according to their dependency relationships in the equivalent network to obtain a linear sequence of the equivalent nodes, and thereby determine the execution order of the equivalent nodes.
For example, as shown in fig. 9, the dependency relationships of the nodes in the network of diagram a are determined, and the linear order of the nodes is found to be non-RNN1 → non-RNN2 → RNN → non-RNN3 → non-RNN4 (where non-RNN denotes a non-recurrent neural network node and RNN the recurrent neural network node); the execution order between the nodes may also be non-RNN1 → RNN → non-RNN3 → non-RNN4 → non-RNN2.
S703, if the current equivalent node is an equivalent acyclic neural network node, acquiring the weight data and instruction data of the current equivalent node from a second offline model corresponding to the original network, and directly running the current equivalent node according to its weight data and instruction data.
Specifically, when the equivalent network is operated, if the current equivalent node is an equivalent acyclic neural network node, the weight data and the instruction data of the current equivalent node are obtained from the second offline model corresponding to the original network, and the current equivalent node is directly operated according to the weight data and the instruction data of the current equivalent node.
If the current equivalent node is an equivalent cyclic neural network node, acquiring weight data and instruction data of the current equivalent node from a first offline model corresponding to the original network, and circularly calling the first offline model according to the weight data and the instruction data of the current equivalent node to operate the current equivalent node.
In this embodiment, by determining the execution order among the equivalent nodes from the equivalent network, nodes of the same type can be run together against the offline model of that type before switching to the offline model of the other type; this reduces the number of switching calls and improves the running efficiency of the processor.
In one embodiment, as shown in fig. 14, the step of obtaining the equivalent network corresponding to the original network may include the following steps:
S7011, obtaining at least one recurrent neural network node in the original network and the connected components of the acyclic neural network.
Wherein a connected component of the acyclic neural network consists of at least one acyclic neural network node connected together, so each connected component includes at least one acyclic neural network node. As shown in diagram b of fig. 9, after each recurrent neural network node is separated out, at least one recurrent neural network node is obtained along with the acyclic neural network nodes that were not disconnected; the acyclic neural network nodes that remain connected together in diagram b are referred to as a connected component of the acyclic neural network.
Specifically, the original network is processed according to the original network and its structure to obtain at least one recurrent neural network node and the connected components of the acyclic neural network.
S7012, updating the connection relationships of the acyclic neural network nodes in each connected component of the acyclic neural network according to the dependency relationships of the nodes in the original network, to obtain the updated connected components of the acyclic neural network.
Specifically, according to the obtained connected components of the acyclic neural network and the dependency relationships of the nodes in the original network, the processor processes the connected components, updating the connection relationships of the acyclic neural network nodes within them to obtain the updated connected components of the acyclic neural network, as shown in diagram c of fig. 9.
S7013, respectively equating each updated connected component of the acyclic neural network to an equivalent acyclic neural network node.
Specifically, after the updated connected components of the acyclic neural network are obtained, each updated connected component is treated as an equivalent acyclic neural network node.
As shown in diagram d of fig. 9, after the updated connected components of the acyclic neural network are obtained, each updated connected component is equated to an equivalent acyclic neural network node.
S7014, determining the dependency relationships of the equivalent acyclic neural network nodes and the equivalent recurrent neural network nodes according to the dependency relationships of the nodes in the original network, to obtain the equivalent network corresponding to the original network.
Specifically, the processor determines the dependency relationships of the equivalent acyclic neural network nodes and the equivalent recurrent neural network nodes according to the dependency relationships of the nodes in the original network, and connects the input and output relationships of the equivalent recurrent neural network nodes and the equivalent acyclic neural network nodes to obtain the equivalent network corresponding to the original network, as shown in diagram d of fig. 9.
In one embodiment, the step S7011 may further include the following steps:
S70111, determining the input nodes and output nodes of each recurrent neural network node according to the dependency relationships of the nodes in the original network.
Wherein an input node is a node that feeds data into the recurrent neural network node, and an output node is a node that receives the data output by the recurrent neural network node.
Specifically, the processor may order the nodes according to their dependency relationships in the original network to obtain a linear sequence of the nodes, and thereby determine the input node and output node of each recurrent neural network node.
For example, from the original network in fig. 9(a), the processor may determine the input and output nodes of each node: the output nodes of non-RNN1 are non-RNN2 and the RNN node; the input node of non-RNN2 is non-RNN1, and its output nodes are non-RNN3 and non-RNN4; the input node of the RNN node is non-RNN1, and its output node is non-RNN3.
S70112, disconnecting the input nodes and output nodes of each recurrent neural network node from that node, to obtain at least one recurrent neural network node and the connected components of the acyclic neural network.
Wherein, as above, a connected component of the acyclic neural network consists of at least one acyclic neural network node connected together. As shown in diagram b of fig. 9, after each recurrent neural network node is separated out, at least one recurrent neural network node is obtained along with the acyclic neural network nodes that were not disconnected; the acyclic neural network nodes that remain connected together in diagram b form a connected component of the acyclic neural network.
Specifically, after the processor determines the input node and output node of each recurrent neural network node, it disconnects the connection between each recurrent neural network node and its input node, and likewise disconnects the connection between each recurrent neural network node and its output node, thereby obtaining at least one independent recurrent neural network node and the connected components of the acyclic neural network.
For example, as shown in diagram b of fig. 9, after the recurrent neural network nodes and their input and output nodes are determined, the connections between the recurrent neural network nodes and their input nodes in diagram a are broken, as are the connections between the recurrent neural network nodes and their output nodes; each recurrent neural network node is thus separated out, yielding at least one independent recurrent neural network node and diagram b.
In one embodiment, as shown in fig. 15, the step S7012 may further include the following steps:
S70121, determining, according to the dependency relationships of the nodes in the original network, whether each acyclic neural network node in a connected component of the acyclic neural network depends on the output result of a recurrent neural network node.
Specifically, the processor may determine the input-output relationships among the nodes from their dependency relationships in the original network, so as to judge whether each acyclic neural network node in a connected component depends on the output result of a recurrent neural network node; that is, within the connected components of the acyclic neural network, it determines whether any node has a recurrent neural network node as an input node.
Optionally, as shown in diagram a of fig. 9, the dependency or connection relationships between the nodes are obtained from the data flow between them, and the input-output relationship of each node in a connected component is determined. If the execution order is non-RNN1 → non-RNN2 → RNN → non-RNN3 → non-RNN4, it can be seen that non-RNN3 and non-RNN4 depend on the output result of the recurrent neural network node.
S70122, if an acyclic neural network node in a connected component of the acyclic neural network depends on the output result of a recurrent neural network node, disconnecting that acyclic neural network node from its input nodes, to obtain the updated connected components of the acyclic neural network.
Specifically, if it is determined that an acyclic neural network node in a connected component depends on the output result of a recurrent neural network node, the processor disconnects that node from its input nodes, obtaining the updated connected components of the acyclic neural network. That is, once the processor judges that a node takes a recurrent neural network node as an input node, it breaks the input relationship between that node and its existing input nodes within the connected component.
For example, as shown in diagram c of fig. 9, when a connected component of the acyclic neural network contains an acyclic neural network node that has the recurrent neural network node as an input node, the input relationship between that node and its input node within the connected component is broken, giving the updated connected components. Since non-RNN3 and non-RNN4 depend on the output result of the recurrent neural network node, and the input node of non-RNN3 is non-RNN2 and the input node of non-RNN4 is non-RNN2, the connection between non-RNN3 and non-RNN2 is broken and the connection between non-RNN4 and non-RNN2 is broken, yielding the updated connected components of the acyclic neural network shown in diagram c.
In one embodiment, as shown in fig. 16, the step S900 may further include the following steps:
when the first offline model is stateful and the first offline model further includes state input data, step S902 is executed to obtain weight data, instruction data, and state input data of the current recurrent neural network unit from the first offline model.
S904, running the recurrent neural network unit according to the weight data, instruction data, and state input data of the single recurrent neural network unit.
S906, storing the output result of the current recurrent neural network unit as state input data in the first offline model, and then returning to the step of acquiring the weight data, instruction data, and state input data of the current recurrent neural network unit from the first offline model, until the operation of the recurrent neural network node is completed.
That is, when the first offline model is determined to be stateful, state input data is supplied each time the first offline model is called, and the output result is stored for the next call.
Optionally, the state input data may further include input data of the recurrent neural network unit.
In this embodiment, determining whether the first offline model is stateful makes the execution of the first offline model more accurate.
In one embodiment, the processor may determine an execution mode between each equivalent node according to a dependency relationship of each equivalent node in the equivalent network; if the equivalent nodes have no dependency relationship, executing the equivalent nodes in parallel; if the equivalent nodes have dependency relationship, executing the equivalent nodes in sequence.
Alternatively, the processor may execute the nodes of the original network in the same manner when running the original network.
In this embodiment, in a low-latency requirement scenario, the equivalent nodes in the equivalent network may be executed fully in parallel, i.e., parallelism at the model level. In a high-throughput scenario, the equivalent nodes are executed sequentially, and multiple copies of the nodes are run in parallel, i.e., parallelism at the data level. A variety of inference requirements can thus be served, meeting both low-latency and high-throughput demands.
In one embodiment, as shown in fig. 17, there is provided a data processing apparatus including: a second obtaining module 400 and a second executing module 500, wherein:
a second obtaining module 400, configured to obtain a recurrent neural network node.
A second executing module 500, configured to circularly call the first offline model corresponding to the recurrent neural network node, and run the recurrent neural network node according to the first offline model.
In one embodiment, the second obtaining module 400 is further configured to obtain an original network including the recurrent neural network node; the second executing module 500 is further configured to, if the current node in the original network is an acyclic neural network node, obtain weight data and instruction data of the current node from a second offline model corresponding to the original network, and directly operate the current node according to the weight data and the instruction data of the current node; and the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
In one embodiment, the data processing apparatus further comprises an equivalence module and a determination module; the equivalent module is used for determining an equivalent network corresponding to the original network according to the original network, wherein the equivalent network comprises at least one equivalent cyclic neural network node and at least one equivalent acyclic neural network node; the determining module is used for determining the execution sequence of each equivalent node in the equivalent network according to the dependency relationship of each equivalent node in the equivalent network corresponding to the original network; the second executing module 500 is further configured to, if the current equivalent node is an equivalent acyclic neural network node, obtain weight data and instruction data of the current equivalent node from a second offline model corresponding to the original network, and directly operate the current equivalent node according to the weight data and the instruction data of the current equivalent node.
In one embodiment, the second execution module 500 is further configured to: if the equivalent nodes in the equivalent network have no dependency relationship, executing the equivalent nodes in parallel; and if the dependency relationship exists among all equivalent nodes in the equivalent network, executing the equivalent nodes according to the dependency relationship.
In one embodiment, the equivalent module comprises a second obtaining unit, an updating unit, and an equivalent unit. The second obtaining unit is configured to obtain at least one recurrent neural network node in the original network and the connected components of the acyclic neural network, where each connected component includes at least one acyclic neural network node; the updating unit is configured to update the connection relationships of the acyclic neural network nodes in each connected component according to the dependency relationships of the nodes in the original network, to obtain the updated connected components of the acyclic neural network; the equivalent unit is configured to equate each updated connected component of the acyclic neural network to an equivalent acyclic neural network node; the equivalent unit is further configured to determine the dependency relationships of the equivalent acyclic neural network nodes and the equivalent recurrent neural network nodes according to the dependency relationships of the nodes in the original network, to obtain the equivalent network corresponding to the original network.
For the specific limitations of the data processing apparatus, reference may be made to the limitations of the data processing method above, which are not repeated here. The modules in the data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in hardware in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, the present application further provides a computer system comprising a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to perform the method according to any one of the above embodiments. Specifically, when the processor executes the computer program, the following steps are specifically executed:
Acquiring the recurrent neural network node. Specifically, the processor obtains a recurrent neural network node and runs it.
Circularly calling the first offline model corresponding to the recurrent neural network node, and running the recurrent neural network node according to the first offline model. Specifically, after acquiring the recurrent neural network node, the processor cyclically calls the first offline model corresponding to that node and runs the node according to the first offline model.
In an embodiment, there is also provided a computer storage medium having a computer program stored therein, which when executed by one or more processors performs the method of any of the above embodiments. The computer storage media may include, among other things, non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and while their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (14)

1. A method of data processing, the method comprising:
acquiring a recurrent neural network node, wherein the recurrent neural network node comprises at least one recurrent neural network unit; wherein the recurrent neural network unit comprises an input layer, a hidden layer, and an output layer,
circularly calling a first offline model corresponding to the recurrent neural network unit, and operating the recurrent neural network node according to the first offline model;
wherein the first offline model comprises weight data and instruction data of a single recurrent neural network unit;
the method further comprises the following steps:
acquiring an original network containing the recurrent neural network nodes;
if the current node in the original network is an acyclic neural network node, acquiring weight data and instruction data of the current node from a second offline model corresponding to the original network, and directly operating the current node according to the weight data and the instruction data of the current node;
and the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
2. The method of claim 1, further comprising:
determining an equivalent network corresponding to the original network according to the original network, wherein the equivalent network comprises at least one equivalent cyclic neural network node and at least one equivalent acyclic neural network node;
determining an execution sequence of each equivalent node in the equivalent network according to the dependency relationship of each equivalent node in the equivalent network corresponding to the original network;
and if the current equivalent node is an equivalent non-cyclic neural network node, acquiring weight data and instruction data of the current equivalent node from a second offline model corresponding to the original network, and directly operating the current equivalent node according to the weight data and the instruction data of the current equivalent node.
3. The method of claim 2, further comprising:
if the equivalent nodes in the equivalent network have no dependency relationship, executing the equivalent nodes in parallel;
and if the dependency relationship exists among all equivalent nodes in the equivalent network, executing the equivalent nodes according to the dependency relationship.
4. The method according to claim 2, wherein the step of determining the equivalent network corresponding to the original network according to the original network comprises:
obtaining at least one recurrent neural network node in the original network and a connected component of an acyclic neural network, wherein the connected component of the acyclic neural network consists of acyclic neural network nodes connected together and comprises at least one acyclic neural network node;
updating the connection relationships of the acyclic neural network nodes in the connected component of the acyclic neural network according to the dependency relationships of the nodes in the original network, to obtain an updated connected component of the acyclic neural network;
respectively equating the updated connected components of the acyclic neural network to equivalent acyclic neural network nodes;
and determining the dependency relationships of the equivalent acyclic neural network nodes and the equivalent recurrent neural network nodes according to the dependency relationships of the nodes in the original network, to obtain the equivalent network corresponding to the original network.
5. The method of claim 4, wherein the step of obtaining at least one recurrent neural network node in the original network and a connected component of the acyclic neural network comprises:
determining an input node and an output node of each cyclic neural network node according to the dependency relationship of each node in the original network;
and disconnecting the input node and the output node of the recurrent neural network node from the recurrent neural network node, to obtain at least one recurrent neural network node and a connected component of the acyclic neural network, wherein the connected component of the acyclic neural network comprises at least one acyclic neural network node.
6. The method according to claim 4, wherein the step of updating the connection relationships of the acyclic neural network nodes in the connected component of the acyclic neural network according to the dependency relationships of the nodes in the original network, to obtain an updated connected component of the acyclic neural network, comprises:
respectively judging, according to the dependency relationships of the nodes in the original network, whether each acyclic neural network node in the connected component of the acyclic neural network depends on the output result of the recurrent neural network node;
and if an acyclic neural network node in the connected component of the acyclic neural network depends on the output result of the recurrent neural network node, disconnecting the input nodes of that acyclic neural network node from the acyclic neural network node, to obtain the updated connected component of the acyclic neural network.
7. The method of any of claims 1-6, wherein if the first offline model is stateful, the first offline model further comprises state input data; the step of circularly calling a first off-line model corresponding to the recurrent neural network node and operating the recurrent neural network node according to the first off-line model further comprises the following steps:
acquiring the weight data, instruction data, and state input data of the current recurrent neural network unit from the first offline model;
running the recurrent neural network unit according to the weight data, instruction data, and state input data of the single recurrent neural network unit;
and storing the output result of the current recurrent neural network unit as state input data in the first offline model, and then returning to the step of acquiring the weight data, instruction data, and state input data of the current recurrent neural network unit from the first offline model, until the operation of the recurrent neural network node is completed.
8. A data processing apparatus, characterized in that the apparatus comprises:
the second acquisition module is used for acquiring a recurrent neural network node, and the recurrent neural network node comprises at least one recurrent neural network unit; wherein the recurrent neural network unit comprises an input layer, a hidden layer, and an output layer,
the second execution module is used for circularly calling the first off-line model corresponding to the recurrent neural network node and operating the recurrent neural network node according to the first off-line model; wherein the first offline model comprises weight data and instruction data of a single recurrent neural network unit;
the second obtaining module is further configured to obtain an original network including the recurrent neural network node;
the second execution module is further configured to, if a current node in the original network is an acyclic neural network node, obtain weight data and instruction data of the current node from a second offline model corresponding to the original network, and directly operate the current node according to the weight data and the instruction data of the current node; and the second offline model comprises weight data and instruction data of each acyclic neural network node in the original network.
9. The apparatus of claim 8, further comprising an equivalence module and a determination module;
the equivalent module is used for determining an equivalent network corresponding to the original network according to the original network, wherein the equivalent network comprises at least one equivalent cyclic neural network node and at least one equivalent acyclic neural network node;
the determining module is configured to determine an execution sequence of each equivalent node in the equivalent network according to a dependency relationship of each equivalent node in the equivalent network corresponding to the original network;
the second execution module is further configured to, if the current equivalent node is an equivalent acyclic neural network node, obtain weight data and instruction data of the current equivalent node from a second offline model corresponding to the original network, and directly operate the current equivalent node according to the weight data and the instruction data of the current equivalent node.
10. The apparatus of claim 9, wherein the second execution module is further configured to:
if the equivalent nodes in the equivalent network have no dependency relationship, executing the equivalent nodes in parallel;
and if the dependency relationship exists among all equivalent nodes in the equivalent network, executing the equivalent nodes according to the dependency relationship.
11. The apparatus of claim 8, wherein the equivalence module comprises a second obtaining unit, an updating unit, and an equivalence unit; wherein,
the second obtaining unit is configured to obtain at least one recurrent neural network node in the original network and a connected component of the acyclic neural network, wherein the connected component of the acyclic neural network consists of acyclic neural network nodes connected together and comprises at least one acyclic neural network node;
the updating unit is configured to update the connection relationships of the acyclic neural network nodes in the connected component of the acyclic neural network according to the dependency relationships of the nodes in the original network, to obtain an updated connected component of the acyclic neural network;
the equivalent unit is configured to respectively equate the updated connected components of the acyclic neural network to equivalent acyclic neural network nodes, and is further configured to determine, according to the dependency relationships of the nodes in the original network, the dependency relationships between the equivalent acyclic neural network nodes and the equivalent recurrent neural network nodes, to obtain the equivalent network corresponding to the original network.
12. A computer system comprising a processor and a memory, the memory having stored therein a computer program that, when executed by the processor, performs the method of any one of claims 1-7.
13. The computer system of claim 12, wherein the processor comprises an operation unit and a controller unit; the operation unit includes: a master processing circuit and a plurality of slave processing circuits;
the controller unit is used for acquiring input data and instructions;
the controller unit is further configured to parse the instructions to obtain a plurality of instruction data, and to send the plurality of instruction data and the input data to the master processing circuit;
the master processing circuit is configured to perform preliminary processing on the input data and to transmit data and instruction data to and from the plurality of slave processing circuits;
the plurality of slave processing circuits are configured to execute intermediate operations in parallel according to the data and instruction data transmitted from the master processing circuit, to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the master processing circuit;
and the master processing circuit is configured to perform subsequent processing on the plurality of intermediate results to obtain the result of the instructions.
14. A computer storage medium, having stored thereon a computer program which, when executed by one or more processors, performs the method of any one of claims 1-7.
CN201811568921.2A 2018-12-21 2018-12-21 Data processing method, device, computer system and storage medium Active CN109685203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811568921.2A CN109685203B (en) 2018-12-21 2018-12-21 Data processing method, device, computer system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811568921.2A CN109685203B (en) 2018-12-21 2018-12-21 Data processing method, device, computer system and storage medium

Publications (2)

Publication Number Publication Date
CN109685203A CN109685203A (en) 2019-04-26
CN109685203B true CN109685203B (en) 2020-01-17

Family

ID=66188573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811568921.2A Active CN109685203B (en) 2018-12-21 2018-12-21 Data processing method, device, computer system and storage medium

Country Status (1)

Country Link
CN (1) CN109685203B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200084099A (en) * 2019-01-02 2020-07-10 삼성전자주식회사 Neural network optimizing device and neural network optimizing method
CN112070213A (en) * 2020-08-28 2020-12-11 Oppo广东移动通信有限公司 Neural network model optimization method, device, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096727A (en) * 2016-06-02 2016-11-09 腾讯科技(深圳)有限公司 A kind of network model based on machine learning building method and device
CN107895190A (en) * 2017-11-08 2018-04-10 清华大学 The weights quantization method and device of neural network model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4388033B2 (en) * 2006-05-15 2009-12-24 ソニー株式会社 Information processing apparatus, information processing method, and program
US10387769B2 (en) * 2016-06-30 2019-08-20 Samsung Electronics Co., Ltd. Hybrid memory cell unit and recurrent neural network including hybrid memory cell units
CN106529669A (en) * 2016-11-10 2017-03-22 北京百度网讯科技有限公司 Method and apparatus for processing data sequences
CN107103113B (en) * 2017-03-23 2019-01-11 中国科学院计算技术研究所 The Automation Design method, apparatus and optimization method towards neural network processor
CN108734288B (en) * 2017-04-21 2021-01-29 上海寒武纪信息科技有限公司 Operation method and device
CN107766939A (en) * 2017-11-07 2018-03-06 维沃移动通信有限公司 A kind of data processing method, device and mobile terminal

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096727A (en) * 2016-06-02 2016-11-09 腾讯科技(深圳)有限公司 A kind of network model based on machine learning building method and device
CN107895190A (en) * 2017-11-08 2018-04-10 清华大学 The weights quantization method and device of neural network model

Also Published As

Publication number Publication date
CN109685203A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN108446761B (en) Neural network accelerator and data processing method
CN109492241B (en) Conversion method, conversion device, computer equipment and storage medium
JP7264376B2 (en) How to generate a general-purpose trained model
CN109993287B (en) neural network processing method, computer system, and storage medium
CN109685203B (en) Data processing method, device, computer system and storage medium
CN110717584A (en) Neural network compiling method, compiler, computer device, and readable storage medium
CN111831359A (en) Weight precision configuration method, device, equipment and storage medium
CN111831355A (en) Weight precision configuration method, device, equipment and storage medium
KR20190141581A (en) Method and apparatus for learning artificial neural network for data prediction
CN110865950B (en) Data preprocessing method and device, computer equipment and storage medium
CN113449842A (en) Distributed automatic differentiation method and related device
CN116805157B (en) Unmanned cluster autonomous dynamic evaluation method and device
JP7299846B2 (en) Neural network processing method, computer system and storage medium
CN112970037B (en) Multi-chip system for implementing neural network applications, data processing method suitable for multi-chip system, and non-transitory computer readable medium
US20220222927A1 (en) Apparatus, system, and method of generating a multi-model machine learning (ml) architecture
WO2023050807A1 (en) Data processing method, apparatus, and system, electronic device, and storage medium
EP4052188B1 (en) Neural network instruction streaming
CN109993288B (en) Neural network processing method, computer system, and storage medium
CN109726797B (en) Data processing method, device, computer system and storage medium
US20140006321A1 (en) Method for improving an autocorrector using auto-differentiation
CN111274023B (en) Data processing method, device, computer system and storage medium
KR20210061800A (en) Method of generating sparse neural networks and system therefor
CN110865792B (en) Data preprocessing method and device, computer equipment and storage medium
CN115081628B (en) Method and device for determining adaptation degree of deep learning model
CN110704040A (en) Information processing method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

GR01 Patent grant
GR01 Patent grant