CN110163337B - Data processing method, device and equipment based on neural network and storage medium - Google Patents


Info

Publication number
CN110163337B
CN110163337B (application CN201811340948.6A)
Authority
CN
China
Prior art keywords
layer
data
hidden
network
controlling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811340948.6A
Other languages
Chinese (zh)
Other versions
CN110163337A (en)
Inventor
周谦
周方云
詹成君
方允福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811340948.6A priority Critical patent/CN110163337B/en
Publication of CN110163337A publication Critical patent/CN110163337A/en
Application granted granted Critical
Publication of CN110163337B publication Critical patent/CN110163337B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a data processing method, apparatus, device, and storage medium based on a neural network, belonging to the technical field of data processing. The neural network includes at least one network merging layer, the network merging layer includes n cascaded hidden layers, and n ≥ 2. The data processing method includes: controlling the network merging layer to perform data processing, where inter-layer parallel processing of data occurs among the hidden layers of the network merging layer during data processing. The scheme of the embodiments of the invention can effectively improve data processing efficiency.

Description

Data processing method, device and equipment based on neural network and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, device, and storage medium based on a neural network.
Background
With the continuous development of neural network technology, neural networks have been widely applied in various fields. For example, convolutional neural networks, by virtue of their unique performance advantages, have been widely used in computer vision and image processing, and in recent years have achieved good results in visual recognition.
However, neural networks are characterized by large capacity and high dimensionality, and contain numerous network parameters. Data processing based on a neural network therefore takes a long time, and how to improve data processing efficiency is a problem to be solved urgently.
Disclosure of Invention
Embodiments of the invention mainly aim to provide a neural-network-based data processing method, apparatus, device, and storage medium, to solve the problem of low data processing speed in existing data processing approaches.
In a first aspect, an embodiment of the present invention provides a data processing method based on a neural network, where the neural network includes at least one network merging layer, the network merging layer includes n cascaded hidden layers, and n is greater than or equal to 2; the data processing method comprises the following steps:
controlling the network merging layer to perform data processing, where inter-layer parallel processing of data occurs among the hidden layers of the network merging layer during data processing.
In an optional embodiment of the first aspect, when the n cascaded hidden layers are instantiated, the n cascaded hidden layers correspond to the same object instance.
In an optional embodiment of the first aspect, controlling the network merging layer to perform data processing includes:
controlling the (i-1)th hidden layer of the network merging layer to process the input data of the (i-1)th hidden layer and serially output the processing results, where 2 ≤ i ≤ n;
and controlling the ith hidden layer of the network merging layer to process the output data of the (i-1)th hidden layer.
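As a loose illustration of this control flow (not the patent's implementation; all names are invented), the serial hand-off between two cascaded hidden layers can be sketched with Python generators: the ith layer starts consuming each result of the (i-1)th layer as soon as it is produced, so both layers are in flight at once rather than running strictly one after the other.

```python
# Hypothetical sketch of the claimed pipeline; the layer operations
# (doubling, incrementing) are placeholders for real hidden-layer math.

def layer_i_minus_1(inputs):
    for x in inputs:            # process input data of the (i-1)th layer
        yield x * 2             # ...and serially output each partial result

def layer_i(stream):
    for y in stream:            # consume each result as soon as it arrives
        yield y + 1

# The two layers overlap: layer_i handles result k while layer_i_minus_1
# is already computing result k+1.
results = list(layer_i(layer_i_minus_1([1, 2, 3])))
assert results == [3, 5, 7]
```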
In an optional embodiment of the first aspect, controlling the ith hidden layer to process the output data of the (i-1)th hidden layer includes:
controlling the ith hidden layer to process the output data when the output data meets a preset condition.
In an optional embodiment of the first aspect, the preset condition includes that the output data satisfies the minimum operation condition for performing an intra-layer operation of the ith hidden layer.
In an optional embodiment of the first aspect, the output data is a part of the total output data of the (i-1)th hidden layer.
In an optional embodiment of the first aspect, the data processing method further comprises:
controlling at least a portion of the output data to be stored in a register and/or a cache (high-speed buffer memory).
In an optional embodiment of the first aspect, controlling the ith hidden layer of the network merging layer to process the output data of the (i-1)th hidden layer includes:
controlling the ith hidden layer of the network merging layer to process the output data of the (i-1)th hidden layer when the maximum output duration of the output data is not greater than a set duration;
where the maximum output duration refers to the duration between the current time and the time at which the earliest-obtained item in the output data was obtained.
In an optional embodiment of the first aspect, the set duration is determined according to the maximum duration for which the register temporarily stores data and/or the maximum duration for which the cache stores data.
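A hypothetical sketch of this timing condition (the queue data structure, the field names, and the 10 ms figure are my assumptions, not from the patent): partial outputs are tagged with production timestamps, and the ith layer only proceeds while the oldest queued item is younger than the set duration, on the premise that such data is still likely to be register- or cache-resident.

```python
import time

SET_DURATION = 0.010  # seconds; illustrative, stands in for a cache-residency bound

class PartialOutputQueue:
    """Queue of the (i-1)th layer's partial outputs with production times."""

    def __init__(self):
        self._items = []  # list of (timestamp, data) pairs, oldest first

    def push(self, data, now=None):
        self._items.append((time.monotonic() if now is None else now, data))

    def max_output_duration(self, now=None):
        # Duration between 'now' and the production time of the earliest item.
        if not self._items:
            return 0.0
        now = time.monotonic() if now is None else now
        return now - self._items[0][0]

    def ready(self, now=None):
        # The ith layer may process only while the oldest data is still fresh.
        return bool(self._items) and self.max_output_duration(now) <= SET_DURATION

q = PartialOutputQueue()
q.push("chunk0", now=0.0)
assert q.ready(now=0.005)       # oldest chunk is 5 ms old: process now
assert not q.ready(now=0.020)   # oldest chunk is 20 ms old: likely evicted
```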
In an optional embodiment of the first aspect, the neural network is a first convolutional neural network, the network merging layer includes a first convolutional layer and a first activation function (Relu) layer that are cascaded, and controlling the network merging layer to perform data processing includes:
controlling the first convolutional layer to perform a convolution operation on its input data and serially output each operation result, and controlling the first Relu layer to perform a Relu operation on each output of the first convolutional layer;
or,
the neural network is a second convolutional neural network, the network merging layer includes a second convolutional layer, a Batch Normalization layer, a scale-translation (Scale) layer, and a second Relu layer cascaded in sequence, and controlling the network merging layer to perform data processing includes:
controlling the second convolutional layer to perform a convolution operation on its input data and serially output each operation result;
controlling the Batch Normalization layer to perform a Batch Normalization operation on each output of the second convolutional layer and serially output each operation result;
controlling the Scale layer to perform a Scale operation on each output of the Batch Normalization layer and serially output each operation result, and controlling the second Relu layer to perform a Relu operation on each output of the Scale layer;
or,
the neural network is a third convolutional neural network, the network merging layer includes a cascaded hidden layer and an element-wise operation (Eltwise) layer, the output data of that hidden layer is input data of the Eltwise layer, and controlling the network merging layer to perform data processing includes:
controlling the hidden layer to perform its corresponding operation on its input data and serially output each operation result;
and controlling the Eltwise layer to perform an Eltwise operation on the output data of that hidden layer together with the output data of other hidden layers, where the output data of the other hidden layers is also input data of the Eltwise layer.
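The Eltwise variant above can be illustrated with a minimal NumPy sketch (function and parameter names are mine, assuming a Relu hidden layer and an element-wise sum): the hidden layer's result is consumed chunk by chunk by the Eltwise operation, instead of the Eltwise layer waiting for the hidden layer's full output tensor.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def merged_hidden_eltwise(x, other_branch, chunk=4):
    """Merged 'hidden layer + Eltwise(sum)' block, processed chunk by chunk."""
    out = np.empty_like(x)
    for start in range(0, x.shape[0], chunk):
        sl = slice(start, start + chunk)
        partial = relu(x[sl])                 # hidden-layer op on one chunk
        out[sl] = partial + other_branch[sl]  # Eltwise sum on that same chunk
    return out

x = np.array([-1.0, 2.0, -3.0, 4.0, 5.0, -6.0])
branch = np.ones_like(x)                      # output of the other input branch
fused = merged_hidden_eltwise(x, branch, chunk=2)
reference = relu(x) + branch                  # unfused two-pass computation
assert np.allclose(fused, reference)
```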
In a second aspect, an embodiment of the present invention provides a data processing apparatus based on a neural network, where the neural network includes at least one network merging layer, the network merging layer includes n cascaded hidden layers, and n is greater than or equal to 2; the data processing apparatus includes:
and the data processing module is used for controlling the network merging layer to perform data processing, wherein interlayer parallel processing of data exists among all hidden layers of the network merging layer in the data processing process.
In an optional embodiment of the second aspect, when the n cascaded hidden layers are instantiated, the n cascaded hidden layers correspond to the same object instance.
In an optional embodiment of the second aspect, the data processing module is specifically configured to:
control the (i-1)th hidden layer of the network merging layer to process the input data of the (i-1)th hidden layer and serially output the processing results, where 2 ≤ i ≤ n;
and control the ith hidden layer of the network merging layer to process the output data of the (i-1)th hidden layer.
In an optional embodiment of the second aspect, when controlling the ith hidden layer to process the output data of the (i-1)th hidden layer, the data processing module is specifically configured to:
control the ith hidden layer to process the output data when the output data meets a preset condition.
In an optional embodiment of the second aspect, the preset condition includes that the output data satisfies the minimum operation condition for performing an intra-layer operation of the ith hidden layer.
In an optional embodiment of the second aspect, the output data is a part of the total output data of the (i-1)th hidden layer.
In an optional embodiment of the second aspect, the data processing module is further configured to:
control at least a portion of the output data to be stored in a register and/or a cache.
In an optional embodiment of the second aspect, when controlling the ith hidden layer of the network merging layer to process the output data of the (i-1)th hidden layer, the data processing module is specifically configured to:
control the ith hidden layer of the network merging layer to process the output data of the (i-1)th hidden layer when the maximum output duration of the output data is not greater than a set duration;
where the maximum output duration refers to the duration between the current time and the time at which the earliest-obtained item in the output data was obtained.
In an optional embodiment of the second aspect, the set duration is determined according to the maximum duration for which the register temporarily stores data and/or the maximum duration for which the cache stores data.
In an optional embodiment of the second aspect, the neural network is a first convolutional neural network, the network merging layer includes a first convolutional layer and a first Relu layer that are cascaded, and when controlling the network merging layer to perform data processing, the data processing module is specifically configured to:
control the first convolutional layer to perform a convolution operation on its input data and serially output each operation result, and control the first Relu layer to perform a Relu operation on each output of the first convolutional layer.
In an optional embodiment of the second aspect, the neural network is a second convolutional neural network, the network merging layer includes a second convolutional layer, a Batch Normalization layer, a Scale layer, and a second Relu layer cascaded in sequence, and when controlling the network merging layer to perform data processing, the data processing module is specifically configured to:
control the second convolutional layer to perform a convolution operation on its input data and serially output each operation result;
control the Batch Normalization layer to perform a Batch Normalization operation on each output of the second convolutional layer and serially output each operation result;
and control the Scale layer to perform a Scale operation on each output of the Batch Normalization layer and serially output each operation result, and control the second Relu layer to perform a Relu operation on each output of the Scale layer.
In an optional embodiment of the second aspect, the neural network is a third convolutional neural network, the network merging layer includes a cascaded hidden layer and an Eltwise layer, the output data of that hidden layer is input data of the Eltwise layer, and when controlling the network merging layer to perform data processing, the data processing module is specifically configured to:
control the hidden layer to perform its corresponding operation on its input data and serially output each operation result;
and control the Eltwise layer to perform an Eltwise operation on the output data of that hidden layer together with the output data of other hidden layers, where the output data of the other hidden layers is also input data of the Eltwise layer.
In a third aspect, an embodiment of the present invention provides an electronic device including a processor and a memory; the memory stores readable instructions which, when loaded and executed by the processor, implement the data processing method of the first aspect or any optional embodiment of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing readable instructions which, when loaded and executed by a processor, implement the data processing method of the first aspect or any optional embodiment of the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
in the neural-network-based data processing method, apparatus, device, and storage medium provided by embodiments of the invention, during data processing through a network merging layer of the neural network, the cascaded hidden layers of the network merging layer are controlled so that inter-layer parallel processing of data occurs, allowing different hidden layers of the merging layer to process data simultaneously. Compared with the prior art, among the cascaded hidden layers of the network merging layer, a subsequent hidden layer does not need to wait until the preceding hidden layer has finished processing all data before it starts its own processing. This effectively reduces the time overhead of data processing and improves data processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below.
FIGS. 1a, 1b and 1c are schematic structural diagrams of three existing neural network structures, respectively;
FIG. 2 is a schematic diagram illustrating a network merging layer according to an example of the present invention;
fig. 3a and 3b show schematic diagrams of two input structures of a network merging layer in an example of the invention, respectively;
FIG. 4a is a schematic diagram of a prior art neural network architecture;
FIG. 4b is a schematic diagram of a network merging layer corresponding to the neural network structure in FIG. 4 a;
FIG. 5a is a schematic diagram of a prior art neural network architecture;
FIG. 5b is a schematic diagram of a network merging layer corresponding to the neural network structure of FIG. 5 a;
FIG. 6a is a schematic diagram of a prior art neural network architecture;
FIG. 6b is a schematic diagram of a network merging layer corresponding to the neural network structure of FIG. 6 a;
FIG. 6c is a schematic diagram of another network merging layer corresponding to the neural network structure of FIG. 6 a;
FIG. 6d is a schematic diagram of a further network merging layer corresponding to the neural network structure in FIG. 6a;
fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
The following describes the technical solution of the present invention and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
A neural network is a computational model formed by interconnecting a large number of nodes (also called neurons). A neural network can be divided into an input layer, hidden layers, and an output layer, and may contain one or more hidden layers. The input layer is the first layer of the network; it receives the network's raw input data and passes it to the hidden layers. The hidden layers perform the required computation and pass their results onward. The output layer is the last layer; it receives the final output of the hidden layers and produces values in the expected range, yielding the network's final processing result.
An "L-layer neural network" generally means a network with L hidden layers; the input and output layers are usually not counted. Common hidden layers in a neural network include the Conv (convolution) layer, Relu (activation function) layer, POOL (pooling) layer, Batch Normalization (BN) layer, Scale (scale-translation) layer, and Eltwise (element-wise operation) layer.
Owing to its nonlinear adaptive information processing capability, the neural network has been successfully applied to pattern recognition, intelligent control, combinatorial optimization, prediction, and other fields. In recent years, neural networks have developed further along the path of simulating human cognition and have become an important direction of AI (Artificial Intelligence). However, existing neural-network-based data processing suffers from long operation times and low data processing efficiency, and cannot well meet the demand for efficient data processing in practical applications.
Taking the convolutional neural network as an example: owing to its excellent performance in visual recognition in recent years, it is increasingly applied in game scenarios, such as MOBA (Multiplayer Online Battle Arena) games. In MOBA games, problems such as players disconnecting, going idle, or suffering repeated defeats (which causes strong frustration) often arise. To avoid affecting other players' experience, disconnected or idle players can be taken over by an AI; and to placate a player on a losing streak, that player can be matched against an AI with weaker fighting capability.
For many MOBA mobile games, prediction computation such as AI takeover can be performed on the mobile terminal, which reduces cost and improves computational real-time performance. On the mobile terminal, because the GPU (Graphics Processing Unit) is needed to render the game and resources are limited, AI inference computation usually occupies the limited CPU (Central Processing Unit) and memory resources, and the AI needs to compute the operation instructions for a taken-over player quickly. This places higher requirements on the mobile-terminal AI forward-computation inference framework.
When a convolutional neural network model is processed, a corresponding object is created for each hidden layer, and the output of a preceding hidden layer serves as the input of the following hidden layer. The following hidden layer cannot start processing until all processing results of the preceding hidden layer have been output. By that time, when the following hidden layer fetches the preceding layer's output data for computation, the data is usually no longer in a register, or even in the cache (high-speed buffer memory); it must be transferred from memory to the cache and then loaded from the cache into a register. This incurs a large time overhead and slows down the forward computation of the whole neural network. In addition, because most of the preceding hidden layer's output data resides in memory, the existing neural-network-based computation also occupies too much memory, which affects the overall performance of the mobile terminal.
Figs. 1a, 1b and 1c show schematic partial layer structures of three different existing convolutional neural network models; the input of the Eltwise layer in Fig. 1c is connected to the outputs of two layers, and its input data is the output data of Any layer1 and Any layer2 shown in the figure. When an existing deep learning inference framework processes the structures shown in Figs. 1a, 1b and 1c, computation follows the per-layer structure of the original model: a corresponding object is created for each layer, the output of the preceding layer serves as the input of the following layer, and data between layers is passed through memory. For example, in the network structure shown in Fig. 1a, the Conv layer and the Relu layer correspond to separate objects, and data processing based on this structure requires initializing and invoking the objects of both layers. Because each hidden layer takes a long time to finish all of its processing, by the time the Relu layer needs the Conv layer's output, most or all of that output has been moved to memory, so the time overhead of data loading is large and data processing efficiency suffers.
In addition, comparing Figs. 1a, 1b and 1c shows that the more hidden layers a neural network has, the more time is consumed loading inter-layer data.
To solve at least one of the technical problems in existing neural-network-based data processing, an embodiment of the invention provides a neural-network-based data processing method that merges certain hidden layers of the neural network across layers. By controlling the hidden layers within a merged network merging layer so that inter-layer parallel processing of data occurs during data processing, the method reduces the time overhead of inter-layer data loading and improves data processing efficiency. In addition, the method of the embodiment can effectively reduce memory usage. Specific embodiments are further described below.
In the data processing method based on the neural network provided in the embodiment of the present invention, the neural network includes at least one network merging layer, as shown in fig. 2, the network merging layer may include n cascaded hidden layers, where n is greater than or equal to 2, that is, each network merging layer includes at least two cascaded hidden layers; the data processing method comprises the following steps:
controlling the network merging layer to perform data processing, where inter-layer parallel processing of data occurs between the hidden layers of the network merging layer during data processing.
Inter-layer parallel processing refers to parallel processing of data between different hidden layers: while data is being processed through a network merging layer of the neural network, at least two of the n cascaded hidden layers may be processing data in parallel.
For example, suppose a network merging layer of a neural network includes a Conv layer and a Relu layer, where the input data of the Relu layer is the output data of the Conv layer. During data processing through this network merging layer, the Conv layer and the Relu layer process data in parallel: the Relu layer starts before the Conv layer has finished processing all of its data. Each time the Conv layer finishes processing part of its data and thereby produces the input data required for a Relu operation, that partial output is fed into the Relu layer to start the Relu operation, so that the Conv layer and the Relu layer are both performing data processing at the same time.
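Under the stated assumptions (a 1-D convolution and an element-wise Relu; all code names are illustrative, not the patent's), the difference between the conventional two-pass execution and the merged, inter-layer-parallel execution can be sketched as:

```python
import numpy as np

def conv1d_row(x, w, i):
    """One output element of a 1-D valid convolution (correlation form)."""
    return float(np.dot(x[i:i + len(w)], w))

def naive(x, w):
    # Naive: the Conv layer writes its ENTIRE output to a buffer (in practice,
    # main memory) before the Relu layer starts.
    n_out = len(x) - len(w) + 1
    conv_out = np.array([conv1d_row(x, w, i) for i in range(n_out)])
    return np.maximum(conv_out, 0.0)   # Relu only starts after Conv finishes

def fused(x, w):
    # Merged: Relu is applied to each Conv partial result while it is still
    # "hot", so no full-size intermediate tensor is materialized.
    n_out = len(x) - len(w) + 1
    out = np.empty(n_out)
    for i in range(n_out):                      # inter-layer pipeline
        out[i] = max(conv1d_row(x, w, i), 0.0)  # Relu consumes each result at once
    return out

x = np.array([1.0, -2.0, 3.0, -4.0, 5.0])
w = np.array([1.0, 1.0])
assert np.allclose(naive(x, w), fused(x, w))  # same result, different schedule
```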
Hidden-layer cascading means that the output of a preceding hidden layer is connected to the input of the following hidden layer, so that the input data of the following hidden layer depends on the output data of the preceding one. For example, for a convolutional neural network, the network merging layer may include two hidden layers, such as a Conv layer and a Relu layer, where the input data of the Relu layer is the output data of the Conv layer and the two layers are cascaded.
It should be noted that, in practical applications, the number of network merging layers in the neural network and the number of hidden layers in a single network merging layer can be configured according to application requirements, application scenarios, and the attributes of the hidden layers of the neural network. For example, two or more hidden layers with relatively small data processing loads can form one network merging layer. Likewise, in an application scenario with a relatively small data volume, a network merging layer may include more hidden layers, while in a scenario with a relatively large data volume it may include fewer.
For a network merging layer, the input data of the merging layer includes the input data of its first hidden layer, and its output data is the output data of its last hidden layer. The source of a merging layer's input data depends on which hidden layers it contains. For example, as shown in the partial-structure schematic of Fig. 3a, when the network merging layer includes the first hidden layer of the neural network, the input of the merging layer is the output of the network's input layer, i.e., the network's raw input data. When the network merging layer does not include the first hidden layer of the neural network, its input includes the input data of its own first hidden layer; for example, in the partial-structure schematic of Fig. 3b, the input data of the merging layer's first hidden layer is the output data of the preceding hidden layer connected to it, and that data is the input data of the network merging layer.
Similarly, the output data of the network merging layer may be the output data of the neural network, i.e., the input data of the output layer of the neural network, or may be the input data of the next hidden layer of the neural network (the hidden layer connected to the last hidden layer of the network merging layer).
As an example, fig. 4a shows a partial network structure of an existing neural network, which includes a Conv layer and a Relu layer in cascade; the Input of the network structure is Input1 of the Conv layer, and the Output is Output1 of the Relu layer. Fig. 4b shows the structure of the network merging layer (the Conv + Relu merging layer shown in the figure) obtained by inter-layer merging of the Conv layer and the Relu layer of the structure in fig. 4a, based on a method of an embodiment of the present invention. That is, the network merging layer in fig. 4b contains both a Conv layer and a Relu layer and performs both Conv and Relu computation. The input of the network merging layer is Input1 of the original Conv layer shown in fig. 4a, and its output is Output1 of the original Relu layer shown in fig. 4a.
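The fusion in fig. 4b can be illustrated with a minimal NumPy sketch (the names and shapes here are illustrative, not taken from the patent): instead of running a full Conv pass and then a separate Relu pass, the Relu is applied to each convolution result as soon as it is produced.

```python
import numpy as np

def conv_relu_merged(x, kernel):
    """1-D valid convolution with ReLU fused into the same loop.

    Rather than materialising the whole Conv output and then running a
    separate ReLU pass, each convolution result is rectified immediately,
    while it is conceptually still 'hot' in a register.
    """
    n = len(x) - len(kernel) + 1
    out = np.empty(n)
    for i in range(n):
        conv_val = float(np.dot(x[i:i + len(kernel)], kernel))  # one Conv result
        out[i] = conv_val if conv_val > 0.0 else 0.0            # ReLU applied at once
    return out

x = np.array([1.0, -2.0, 3.0, -4.0, 5.0])
k = np.array([1.0, 1.0])
print(conv_relu_merged(x, k))  # [0. 1. 0. 1.]
```

The observable result equals Conv followed by Relu; only the scheduling changes, which is what removes the wait for the full Conv pass.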
As another example, fig. 5a shows a partial network structure of an existing neural network, which includes a Conv layer, a Batch Normal layer, a Scale layer, and a Relu layer in sequential cascade; the Input of the network structure is Input1 of the Conv layer, and the Output is Output1 of the Relu layer. Fig. 5b shows the structure of the network merging layer (the Conv + Batch Normal + Scale + Relu merging layer shown in the figure) obtained by inter-layer merging of the Conv layer, Batch Normal layer, Scale layer, and Relu layer of the structure in fig. 5a, based on the method of the embodiment of the present invention. The network merging layer has the functions of the 4 cascaded hidden layers shown in fig. 5a; its input is Input1 of the original Conv layer shown in fig. 5a, and its output is Output1 of the original Relu layer shown in fig. 5a.
As still another example, fig. 6a shows a partial network structure of an existing neural network, which includes two arbitrary hidden layers, layer1 and layer2, whose outputs are the inputs of an Eltwise layer; Output2 of the Eltwise layer is the output of the network structure. Figs. 6b, 6c, and 6d respectively show the structures of network merging layers obtained, according to the method of the embodiment of the present invention, by inter-layer merging of layer1, layer2, and the Eltwise layer of the structure in fig. 6a: the layer2 + Eltwise merging layer in fig. 6b, the layer1 + Eltwise merging layer in fig. 6c, and the layer1 + layer2 + Eltwise merging layer in fig. 6d. The network merging layer in fig. 6d contains two cascade relationships among its hidden layers: layer1 is cascaded with the Eltwise layer, and layer2 is also cascaded with the Eltwise layer.
As can be seen from figs. 6b, 6c, and 6d, either one of layer1 and layer2, or both, may be merged with the Eltwise layer to obtain a network merging layer. In fig. 6b, the Eltwise calculation of the Eltwise layer is moved into layer2, that is, the Eltwise layer and layer2 are merged into a layer2 + Eltwise merging layer: layer1 processes its input data Input1, Output1 of layer1 serves as one input of the merging layer, Input2 (the input of layer2 before merging) remains the other input of the layer2 + Eltwise merging layer, and the output of the merging layer is the output of the original Eltwise layer. In fig. 6c, the Eltwise calculation is moved into layer1, that is, the Eltwise layer and layer1 are merged into a layer1 + Eltwise merging layer: layer2 processes its input data Input2, Output1 of layer2 serves as one input of the merging layer, and Input1 (the input of layer1 before merging) remains the other input of the layer1 + Eltwise merging layer. In fig. 6d, layer1, layer2, and the Eltwise layer are all merged.
In the data processing method of the embodiment of the invention, during data processing by the plurality of cascaded hidden layers included in a network merging layer of the neural network, inter-layer parallel processing of data among those cascaded hidden layers is controlled, so that different hidden layers of the merging layer can process data partially in parallel. Compared with existing methods of processing data through a neural network, a later hidden layer among the cascaded hidden layers of the network merging layer does not need to wait until the preceding hidden layer has finished processing all of its data, which effectively reduces the latency of data processing and improves its efficiency.
For example, for the network merging layer shown in fig. 4b, the Relu layer is merged into the Conv layer for calculation. When data calculation is performed based on this network merging layer, the layer first performs a Conv calculation to obtain one output datum; at that moment the layer can be controlled to immediately perform the corresponding Relu operation, so that Conv calculation and Relu operation proceed in parallel within the merging layer, improving data processing efficiency.
For the network merging layer shown in fig. 5b, the Batch Normal layer, the Scale layer, and the Relu layer are merged into the Conv layer for calculation. When data calculation is performed based on this network merging layer, the layer first performs a Conv calculation to obtain one output datum; at that moment the Batch Normal operation may be performed immediately on that datum, followed by the Scale and Relu operations.
For the network merging layers shown in figs. 6b, 6c, and 6d: when forward data calculation is performed based on the merging layer of fig. 6b or fig. 6c — taking fig. 6c as an example — as soon as the merging layer produces one output datum according to the calculation of the original layer1, that datum and the output of layer2 can immediately undergo the Eltwise calculation. When forward data calculation is performed based on the merging layer of fig. 6d, as soon as the layer1 and layer2 calculations have each produced one output datum, the Eltwise calculation may be performed on those data immediately.
When data processing is performed based on the network structure corresponding to fig. 6d, the layer1 and layer2 calculations within the network merging layer may be controlled to run either serially or in parallel, as needed. Serial calculation corresponds to the layer1 and layer2 calculations described for figs. 6b and 6c — for example, the layer2 calculation starts only after the layer1 calculation completes — whereas in parallel calculation the layer1 and layer2 calculations are performed simultaneously.
In an optional embodiment of the present invention, when n cascaded hidden layers are instantiated, the n cascaded hidden layers correspond to the same object instance.
Instantiation refers to the process of creating an object from a class in object-oriented programming. To create and use an object, the loading of the class and the instantiation of the class must both be completed: loading places the class into memory in advance, and instantiation is the process of going from the class to a concrete object, the object being a specific instance that includes a set of attributes and methods (pieces of code that accomplish particular functions). During data processing, a method of the object is invoked by calling the object, thereby realizing the function corresponding to that method.
In the embodiment of the invention, in order to better ensure inter-layer parallel processing of data among the hidden layers of the network merging layer during data processing, the hidden layers of the network merging layer are instantiated as one object. Thus, when the object is invoked, the initialization of all hidden layers of the network merging layer — the loading of their attribute information and methods — is completed at the same time, so that the i-th hidden layer of the network merging layer can process data based on the output data of the (i-1)-th hidden layer at any time, improving data processing efficiency.
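The single-object-instance idea can be sketched as follows (a hypothetical illustration, not the patent's actual implementation): the merged layer is one class whose single construction initialises the attributes and methods of all constituent hidden layers at once, so the later layer's operation is available the moment the earlier layer produces a result.

```python
class ConvReluMergedLayer:
    """Hypothetical merged layer: one object instance covering Conv and ReLU.

    Instantiating this single object loads the attributes (the kernel) and
    the methods of both constituent hidden layers in one initialization, so
    the ReLU step can consume each Conv result as soon as it is available.
    """
    def __init__(self, kernel):
        self.kernel = kernel                # Conv attributes loaded ...
        self.relu = lambda v: max(v, 0.0)   # ... and the ReLU method, in one init

    def forward(self, x):
        n = len(x) - len(self.kernel) + 1
        out = []
        for i in range(n):
            conv_val = sum(a * b for a, b in zip(x[i:i + len(self.kernel)], self.kernel))
            out.append(self.relu(conv_val))  # fused within the single instance
        return out

layer = ConvReluMergedLayer([1.0, 1.0])     # one instantiation, not two
print(layer.forward([1.0, -2.0, 3.0, -4.0, 5.0]))  # [0.0, 1.0, 0.0, 1.0]
```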
In an optional embodiment of the present invention, controlling the network merging layer to perform data processing may specifically include:
controlling the (i-1)-th hidden layer of the network merging layer to process the input data of the (i-1)-th hidden layer and to serially output the processing results, where 2 ≤ i ≤ n;
and controlling the ith hidden layer of the network merging layer to process the output data of the (i-1) th hidden layer.
Here, the output data refers to data that has been processed by the (i-1)-th hidden layer; for example, for a Conv layer, the output data of the Conv layer is the data after the Conv calculation. Serially outputting the processing results means that data processed (i.e., calculated) first is output first and data processed later is output later, rather than outputting the current layer's data all at once after every calculation has finished.
When processing is performed through the network merging layer, each hidden layer is controlled to output the result of each operation to the next hidden layer as soon as that operation finishes, so that the next hidden layer can begin processing the data quickly instead of waiting for the previous hidden layer to complete all of its operations; this improves the merging layer's data processing efficiency. Moreover, because data finished by the (i-1)-th hidden layer is output first and can immediately take part in the data operations of the i-th hidden layer, the output data of the (i-1)-th hidden layer on which the i-th hidden layer's current operation depends is, with very high probability, still located in a register of the terminal device. Since registers have very high read/write speed, the data can be read quickly when processed through the i-th hidden layer, effectively improving data processing efficiency.
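The serial-output scheme described above can be sketched with Python generators (an illustrative stand-in, with made-up per-element operations): the (i-1)-th layer yields each result as soon as it is computed, and the i-th layer consumes it immediately instead of waiting for the whole batch.

```python
def layer_a(inputs):
    """(i-1)-th hidden layer: emits each processed value as soon as it is ready."""
    for v in inputs:
        yield v * 2.0          # stand-in for this layer's per-element operation

def layer_b(stream):
    """i-th hidden layer: starts on the first value it receives, without
    waiting for layer_a to finish processing the entire input."""
    for v in stream:
        yield v + 1.0          # stand-in for this layer's per-element operation

# Chaining the generators pipelines the two layers: each element flows
# through layer_b right after layer_a produces it.
merged = layer_b(layer_a([1.0, 2.0, 3.0]))
print(list(merged))  # [3.0, 5.0, 7.0]
```

Generators make the interleaving explicit; in a real inference framework the same effect would come from fusing the per-element loops of the two layers.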
In an optional embodiment of the present invention, controlling the ith hidden layer to process the output data of the (i-1) th hidden layer includes:
and when the output data meet the preset conditions, controlling the ith hidden layer to process the output data.
The preset condition may be configured according to one or more of different application scenarios, actual application requirements, attribute information of each hidden layer included in the network merging layer, and information such as a duration for which data can be temporarily stored in the register. In an optional embodiment of the present invention, the preset condition may include: the output data meets the lowest operation condition of the ith hidden layer for in-layer operation.
The minimum operation condition for the hidden layer to perform the in-layer operation refers to the minimum data required by at least one neuron of the hidden layer to start performing the operation when performing the data operation through the hidden layer. The minimum operation condition may be the same or different for different hidden layers.
For example, if the minimum operation condition for the Relu layer of the network merging layer to perform its in-layer operation is one output datum of the Conv layer (the result of completing one Conv operation), then whenever the Conv layer completes a Conv operation and obtains a corresponding output datum, the Relu layer can immediately perform the Relu operation based on that datum.
It is understood that the preset condition may include other configured conditions besides that the output data satisfies the lowest operation condition for performing the intra-layer operation in the ith hidden layer.
In an alternative embodiment of the present invention, the output data is partial output data of the (i-1) th hidden layer.
In order to ensure inter-layer parallel processing of data among the hidden layers of a network merging layer when data is processed through it, the i-th hidden layer needs to be able to start its data operations based on partial output data of the (i-1)-th hidden layer. That is, the minimum operation condition for the i-th hidden layer to perform its in-layer operation depends on partial output data of the (i-1)-th hidden layer, not on the entire output data of the (i-1)-th hidden layer.
In an optional embodiment of the present invention, the data processing method may further include:
and controlling at least part of data of the output data to be stored in a register and/or a Cache.
The register is an internal element of the CPU and has very high read-write speed. Cache, i.e., cache memory, is a memory with a small capacity but a high speed between the CPU and the main memory. The access rates of the data in the register, the Cache and the memory are sequentially from high to low. When data processing is performed, if data to be read is located in a memory, the data needs to be loaded into the Cache from the memory first, and then loaded into the register from the Cache, which may cause a large time overhead and a low data processing efficiency.
To improve data processing efficiency, in the embodiment of the present invention, when data processing is performed through the network merging layer, some or all of the output data of the (i-1)-th hidden layer on which the i-th hidden layer's operations depend may be controlled to be stored in the register and/or the Cache, further reducing the time overhead of data loading and improving processing efficiency.
In practical application, which two or more hidden layers are merged into one network merging layer can be determined according to the attribute information of each hidden layer of the neural network, the amount of data each hidden layer needs to compute, and the attributes of the register and/or Cache of the electronic device, so that the output data of the (i-1)-th hidden layer resides entirely in the register, or entirely in the register and the Cache, minimizing data loading time and improving processing efficiency.
As shown in fig. 4b, after the Conv layer of the network merging layer performs a Conv calculation on its input data and obtains one output datum, the Relu operation may be performed on that datum immediately, because the datum is still in a register at that moment; thus all of the input data for the Relu calculation is obtained from registers. In addition, the data processing method of the embodiment of the invention greatly reduces the memory occupied by data storage, so that memory resources occupied during data processing are effectively reduced at the same time as calculation latency.
In an optional embodiment of the present invention, controlling an ith hidden layer of a network merging layer to process output data of an (i-1) th hidden layer includes:
when the maximum output duration of the output data is not greater than the set duration, controlling the ith hidden layer of the network merging layer to process the output data of the (i-1) th hidden layer;
the maximum output duration refers to a duration between the current time and the obtaining time of the data obtained earliest in the output data, that is, a duration between the obtaining time of the data output earliest (the time when the data is obtained by calculation of the (i-1) th hidden layer) and the current time in the output data of the (i-1) th hidden layer.
In practical application, the set time length can be preset to realize the control of the storage position of the output data of the i-1 st layer, and the set time length can be determined according to an empirical value and/or an experimental value.
In an optional embodiment of the present invention, the set time duration may be determined according to a maximum storage time duration of the register for temporarily storing the data, and/or a maximum storage time duration of the Cache data.
The maximum storage duration of the register's temporary data refers to how long data can remain temporarily in the register; it can also be understood as the time after which data, once placed in the register and left unused, is transferred from the register to the Cache.
In practical application, the set duration can be chosen according to the maximum storage duration of the register's temporary data so that the output data is controlled to reside entirely in the register, or chosen according to the maximum storage duration of the Cache's temporary data so that the data is controlled to reside entirely in the Cache, or in both the Cache and the register.
It should be noted that, in practical applications, the condition that the maximum output time length of the output data is not greater than the set time length may be configured alone, or may be configured in the preset condition, that is, the preset condition may include that the maximum output time length of the output data is not greater than the set time length.
In an optional embodiment of the present invention, the neural network may be a first convolutional neural network, the network merging layer includes a first convolutional layer and a first Relu layer in cascade, and controlling the network merging layer to perform data processing may specifically include:
controlling the first convolution layer to perform convolution operations on its input data and to serially output each operation result of the first convolution layer, and controlling the first Relu layer to perform the Relu operation on each output datum of the first convolution layer.
Specifically, for example, when a mobile-side inference framework such as Caffe2 or TensorFlow Lite processes the network structure shown in fig. 4a in a convolutional neural network model, the Conv layer and the Relu layer may be merged into one layer structure when the forward computation framework loads the model structure to create the graph structure — that is, only one layer object instance is created. The input of the original Conv layer serves as the input of the merged layer (the Conv + Relu merging layer shown in fig. 4b) and the output of the merged layer serves as the output of the original Relu layer; in subsequent forward inference computation, the merged layer performs the Conv calculation first to obtain the corresponding output data.
In an optional embodiment of the present invention, the neural network is a second convolutional neural network, the network merging layer includes a second convolutional layer, a Batch Normalization layer, a scaling translation Scale layer, and a second Relu layer, which are sequentially cascaded, and the controlling of the network merging layer to perform data processing specifically may include:
controlling the second convolution layer to carry out convolution operation on input data of the second convolution layer and serially outputting the operation result of the second convolution layer each time;
controlling the Batch Normalization layer to perform Batch Normalization operation on the output data of the second convolution layer each time, and serially outputting the operation result of the Batch Normalization layer each time;
and controlling the Scale layer to perform the Scale operation on each output datum of the Batch Normalization layer and to serially output each operation result of the Scale layer, and controlling the second Relu layer to perform the Relu operation on each output datum of the Scale layer.
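The control sequence above can be sketched as a per-element fused pipeline (a simplified illustration with made-up parameters; real Batch Normalization uses per-channel statistics over a tensor, not a single scalar):

```python
import math

def bn(v, mean, var, eps=1e-5):
    """Batch Normalization applied to one value."""
    return (v - mean) / math.sqrt(var + eps)

def scale(v, gamma, beta):
    """Scale (affine) transform applied to one value."""
    return gamma * v + beta

def relu(v):
    return max(v, 0.0)

def conv_bn_scale_relu(x, kernel, mean, var, gamma, beta):
    """Each Conv result flows through BN -> Scale -> ReLU immediately,
    rather than each layer finishing a full pass before the next starts."""
    n = len(x) - len(kernel) + 1
    out = []
    for i in range(n):
        v = sum(a * b for a, b in zip(x[i:i + len(kernel)], kernel))  # Conv
        out.append(relu(scale(bn(v, mean, var), gamma, beta)))        # fused chain
    return out

print(conv_bn_scale_relu([1.0, -2.0, 3.0], [1.0, 1.0],
                         mean=0.0, var=1.0, gamma=1.0, beta=0.0))
```

Because each of BN, Scale, and ReLU needs only a single Conv result to proceed, the whole chain runs on data that is still register-resident, which is the point of the merging.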
Specifically, for example, when the mobile-side inference framework Caffe2 or TensorFlow Lite processes the network structure shown in fig. 5a in a convolutional neural network model, the Conv layer, Batch Normalization layer, Scale layer, and Relu layer may be merged into one layer structure when the forward computation framework loads the model structure to create the graph structure, with only one layer object instance created. The resulting network merging layer (the Conv + Batch Normal + Scale + Relu merging layer shown in fig. 5b) takes the input of the original Conv layer as its input, and its output serves as the output of the original Relu layer. In subsequent forward inference computation, the network merging layer first performs a Conv calculation to obtain one output datum; since that datum is still in a register, the Batch Normal operation can be performed on it at once, followed by the Scale and Relu operations, thereby reducing the time the merging layer spends fetching data into registers for each subsequent calculation.
In an optional embodiment of the present invention, the neural network may be a third convolutional neural network, the network merging layer includes any hidden layer and an Eltwise layer in cascade, the output data of the any hidden layer being input data of the Eltwise layer, and controlling the network merging layer to perform data processing may specifically include:
controlling any hidden layer to perform corresponding operation on input data of any hidden layer, and serially outputting the operation result of any hidden layer each time;
and controlling the Eltwise layer to perform Eltwise operation on the output data of any hidden layer and the output data of other hidden layers, wherein the output data of other hidden layers are input data of the Eltwise layer.
In this scheme, the any hidden layer and the other hidden layers are the preceding hidden layers cascaded with the Eltwise layer, and the Eltwise layer performs a fusion operation on the output data of the any hidden layer and the output data of the other hidden layers, such as product (dot product), sum (addition/subtraction), or max (maximum). It is understood that there may be one or more other hidden layers. For example, for the network structure shown in fig. 6a, the Eltwise layer performs the Eltwise operation on the output data of two hidden layers (layer1 and layer2 in the figure); in this case, one of layer1 and layer2 corresponds to the any hidden layer of this scheme and the other corresponds to the other hidden layer. The following description takes the network structure shown in fig. 6a as an example.
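The merging in figs. 6b-6d can be sketched as follows (illustrative only; the per-layer operations are stand-ins, and the Eltwise operation shown is the sum variant the patent lists):

```python
def layer1(x):
    """Stand-in for 'any hidden layer' whose output feeds the Eltwise layer."""
    return [v * 2.0 for v in x]

def layer2_eltwise_merged(x2, layer1_out):
    """Layer2 + Eltwise merging layer (as in fig. 6b): as each layer2 result
    is computed, it is immediately combined with the corresponding layer1
    output, instead of first materialising layer2's full output."""
    out = []
    for v, other in zip(x2, layer1_out):
        l2_val = v + 10.0            # layer2's per-element operation (stand-in)
        out.append(l2_val + other)   # Eltwise (sum) performed at once
    return out

print(layer2_eltwise_merged([1.0, 2.0], layer1([3.0, 4.0])))  # [17.0, 20.0]
```

The merged layer takes two inputs, mirroring fig. 6b: layer1's output and layer2's original input, with the Eltwise result as its single output.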
Specifically, when the mobile-side inference framework Caffe2 or TensorFlow Lite processes the network structure shown in fig. 6a in a convolutional neural network model, layer2 and the Eltwise layer may be merged into one layer structure (the layer2 + Eltwise merging layer shown in fig. 6b) when the forward computation framework loads the model structure to create the graph structure, with only one layer object instance created; alternatively, layer1 and the Eltwise layer may be merged into one layer structure (the layer1 + Eltwise merging layer shown in fig. 6c). Taking the network merging layer shown in fig. 6b as an example, the output of layer1 serves as one input of the merging layer and the original input of layer2 remains the other input; when the merging layer produces one output datum according to the calculation of the original layer2, the Eltwise operation can be performed immediately on that datum and the output of layer1. Compared with invoking a separate layer2 object and Eltwise layer object as before merging, layer2's data is obtained directly from the register.
The data processing method of the embodiment of the invention can be applied to various modules or products for data processing based on the neural network, and particularly can be applied to various terminal devices (including client devices, servers and the like) and is realized by a processor of the terminal device. For example, the method can be applied to mobile terminal equipment (such as a smart phone and the like) and is implemented by a processor (such as a CPU) of the mobile terminal equipment.
The solution of the embodiment of the present invention is particularly suitable for mobile terminal devices because, limited by factors such as system architecture, fabrication process, cost, and hardware resource configuration (CPU, GPU, memory, etc.), the processor performance (mainly the computational capability of the processor) of a mobile terminal device such as a mobile phone is far lower than that of a computer or server; in particular, the gap in floating-point capability is usually several thousand to ten thousand times. In addition, because the scheme of the embodiment of the invention also effectively reduces memory occupation, its effect on data computation speed and related aspects is all the more notable when applied to mobile terminal devices, and it can effectively alleviate problems such as low data processing efficiency and application stutter caused by the limitations of such devices. Specifically, for example, when the scheme of the embodiment of the present invention is applied to a mobile MOBA game based on a neural network architecture, data processing efficiency can be effectively improved and occurrences of memory overflow (out of memory) and the like reduced.
The data processing method provided by the embodiment of the invention can be applied to various structures, devices, products and models for processing data by the neural network. The following further describes aspects of embodiments of the present invention with reference to specific examples.
Example 1
In this example, the data processing method provided by the embodiment of the present invention may be applied to the application scenario described above, in which AI hosting is implemented in an MOBA game by an AI forward computational inference framework including a convolutional neural network. Specifically, it can be applied to the forward inference computation of the convolutional neural networks corresponding to the AI macro view and AI micro-manipulation. The AI macro view refers to controlling where the AI should move on the whole map — taking Honor of Kings as an example: guarding towers, clearing lanes, jungling, ganking, supporting, and so on. AI micro-manipulation refers to controlling the AI's specific operations, such as movement and skill release.
For the three structures shown in figs. 4a, 5a, and 6a in the AI macro-view model and the AI micro-manipulation model, based on the method of the embodiment of the present invention, inter-layer merging of hidden layers may be performed when the forward computation framework creates the network instance. Specifically, the two hidden layers of fig. 4a may be merged into the network merging layer of fig. 4b, the hidden layers of fig. 5a into the network merging layer of fig. 5b, and the hidden layers of fig. 6a into the network merging layer of fig. 6b, 6c, or 6d.
For the offline-hosting AI of the MOBA game, when a player goes offline or performs no operation for longer than a given threshold time, the server may, from the start of the match, select one or more users whose mobile phones have better performance according to the acquired hardware configuration resources of each player's terminal device (e.g., mobile phone), run the AI on those users' phones, and have those phones perform the forward prediction calculation. In this example, the input data for prediction is the game frame data of the current player, that is, the image data of the current game picture; the AI macro-view model and the AI micro-manipulation model perform the relevant forward prediction calculations on this input, the calculation results are sent to the server, and the server issues the corresponding control instructions according to the results to drive the hosted player's movement and skill release. For an MOBA game running at 15 operation frames per second, one macro-view calculation is performed every 15 frames and one micro-manipulation calculation every 2 frames.
Taking a forward computing framework with the network structure shown in fig. 4b as an example, with the data processing method of the embodiment of the present invention, when the forward computing framework loads the AI macro-view and AI micro-manipulation model structures to create the graph structure, the Conv layer and the Relu layer are merged into one layer structure — only one layer object instance is created. The new merged structure, i.e., the Conv + Relu merging layer of fig. 4b, takes Input1 of the original Conv layer (the pre-merge Conv layer of fig. 4a) as its input, and its output serves as Output1 of the original Relu layer; during subsequent forward inference computation, the network merging layer performs the Conv calculation first to obtain the corresponding output data. Compared with an existing forward computing framework based on the structure of fig. 4a, computation efficiency is greatly improved, so the server can more efficiently issue the control instructions that drive the hosted player's movement and skill release, better meeting the high-performance requirements of data processing.
Similarly, when forward inference computation is performed with the data processing method of the embodiment of the present invention on a forward computing framework with the network structure shown in fig. 5b, the Conv + Batch Normal + Scale + Relu merging layer first performs a Conv calculation to obtain one output datum; since that datum is still in a register, the Batch Normal operation can be performed on it immediately, followed by the Scale and Relu operations, so that the data needed for each next calculation in the network merging layer can be obtained from registers. On a forward computing framework with the network structure shown in fig. 6b, when the layer2 + Eltwise merging layer produces an output datum according to the calculation of the layer2 layer, the Eltwise calculation can be performed immediately on that datum and the output of layer1; compared with the pre-merge case, layer2's data is obtained directly from the register, effectively reducing data loading overhead and improving data processing efficiency.
Experimental and online statistics show that, when the data processing method of the embodiment of the present invention is used for AI inference and prediction calculation in a MOBA game AI macro-view model and AI micro-manipulation model containing the three network merging layer structures shown in fig. 4b, fig. 5b, and fig. 6b (or fig. 6c), the average calculation time of the macro-view model is reduced from 31 ms to 23 ms, an acceleration ratio of about 1.35; the average calculation time of the AI micro-manipulation model is reduced from 2 ms to 1 ms, an acceleration ratio of 2; and the corresponding AI hosting rate is increased from 27% to 38%. The scheme provided by the embodiment of the present invention can therefore improve data processing efficiency.
Example two
As an example, Table 1 and Table 2 respectively compare the time consumption and the memory consumption of two schemes for image recognition: the existing neural-network-based data processing method and the data processing method of the present invention. The comparison results shown in Tables 1 and 2 were obtained under identical conditions except for the neural network structures used by the two schemes.
Table 1 shows the time-consumption comparison of the two schemes (the conventional method and the method according to the embodiment of the present invention) when performing image data calculation for image recognition based on a mobilenet neural network model containing the network structure shown in fig. 5a, and the corresponding comparison for image recognition based on a squeezenet neural network model containing the network structure shown in fig. 4a.
Table 2 shows the memory-consumption comparison of the two schemes under the same two settings: image recognition based on the mobilenet neural network model containing the network structure shown in fig. 5a, and image recognition based on the squeezenet neural network model containing the network structure shown in fig. 4a. In Tables 1 and 2, "merged layer" denotes the scheme using the neural network structure according to the embodiment of the present invention, and "non-merged layer" denotes the scheme using the existing neural network structure.
                  mobilenet   squeezenet
Merged layer        110.66      120.03
Non-merged layer    126.90      129.17

Table 1. Time consumption comparison (unit: ms)
                  mobilenet   squeezenet
Merged layer         75.23       76.68
Non-merged layer    114.54       83.29

Table 2. Memory consumption comparison (unit: MB)
As can be seen from Table 1, in terms of computation time, the mobilenet-based image recognition model achieves an acceleration ratio of 1.15 (126.90/110.66) after layer merging compared with before merging, and the squeezenet-based image recognition model achieves an acceleration ratio of 1.08 (129.17/120.03).
As can be seen from Table 2, in terms of memory usage, the mobilenet-based image recognition model saves 34.3% of memory ((114.54 − 75.23)/114.54) after layer merging compared with before merging, and the squeezenet-based image recognition model saves 7.9% ((83.29 − 76.68)/83.29).
Example three
Table 3 of this example shows the time-consumption comparison between the data processing method according to the embodiment of the present invention and the existing caffe2 and TensorFlowLite inference frameworks when performing prediction calculation based on the mobilenet neural network, and the corresponding comparison based on the squeezenet neural network.
                  mobilenet   squeezenet
Caffe2              327.82      187.79
Merged layer        117.66      124.03
TensorFlowLite      176.11      252.21

Table 3. Time consumption comparison (unit: ms)
As can be seen from Table 3, with the mobilenet neural network, the scheme of the embodiment of the present invention achieves a computation-time acceleration ratio of 2.78 (327.82/117.66) over the existing caffe2 and of 1.50 (176.11/117.66) over the existing TensorFlowLite; with the squeezenet neural network, the acceleration ratio over caffe2 is 1.51 (187.79/124.03) and over TensorFlowLite is 2.03 (252.21/124.03).
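The acceleration ratios quoted above follow directly from the measurements in Table 3; a few lines make the arithmetic explicit (the dictionary layout is only for illustration):

```python
# Table 3 measurements in ms (merged = scheme of the embodiment).
table3 = {
    "caffe2": {"mobilenet": 327.82, "squeezenet": 187.79},
    "merged": {"mobilenet": 117.66, "squeezenet": 124.03},
    "tflite": {"mobilenet": 176.11, "squeezenet": 252.21},
}

def speedup(baseline, net):
    """Acceleration ratio of the merged-layer scheme over a baseline framework."""
    return table3[baseline][net] / table3["merged"][net]
```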
It should be understood that the solution of the embodiment of the present invention can be applied to various structures, models, and products that use a neural network for data processing, and is not limited to the application fields or scenarios mentioned above, such as the control models applied in the MOBA game. The solution can be extended to products using any neural network structure, for example any convolutional neural network having the structures of fig. 1a, 1b, and 1c, such as the classical neural-network-based image recognition models vgg16, mobilenet, and squeezenet.
Based on the same principle as the data processing method based on the neural network provided by the embodiment of the invention, the embodiment of the invention also provides a data processing device based on the neural network, wherein the neural network comprises at least one network merging layer, the network merging layer comprises n cascaded hidden layers, and n is more than or equal to 2.
The data processing device comprises a data processing module, wherein the data processing module is used for controlling the network merging layer to process data, and interlayer parallel processing of data exists among all hidden layers of the network merging layer in the data processing process.
The data processing module of the embodiment of the invention can be applied to various electronic devices, for example, mobile terminal devices, fixed terminal devices, and servers, and the function of the data processing module can be specifically realized by the control of a processor of the electronic device.
It is to be understood that the above modules of the data processing apparatus in the embodiments of the present disclosure have functions of implementing corresponding steps in the data processing method shown in any embodiment of the present disclosure, and the functions may be implemented by hardware or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the above functions. The modules can be realized independently or by integrating a plurality of modules. For the detailed functional description of the data processing apparatus, reference may be made to the corresponding description in the foregoing data processing method, which is not described herein again.
In an optional embodiment of the present invention, when the n cascaded hidden layers are instantiated, the n cascaded hidden layers correspond to the same object instance.
In an optional embodiment of the present invention, the data processing module may be specifically configured to:
controlling the (i-1)th hidden layer of the network merging layer, processing input data of the (i-1)th hidden layer, and serially outputting the processing results, wherein i is greater than or equal to 2 and less than or equal to n;
and controlling the ith hidden layer of the network merging layer to process the output data of the (i-1) th hidden layer.
In an optional embodiment of the present invention, when the data processing module controls the ith hidden layer to process the output data of the (i-1) th hidden layer, the data processing module may specifically be configured to:
and when the output data meet the preset conditions, controlling the ith hidden layer to process the output data.
In an optional embodiment of the present invention, the preset condition includes that the output data satisfies a minimum operation condition for performing intra-layer operation on the ith hidden layer.
In an optional embodiment of the present invention, the output data is a portion of the output data of the (i-1)th hidden layer.
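One way to picture the "minimum operation condition" on partial output data (a schematic sketch, not the patent's implementation; `produce`, `consume`, and `min_window` are illustrative names): the ith layer consumes the (i-1)th layer's results as soon as a minimum window of them is available, rather than waiting for the complete output.

```python
from collections import deque

def streaming_pipeline(xs, produce, consume, min_window):
    """Layer i runs on layer (i-1)'s output as soon as `min_window` items
    are available (the 'minimum operation condition'), instead of waiting
    for the whole output of layer (i-1)."""
    buf = deque()
    results = []
    for x in xs:
        buf.append(produce(x))               # layer (i-1) emits one result
        if len(buf) >= min_window:           # preset condition satisfied
            window = [buf.popleft() for _ in range(min_window)]
            results.append(consume(window))  # layer i processes immediately
    return results
```

This inter-layer overlap is what the description calls interlayer parallel processing: the downstream layer's work is interleaved with the upstream layer's production.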
In an optional embodiment of the invention, the data processing module is further configured to:
at least part of the data controlling the output data is stored in the register and/or Cache.
In an optional embodiment of the present invention, when the data processing module controls the ith hidden layer of the network merging layer to process the output data of the (i-1) th hidden layer, the data processing module is specifically configured to:
when the maximum output duration of the output data is not greater than the set duration, controlling the ith hidden layer of the network merging layer to process the output data of the (i-1) th hidden layer;
the maximum output duration refers to a duration between the current time and the obtaining time of the data obtained earliest in the output data.
In an optional embodiment of the invention, the set time length is determined according to the maximum storage time length of the temporary storage data of the register and/or the maximum storage time length of the Cache data.
In an optional embodiment of the present invention, the neural network is a first convolutional neural network, the network merging layer includes a first convolutional layer and a first Relu layer that are cascaded, and the data processing module is specifically configured to, when controlling the network merging layer to perform data processing:
and controlling the first convolution layer to carry out convolution operation on input data of the first convolution layer, serially outputting an operation result of each time of the first convolution layer, and controlling the first Relu layer to carry out Relu operation on output data of each time of the first convolution layer.
In an optional embodiment of the present invention, the neural network is a second convolutional neural network, the network merging layer includes a second convolutional layer, a Batch Normalization layer, a Scale layer, and a second Relu layer, which are sequentially cascaded, and the data processing module is specifically configured to, when controlling the network merging layer to perform data processing:
controlling the second convolution layer to carry out convolution operation on input data of the second convolution layer, and serially outputting operation results of the second convolution layer each time;
controlling the Batch Normalization layer to perform Batch Normalization operation on the output data of the second convolution layer each time, and serially outputting the operation result of the Batch Normalization layer each time;
and controlling the Scale layer to perform Scale operation on output data of the Batch Normalization layer each time, serially outputting an operation result of the Scale layer each time, and controlling the second Relu layer to perform Relu operation on the output data of the Scale layer each time.
In an optional embodiment of the present invention, the neural network is a third convolutional neural network, the network merging layer includes any hidden layer and an Eltwise layer that are cascaded, output data of any hidden layer is input data of the Eltwise layer, and the data processing module is specifically configured to, when controlling the network merging layer to perform data processing:
controlling any hidden layer to perform corresponding operation on input data of any hidden layer, and serially outputting each operation result of any hidden layer;
and controlling the Eltwise layer to carry out Eltwise operation on the output data of any hidden layer and the output data of other hidden layers, wherein the output data of other hidden layers are the input data of the Eltwise layer.
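A minimal sketch of the hidden-layer + Eltwise merging (a scalar multiplication stands in for the hidden layer's operation, and the Eltwise operation is assumed to be an element-wise sum; both are illustrative assumptions): each hidden-layer result is combined with the other branch's cached output immediately, avoiding a separate Eltwise pass over a stored buffer.

```python
import numpy as np

def hidden_plus_eltwise_merged(x, other_out, weight):
    """Each result of the hidden layer is summed element-wise with the other
    branch's output right away, instead of in a separate Eltwise pass."""
    out = np.empty_like(other_out)
    for i in range(len(out)):
        v = weight * x[i]            # stand-in for the hidden layer's operation
        out[i] = v + other_out[i]    # Eltwise sum while v is still in register
    return out
```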
Since the data processing apparatus provided in the embodiment of the present invention is an apparatus capable of executing the data processing method of the embodiment of the present invention, a person skilled in the art can, based on the data processing method described above, understand the specific implementation of the data processing apparatus and its various modifications; how the apparatus implements the method is therefore not described in detail here. Any apparatus used by a person skilled in the art to implement the data processing method of the embodiments of the present invention falls within the protection scope of the present application.
Based on the same principle as the data processing method and the data processing apparatus provided by the embodiment of the present invention, an embodiment of the present invention also provides an electronic device, which may include a processor and a memory. The memory stores therein readable instructions, which when loaded and executed by the processor, can implement the data processing method shown in any embodiment of the present invention.
The embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores readable instructions, and when the readable instructions are loaded and executed by a processor, the data processing method shown in any embodiment of the present invention is implemented.
Fig. 7 is a schematic structural diagram of an electronic device to which an embodiment of the present invention is applicable. As shown in fig. 7, the electronic device 2000 includes a processor 2001 and a memory 2003, wherein the processor 2001 is coupled to the memory 2003, for example via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. It should be noted that, in practical applications, the transceiver 2004 is not limited to one, and the structure of the electronic device 2000 does not constitute a limitation on the embodiment of the present invention.
The processor 2001 is applied to the embodiment of the present invention, and is configured to implement the function of the data processing module in the embodiment of the present invention. The transceiver 2004 includes a receiver and a transmitter, and the transceiver 2004 is applied to the embodiment of the present invention to realize communication between the electronic device 2000 and another device, and to realize reception and transmission of data.
The processor 2001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 2001 may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 2002 may include a path that conveys information between the aforementioned components. The bus 2002 may be a PCI bus or an EISA bus, etc. The bus 2002 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 2003 may be, but is not limited to, a ROM or other type of static storage device capable of storing static information and instructions, a RAM or other type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disk storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Optionally, the memory 2003 is adapted to store application program code for performing aspects of the present invention and is controlled in execution by the processor 2001. The processor 2001 is used to execute application program code stored in the memory 2003 to carry out the actions of the apparatus provided by the embodiments of the present invention.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction, and the steps may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and whose execution order is not necessarily sequential: they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A data processing method based on a neural network is characterized in that the neural network comprises at least one network merging layer, the network merging layer comprises n cascaded hidden layers, and n is more than or equal to 2; the method comprises the following steps:
controlling the (i-1) th hidden layer of the network merging layer, processing input data of the (i-1) th hidden layer, and serially outputting a processing result, wherein i is more than or equal to 2 and is less than or equal to n;
controlling at least part of output data of the (i-1) th hidden layer to be stored in a register and/or a Cache;
when the output data meet preset conditions, controlling an ith hidden layer of the network merging layer to process the output data, wherein interlayer parallel processing of the data exists among hidden layers of the network merging layer in the data processing process;
wherein the preset condition is determined according to at least one of the following: attribute information of each hidden layer included in the network merging layer, the temporary storage time of data in a register of the mobile terminal equipment or the temporary storage time of data in a Cache of the mobile terminal equipment;
wherein the n hidden layers included in the network merging layer are determined according to at least one of the following: the attribute information of each hidden layer, the data amount required to be calculated and processed by each hidden layer, the attribute of a register of the mobile terminal equipment or the attribute of a Cache of the mobile terminal equipment.
2. The method according to claim 1, wherein the n cascaded hidden layers correspond to a same object instance when instantiated.
3. The method according to claim 1, wherein the preset condition comprises that the outputted data satisfies a lowest operation condition for performing an intra-layer operation on the ith hidden layer.
4. A method according to any of claims 1 to 3, wherein the output data is part of the output data of the (i-1)th hidden layer.
5. The method of claim 1, wherein the preset condition comprises that the maximum output duration of the output data is not greater than a set duration; and the controlling, when the output data meets the preset condition, the ith hidden layer of the network merging layer to process the output data of the (i-1)th hidden layer comprises:
when the maximum output time of the output data is not longer than a set time, controlling the ith hidden layer of the network merging layer to process the output data of the (i-1) th hidden layer;
the maximum output duration refers to a duration between the current time and the obtaining time of the data obtained earliest in the output data.
6. The method according to claim 5, wherein the set time duration is determined according to a maximum storage time duration of temporary storage data of the register and/or a maximum storage time duration of Cache data.
7. The method according to any one of claims 1 to 5, wherein the neural network is a first convolutional neural network, the network merging layer comprises a first convolutional layer and a first activation function Relu layer which are cascaded, and the controlling the network merging layer for data processing comprises:
controlling the first convolution layer to carry out convolution operation on input data of the first convolution layer, serially outputting an operation result of the first convolution layer each time, and controlling a first Relu layer to carry out Relu operation on output data of the first convolution layer each time;
or,
the neural network is a second convolutional neural network, the network merging layer comprises a second convolutional layer, a Batch Normalization layer, a Scale layer and a second Relu layer which are sequentially cascaded, and the controlling the network merging layer to perform data processing comprises:
controlling the second convolution layer to carry out convolution operation on input data of the second convolution layer, and serially outputting operation results of the second convolution layer each time;
controlling the Batch Normalization layer to perform Batch Normalization operation on output data of the second convolution layer each time, and serially outputting operation results of the Batch Normalization layer each time;
controlling the Scale layer to perform Scale operation on output data of the Batch Normalization layer each time, serially outputting operation results of the Scale layer each time, and controlling the second Relu layer to perform Relu operation on the output data of the Scale layer each time;
or,
the neural network is a third convolutional neural network, the network merging layer comprises any hidden layer and an element-wise operation (Eltwise) layer which are cascaded, the output data of the any hidden layer is the input data of the Eltwise layer, and the controlling the network merging layer to perform data processing comprises:
controlling any hidden layer to perform corresponding operation on the input data of any hidden layer, and serially outputting the operation result of any hidden layer each time;
and controlling the Eltwise layer to carry out Eltwise operation on the output data of any hidden layer and the output data of other hidden layers, wherein the output data of other hidden layers are the input data of the Eltwise layer.
8. A data processing device based on a neural network is characterized in that the neural network comprises at least one network merging layer, the network merging layer comprises n cascaded hidden layers, and n is more than or equal to 2; the device comprises:
the data processing module is used for controlling the (i-1) th hidden layer of the network merging layer, processing the input data of the (i-1) th hidden layer and serially outputting a processing result, wherein i is more than or equal to 2 and less than or equal to n;
the data processing module is used for controlling at least part of output data of the (i-1) th hidden layer to be stored in a register and/or a Cache memory;
the data processing module is used for controlling the ith hidden layer of the network merging layer to process the output data when the output data meet preset conditions, wherein interlayer parallel processing of the data exists among all the hidden layers of the network merging layer in the data processing process;
wherein the preset condition is determined according to at least one of the following: attribute information of each hidden layer included in the network merging layer, the temporary storage time length of data in a register of the mobile terminal equipment or the temporary storage time length of data in a Cache of the mobile terminal equipment;
wherein the n hidden layers included in the network merging layer are determined according to at least one of the following: the attribute information of each hidden layer, the data amount required to be calculated and processed by each hidden layer, the attribute of a register of the mobile terminal equipment or the attribute of a Cache of the mobile terminal equipment.
9. An electronic device, wherein the electronic device comprises a processor and a memory;
the memory has stored therein readable instructions which, when loaded and executed by the processor, implement the data processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon readable instructions, which when loaded and executed by a processor, carry out a data processing method according to any one of claims 1 to 7.
CN201811340948.6A 2018-11-12 2018-11-12 Data processing method, device and equipment based on neural network and storage medium Active CN110163337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811340948.6A CN110163337B (en) 2018-11-12 2018-11-12 Data processing method, device and equipment based on neural network and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811340948.6A CN110163337B (en) 2018-11-12 2018-11-12 Data processing method, device and equipment based on neural network and storage medium

Publications (2)

Publication Number Publication Date
CN110163337A CN110163337A (en) 2019-08-23
CN110163337B true CN110163337B (en) 2023-01-20

Family

ID=67645220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811340948.6A Active CN110163337B (en) 2018-11-12 2018-11-12 Data processing method, device and equipment based on neural network and storage medium

Country Status (1)

Country Link
CN (1) CN110163337B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554164A (en) * 2020-04-24 2021-10-26 上海商汤智能科技有限公司 Neural network model optimization method, neural network model data processing method, neural network model optimization device, neural network model data processing device and storage medium
CN112948126A (en) * 2021-03-29 2021-06-11 维沃移动通信有限公司 Data processing method, device and chip

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018120016A1 (en) * 2016-12-30 2018-07-05 上海寒武纪信息科技有限公司 Apparatus for executing lstm neural network operation, and operational method
CN107844833A (en) * 2017-11-28 2018-03-27 郑州云海信息技术有限公司 A kind of data processing method of convolutional neural networks, device and medium
CN108074211B (en) * 2017-12-26 2021-03-16 浙江芯昇电子技术有限公司 Image processing device and method
CN108491924B (en) * 2018-02-11 2022-01-07 江苏金羿智芯科技有限公司 Neural network data serial flow processing device for artificial intelligence calculation
CN108446758B (en) * 2018-02-11 2021-11-30 江苏金羿智芯科技有限公司 Artificial intelligence calculation-oriented neural network data serial flow processing method

Also Published As

Publication number Publication date
CN110163337A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
RU2771008C1 (en) Method and apparatus for processing tasks based on a neural network
US10943324B2 (en) Data processing method, apparatus, and electronic device
CN109597965B (en) Data processing method, system, terminal and medium based on deep neural network
CN110163337B (en) Data processing method, device and equipment based on neural network and storage medium
US20190114541A1 (en) Method and system of controlling computing operations based on early-stop in deep neural network
CN109242094A (en) Device and method for executing artificial neural network forward operation
CN104102522B (en) The artificial emotion driving method of intelligent non-player roles in interactive entertainment
CN109214508B (en) System and method for signal processing
CN114281521B (en) Method, system, equipment and medium for optimizing deep learning heterogeneous resource communication efficiency
CN111931901A (en) Neural network construction method and device
CN113241064A (en) Voice recognition method, voice recognition device, model training method, model training device, electronic equipment and storage medium
CN108681773A (en) Accelerated method, device, terminal and the readable storage medium storing program for executing of data operation
CN112732436B (en) Deep reinforcement learning acceleration method of multi-core processor-single graphics processor
CN110795235A (en) Method and system for deep learning and cooperation of mobile web
CN111352896B (en) Artificial intelligence accelerator, equipment, chip and data processing method
CN110600020A (en) Gradient transmission method and device
CN113139608B (en) Feature fusion method and device based on multi-task learning
US20180060731A1 (en) Stage-wise mini batching to improve cache utilization
CN114519425A (en) Convolution neural network acceleration system with expandable scale
CN114020654A (en) Depth separable convolution acceleration system and method
CN109389216B (en) Dynamic cutting method and device of neural network and storage medium
CN117436485A (en) Multi-exit point end-edge-cloud cooperative system and method based on trade-off time delay and precision
CN116828541A (en) Edge computing dependent task dynamic unloading method and system based on multi-agent reinforcement learning
CN111652051B (en) Face detection model generation method, device, equipment and storage medium
CN114330675A (en) Chip, accelerator card, electronic equipment and data processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant