CN115809699B - Method and device for estimating minimum memory occupation amount required by neural network model reasoning - Google Patents


Info

Publication number
CN115809699B
CN115809699B
Authority
CN
China
Prior art keywords
memory occupation
neural network
occupation amount
network model
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310052812.XA
Other languages
Chinese (zh)
Other versions
CN115809699A (en)
Inventor
李超 (Li Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202310052812.XA
Publication of CN115809699A
Application granted
Publication of CN115809699B

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for estimating the minimum memory occupation required by neural network model reasoning, belonging to the field of neural network applications. Taking graph theory as its core idea, the method describes the computation logic of the neural network with a directed acyclic graph, maps the reasoning process of the neural network model onto the topological ordering of that graph, and then prunes the search tree to obtain an estimate of the minimum memory occupation together with the corresponding operator execution sequence. The invention provides information support for running and designing neural network models on edge devices and contributes to making edge devices intelligent.

Description

Method and device for estimating minimum memory occupation amount required by neural network model reasoning
Technical Field
The invention belongs to the technical field of neural network application, and particularly relates to a method and a device for estimating minimum memory occupation amount required by neural network model reasoning.
Background
In recent years, the rapid development of the neural network field has attracted a great deal of attention, and related applications keep emerging. For example, face recognition can be applied to daily clock-in, and image recognition and semantic segmentation can be applied to personnel safety monitoring. These techniques play a vital role in our lives; however, deploying them fully in practice still faces a series of challenges.
Firstly, in the neural network applications that are mature at present, the data to be inferred are various kinds of sensing data collected by cameras and sensing devices. The data are transmitted over the network to a remote server, intelligent inference is performed on the server, the inference result is returned over the network to the edge device, and the edge device processes it further. The disadvantage of this approach is that the overall reasoning process is time-consuming and highly susceptible to network stability. To reduce inference time, the neural network model may instead be run directly on the edge computing device. However, edge computing devices differ significantly from servers. Taking the widely used STM32F7 microcontroller as an example, its maximum on-chip RAM is 512 KB, which means that the minimum total memory footprint required by a neural network model during inference on this type of microcontroller must not exceed 512 KB.
Different edge devices have different memory limitations, and different neural network models have different structures. To judge whether a neural network can run on a given edge device, we must estimate the minimum memory occupation required by its intelligent reasoning process. Moreover, this estimation may itself have to run on the edge computing device, and therefore needs to be fast.
Disclosure of Invention
The invention aims to efficiently calculate the minimum memory occupation in the neural network model reasoning process, thereby providing information support for running and designing neural network models on edge devices, and to provide a method and a device for estimating the minimum memory occupation required by neural network model reasoning.
The aim of the invention is realized by the following technical scheme: a method for estimating the minimum memory occupation required by neural network model reasoning comprises the following steps:
(1) Construct a directed acyclic graph G from the graph of the neural network model;
(2) Expand the directed acyclic graph G into its standard form G';
(3) Based on the standard form G' obtained in step (2), obtain an initial pruning criterion M through a greedy strategy;
(4) Accelerate the estimation by pruning: specifically, set the starting point v_0 = 0, search along the branches of the operator execution sequence search tree T, let the memory occupation of a search result be M', and decide, by comparing M' with the initial pruning criterion M, whether to discard M' or to update it as the minimum memory occupation.
Further, the directed acyclic graph G in step (1) comprises a point set V and an edge set E, i.e., G = (V, E), where V = {v_1, v_2, ..., v_n}. Each element of the set V is a node and represents an operator; each node has an attribute value VW representing the memory occupation required to compute the node. Each element of the edge set E represents an edge of G: (v_i, v_k) indicates that node v_k uses the computation result of node v_i; v_i is called the source node of the edge and v_k its target node; the value of the edge (v_i, v_k) is EW_i,k, representing the memory occupation of the source node's output result.
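For concreteness, the definition above maps directly onto a small adjacency structure. The following is a minimal Python sketch (our illustration, not part of the patent; all names are ours) of G = (V, E) with node attribute VW and edge attribute EW:

from dataclasses import dataclass, field

@dataclass
class Graph:
    vw: dict = field(default_factory=dict)  # node -> VW (memory to compute the node)
    ew: dict = field(default_factory=dict)  # (src, dst) -> EW (source's output size)

    def add_node(self, name, vw=0):
        self.vw[name] = vw

    def add_edge(self, src, dst, ew):
        self.ew[(src, dst)] = ew

    def successors(self, n):
        return [d for (s, d) in self.ew if s == n]

    def predecessors(self, n):
        return [s for (s, d) in self.ew if d == n]

    def out_size(self, n):
        # per the definition above, every outgoing edge of a node carries the
        # same EW (the node's output tensor size), so reading any one suffices
        return next((w for (s, _), w in self.ew.items() if s == n), 0)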
Further, the expansion in the step (2) is point expansion and edge expansion.
Further, the point expansion adds to V a starting point v_0 and a termination point v_end, the attribute values VW of these two points both being 0, yielding the point set V' of the standard form G' of the directed acyclic graph G, i.e., V' = V ∪ {v_0, v_end}.
Further, the edge expansion adds new edges connecting v_0 with the start nodes of V and the termination nodes of V with v_end, setting the attribute value EW of these edges to 0, which yields the edge set E' of the standard form G' of the directed acyclic graph G; then G' = (V', E').
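A minimal sketch of the two expansions, reusing the Graph structure above (the node names "v0" and "v_end" are our illustrative choices):

def to_standard_form(g: Graph) -> Graph:
    """Point expansion: add v_0 and v_end with VW = 0.
    Edge expansion: connect v_0 to every start node of G and every
    termination node of G to v_end, with EW = 0 on the new edges."""
    g2 = Graph(dict(g.vw), dict(g.ew))
    g2.add_node("v0", vw=0)
    g2.add_node("v_end", vw=0)
    for n in g.vw:
        if not g.predecessors(n):   # no incoming edge: a start node
            g2.add_edge("v0", n, 0)
        if not g.successors(n):     # no outgoing edge: a termination node
            g2.add_edge(n, "v_end", 0)
    return g2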
Further, the greedy strategy in step (3) starts from the empty sequence and adds, one by one, the operator with the smallest memory occupation change in the current state until all operators have been executed. This yields an operator execution sequence s that satisfies the topological ordering, together with the memory occupation of that execution sequence; the initial pruning criterion M is the memory occupation of the operator execution sequence obtained by the greedy strategy.
Further, step (4) is realized by the following substeps:
(4.1) Construct the operator execution sequence search tree T: perform a depth-first search on the graph of the standard form G', traversing all operators and backtracking after traversal, i.e., search all topological sequences; the execution process of the whole traversal forms the operator execution sequence search tree T.
(4.2) Pruning: while traversing the search tree T according to the operator execution sequence, if the memory occupation M' of the current path satisfies M' ≥ M, stop traversing downward and backtrack.
(4.3) Update the result and pruning value: after one path has traversed all operators, if the memory occupation M' < M, then M = M' and s = s', where s' is the operator sequence on that path.
Further, the directed acyclic graph G is obtained by saving the designed deep learning model file in tflite, pb, or onnx format using a deep learning framework such as TensorFlow or PyTorch, i.e., by obtaining the graph description G corresponding to the neural network model.
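As an illustration (a sketch with a toy model standing in for "the designed deep learning model"; it is not the patent's own code), such a graph description can be produced with the standard framework APIs, e.g. exporting a PyTorch model to onnx and reading back its operator nodes:

import torch
import torch.nn as nn
import onnx

# toy model: two convolutions, in place of the real designed model
model = nn.Sequential(nn.Conv2d(3, 32, 1), nn.ReLU(), nn.Conv2d(32, 32, 3))
torch.onnx.export(model, torch.randn(1, 3, 7, 7), "model.onnx")

g_proto = onnx.load("model.onnx")
for node in g_proto.graph.node:     # each node is an operator of G
    print(node.op_type, list(node.input), list(node.output))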
The device for estimating the minimum memory occupation required by neural network model reasoning comprises one or more processors configured to implement the above method for estimating the minimum memory occupation required by neural network model reasoning.
A computer readable storage medium having stored thereon a program which, when executed by a processor, is adapted to carry out a method of estimating a minimum memory footprint required for neural network model reasoning as described above.
The beneficial effects of the invention are as follows: considering the urgent need for specific edge devices to run neural network models, the invention uses the characteristics of the neural network graph model to design a method for estimating the minimum memory occupation required by neural network model reasoning. The method can efficiently estimate the minimum total memory occupation required by the inference process of a specific neural network model and also yields a corresponding inference order, which is of real significance for the intelligent development of edge devices.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 shows a directed acyclic graph G expanded into its standard form G'.
FIG. 2 shows the calculation states of the operators according to their calculation order: from a state s_i, the next state may become s_{i+1} or s'_{i+1}.
FIG. 3 shows a directed acyclic graph G with specific values.
FIG. 4 shows the directed acyclic graph G with specific values expanded into its standard form G'.
Fig. 5 is a hardware configuration diagram of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention; rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information, without departing from the scope of the invention. The word "if" as used herein may, depending on the context, be interpreted as "when", "upon", or "in response to determining".
The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
Example 1:
Consider implementing the estimation of the minimum memory occupation required for the inference of an image-processing neural network. Its graph description is a directed acyclic graph G comprising a point set V and an edge set E, i.e., G = (V, E), as shown in the figures below. The computing nodes of the graph consist of 1x1 and 3x3 convolution operators, and the edge weights of the graph describe tensor sizes; the weight memory occupation of the computing nodes is negligible compared with the input/output tensor sizes and is therefore ignored.
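To make the later steps concrete, the following sketch builds a small hypothetical DAG of this kind with the Graph structure from above. The topology is purely illustrative; it is not the exact graph of FIG. 3, whose image is not reproduced here. Edge weights are tensor sizes, and VW is left at 0 since node weight memory is ignored:

g = Graph()
for v in range(5):
    g.add_node(v)                  # VW = 0: node weight memory is ignored
g.add_edge(0, 1, 7 * 7 * 32)       # node 0's output: a 7x7x32 tensor
g.add_edge(1, 2, 7 * 7 * 64)       # node 1's output feeds two branches
g.add_edge(1, 3, 7 * 7 * 64)
g.add_edge(2, 4, 7 * 7 * 32)       # the branches reconverge at node 4
g.add_edge(3, 4, 4 * 4 * 32)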
The invention relates to a method for estimating the minimum memory occupation amount required by neural network model reasoning, which comprises the following steps:
(1) Construct the directed acyclic graph G from the graph of the neural network model: using a mainstream deep learning framework such as TensorFlow or PyTorch, save the designed deep learning model file in tflite, pb, onnx, or a similar format, i.e., obtain the graph description G corresponding to the neural network model, as shown in FIG. 1.
(2) Expand the directed acyclic graph G into its standard form G', so as to provide a standard computation starting point and end point for all kinds of neural network models and make the algorithm convenient to implement. This step comprises the following substeps:
(2.1) Point expansion of the directed acyclic graph G: add to V a starting point v_0 and a termination point v_end whose attribute values VW are both 0 (these operators perform no computation), obtaining the point set V'.
(2.2) Edge expansion of the directed acyclic graph G: connect the v_0 newly added in step (2.1) with the start nodes of V, and the termination nodes of V with v_end, adding new edges whose attribute value EW is 0, obtaining the edge set E'.
Following this method, the standard form G' of the directed acyclic graph G is obtained, as shown in FIG. 1.
Obviously, different node calculation orders lead to different total memory demands of the neural network model during inference. The memory occupation of one inference process is the maximum of the instantaneous memory occupations reached during it, and the minimum memory occupation of the model's inference is the smallest such value over all feasible calculation orders. As shown in FIG. 2, black marks the data currently stored in memory. In state s_i, the current memory must hold the data required by the computations of nodes 8, 9 and 10 together with the computation results of those nodes; this is the memory occupation at that moment. State s_{i+1} denotes the state reached from s_i by selecting node 11 for computation, and s'_{i+1} the state reached from s_i by selecting node 12. Because the computation of node 11 has already consumed the outputs of its two predecessors, those two data can be removed from memory, which determines the memory occupation in state s_{i+1} and hence the memory occupation change from s_i to s_{i+1}. Similarly, state s'_{i+1} has its own memory occupation and change amount. The memory required by the two computations is clearly different. The method therefore seeks an optimal operator execution sequence that minimizes the memory occupation during the neural network model's inference.
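This bookkeeping can be simulated directly. The sketch below is our illustration, reusing the Graph structure from earlier; it assumes, as in the FIG. 2 discussion, that a tensor is freed as soon as all of its consumers have executed. It returns the occupation of the final state and the peak occupation along the way:

def memory_profile(g: Graph, order):
    """Return (current, peak) memory occupation after executing `order`."""
    done, live = set(), {}
    cur = peak = 0
    for n in order:
        cur += g.out_size(n)       # allocate n's output tensor
        live[n] = g.out_size(n)
        done.add(n)
        peak = max(peak, cur)      # n's inputs are still resident here
        for p in list(live):       # free outputs whose consumers all ran
            succ = g.successors(p)
            if succ and all(s in done for s in succ):
                cur -= live.pop(p)
    return cur, peak

Running it on the same prefix extended by node 11 versus node 12 reproduces the asymmetry between s_{i+1} and s'_{i+1} discussed above.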
(3) Based on the standard form G' obtained in step (2), obtain the initial pruning criterion M through a greedy strategy.
Obtaining the initial pruning criterion M by the greedy strategy specifically means: starting from the empty sequence, add one by one the operator with the smallest memory occupation change in the current state until all operators have been executed; this yields an operator execution sequence s satisfying the topological ordering together with the memory occupation of that sequence, and the initial pruning criterion M is the memory occupation of the operator execution sequence obtained by the greedy strategy.
Based on the standard form G', the process of this greedy strategy can be traced:
As shown in FIG. 3 and FIG. 4: initially the operator sequence is { } and the executable operators comprise {0}; the operator with the smallest memory change is 0, with a change of 7x7x32 = 1568 KB, so operator 0 is added to the operator execution sequence. In the next greedy step the operator sequence is {0} and the executable operators comprise {1}; the memory occupation rises to 1568 KB + 7x7x64 = 4704 KB, and after operator 1 has executed (operator 0's output can then be freed) the occupation is 3136 KB. In the next greedy step the operator sequence is {0, 1} and the executable operators comprise {2, 3}: executing operator 2 would occupy 3136 + 7x7x32 = 4704 KB, while executing operator 3 would occupy 3136 + 4x4x32 = 3648 KB. Because the memory occupation of operator 3 is smaller than that of operator 2 (3648 < 4704), operator 3 is selected and added to the operator execution sequence. This continues until all operators have been added to the operator execution sequence.
The finally obtained result: the operator sequence is s_0 = {0, 1, 3, 5, 2, 4, 6, 7, end}, and the memory occupation of this execution sequence is M = 4960 KB (input tensor of node 2 + output tensor of node 5 = 4x4x16 + 7x7x64 + 7x7x32).
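A compact sketch of this greedy construction (our illustration, reusing memory_profile from above; ties are broken arbitrarily and VW is ignored, as before):

def greedy_bound(g: Graph):
    """Greedily pick the ready operator with the smallest occupation change;
    return the sequence s and the initial pruning criterion M."""
    order, done = [], set()
    while len(order) < len(g.vw):
        ready = [n for n in g.vw
                 if n not in done
                 and all(p in done for p in g.predecessors(n))]
        base = memory_profile(g, order)[0]
        # smallest memory occupation change when appended to the sequence
        pick = min(ready,
                   key=lambda n: memory_profile(g, order + [n])[0] - base)
        order.append(pick)
        done.add(pick)
    return order, memory_profile(g, order)[1]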
(4) Accelerate the estimation by pruning: specifically, set the starting point v_0 = 0 and search along the branches of the operator execution sequence search tree T; the memory occupation of a search result is M', which is compared with the pruning criterion M to decide whether to discard it or to update it as the minimum memory occupation. This step comprises the following substeps:
(4.1) Construct the operator execution sequence search tree T: perform a depth-first search (DFS) on the graph of the standard form G', traversing all operators and backtracking after traversal, i.e., search all topological sequences; the execution process of the whole traversal forms the operator execution sequence search tree T.
(4.2) Pruning: while traversing the search tree T according to the operator execution sequence, if the memory occupation M' of the current path satisfies M' ≥ M, stop traversing downward and backtrack.
(4.3) Update the result and pruning value: after one path has traversed all operators, if the memory occupation M' < M, then M = M' and s = s', where s' is the operator sequence on that path.
The final result of this embodiment is the same as the result of the greedy strategy: the operator execution sequence is s = {0, 1, 3, 5, 2, 4, 6, 7, end} and the minimum memory occupation is M = 4960 KB. Without the invented method, running the operators in index order (i.e., with the operator execution sequence {0, 1, 2, 3, 4, 5, 6, 7, end}) would occupy 5216 KB.
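Combining the pieces, a hedged end-to-end sketch of step (4): a depth-first search over the operator execution sequence search tree T that prunes any branch whose occupation already reaches the current bound M (again our illustration, reusing greedy_bound, memory_profile, and to_standard_form from above):

def min_memory(g: Graph):
    """Branch-and-bound over topological orders; returns (s, M)."""
    best_s, best_M = greedy_bound(g)           # step (3): initial criterion M

    def dfs(order, done):
        nonlocal best_s, best_M
        if len(order) == len(g.vw):            # a complete topological sequence
            m = memory_profile(g, order)[1]
            if m < best_M:                     # step (4.3): update M and s
                best_M, best_s = m, list(order)
            return
        for n in list(g.vw):
            if n in done or not all(p in done for p in g.predecessors(n)):
                continue
            if memory_profile(g, order + [n])[1] >= best_M:
                continue                       # step (4.2): prune, backtrack
            order.append(n); done.add(n)
            dfs(order, done)
            order.pop(); done.discard(n)

    dfs([], set())
    return best_s, best_M

For instance, on the hypothetical toy graph built earlier: s, M = min_memory(to_standard_form(g)).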
Corresponding to the embodiment of the method for estimating the minimum memory occupation amount required by the neural network model reasoning, the invention also provides an embodiment of the device for estimating the minimum memory occupation amount required by the neural network model reasoning.
Referring to fig. 5, an apparatus for estimating a minimum memory occupation amount required for neural network model reasoning according to an embodiment of the present invention includes one or more processors configured to implement a method for estimating a minimum memory occupation amount required for neural network model reasoning in the foregoing embodiment.
The embodiment of the apparatus for estimating the minimum memory occupation required by neural network model reasoning can be applied to any device with data processing capability, such as a computer. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, as a logical apparatus it is formed by the processor of the device in which it is located reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In terms of hardware, FIG. 5 shows a hardware structure diagram of the device with data processing capability in which the estimating apparatus of the invention is located; besides the processor, memory, network interface, and nonvolatile memory shown in FIG. 5, the device in the embodiment generally also includes other hardware according to its actual function, which is not described again here.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The embodiment of the invention also provides a computer readable storage medium, on which a program is stored, which when executed by a processor, implements a method for estimating a minimum memory occupation amount required for neural network model reasoning in the above embodiment.
The computer readable storage medium may be an internal storage unit of any of the aforementioned devices with data processing capability, such as a hard disk or a memory. It may also be an external storage device of that device, for example a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash Card provided on the device. Further, the computer readable storage medium may comprise both an internal storage unit and an external storage device. It is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the invention shall fall within its scope of protection.
The above embodiments are merely for illustrating the design concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications according to the principles and design ideas of the present invention are within the scope of the present invention.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. The specification and examples are to be regarded in an illustrative manner only.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (6)

1. The estimating method for the minimum memory occupation amount required by the neural network model reasoning is characterized by comprising the following steps:
(1) Constructing a directed acyclic graph G through a graph of the neural network model;
(2) Expanding the directed acyclic graph G into a standard form G' of the directed acyclic graph G;
the expansion in the step (2) is point expansion and edge expansion;
the point is extended by adding a starting point V to V 0 And termination point v end The attribute value VW of the two points is 0, and a point set V 'of a standard form G' of the directed acyclic graph G is obtained;
the edge is expanded by connection v 0 And V and the start node in V end Adding a new edge with a termination node in V, setting the attribute value EW of the edge to be 0, and obtaining an edge set E ' of a standard form G ' of the directed acyclic graph G, wherein G ' = (V ', E ');
(3) Obtaining an initial pruning standard M through a greedy strategy based on the standard form G' obtained in the step (2);
(4) Pruning acceleration is carried out by the estimation method: specifically, set a starting point v_0 = 0 and search according to the branches of the operator execution sequence search tree T; the memory occupation of the search result is M', and by comparing M' with the initial pruning criterion M it is judged whether to discard the memory occupation M' or to update it as the minimum memory occupation; said step (4) is realized by the following substeps:
(4.1) constructing an operator execution sequence search tree T: traversing all operators through a depth-first search algorithm of the graph of the standard form G', and backtracking after traversing, namely searching all topological sequences, wherein the whole traversing execution process forms an operator sequence search tree T;
(4.2) pruning: in the traversing process of the sequence search tree T according to the operator execution sequence, if the memory occupation amount M' of the current path is not less than M, stopping traversing downwards, and backtracking;
(4.3) updating the result and pruning value: after one path traverses all operators, if the memory occupation M' < M, then M = M' and s = s', where s' is the operator sequence on the path.
2. The method of claim 1, wherein the directed acyclic graph G in step (1) comprises a point set V and an edge set E, i.e., G = (V, E), where V = {v_1, v_2, ..., v_n}; each element in the set V is a node and represents an operator; each node has an attribute value VW representing the memory occupation required by the computation of the node; each element of the edge set E represents an edge in G, where (v_i, v_k) indicates that node v_k uses the computation result of node v_i, v_i is called the source node of the edge, v_k the target node of the edge, and the value of the edge (v_i, v_k) is EW_i,k, representing the memory occupation of the output result of the source node.
3. The method for estimating the minimum memory occupation amount required by neural network model reasoning according to claim 1, wherein the greedy strategy in the step (3) is that operators with the minimum memory occupation variation amount in the current state are added one by one from a null sequence until all operators are executed; obtaining an operator execution sequence s meeting topological sorting, obtaining the memory occupation amount of the execution sequence, and obtaining the initial pruning standard M, namely the memory occupation amount of the operator execution sequence meeting topological sorting through a greedy strategy.
4. The method for estimating a minimum memory occupation amount required for neural network model reasoning according to claim 1, wherein the directed acyclic graph G is a graph description G corresponding to the neural network model by storing a designed deep learning model file in tflite, pb or onnx format using a deep learning framework tensorflow or pytorch.
5. An apparatus for estimating a minimum memory footprint required for neural network model reasoning, comprising one or more processors configured to implement a method for estimating a minimum memory footprint required for neural network model reasoning as claimed in any of claims 1-4.
6. A computer readable storage medium having stored thereon a program which, when executed by a processor, is adapted to carry out a method of estimating a minimum memory footprint required for neural network model reasoning as claimed in any of claims 1 to 4.
CN202310052812.XA 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning Active CN115809699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310052812.XA CN115809699B (en) 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310052812.XA CN115809699B (en) 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning

Publications (2)

Publication Number Publication Date
CN115809699A CN115809699A (en) 2023-03-17
CN115809699B (en) 2023-06-23

Family

ID=85487353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310052812.XA Active CN115809699B (en) 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning

Country Status (1)

Country Link
CN (1) CN115809699B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523052B (en) * 2023-07-05 2023-08-29 成都阿加犀智能科技有限公司 Rapid reasoning method, device and equipment
CN117009093B (en) * 2023-10-07 2024-03-12 之江实验室 Recalculation method and system for reducing memory occupation amount required by neural network reasoning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070223A (en) * 2020-08-17 2020-12-11 电子科技大学 Model parallel method based on Tensorflow framework
CN112084037A (en) * 2020-09-23 2020-12-15 安徽寒武纪信息科技有限公司 Memory allocation method and device of neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095474A (en) * 2020-01-09 2021-07-09 微软技术许可有限责任公司 Resource usage prediction for deep learning models
CN112085172B (en) * 2020-09-16 2022-09-16 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network
US20240160891A1 (en) * 2021-03-26 2024-05-16 Allwinner Technology Co., Ltd. Memory allocation method for ai processor, computer apparatus, and computer-readable storage medium
CN113326869A (en) * 2021-05-08 2021-08-31 清华大学 Deep learning calculation graph optimization method based on longest path fusion algorithm
CN115018064A (en) * 2022-06-27 2022-09-06 中国科学技术大学 Space distribution method and device for computing nodes
CN115186821B (en) * 2022-09-13 2023-01-06 之江实验室 Core particle-oriented neural network inference overhead estimation method and device and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070223A (en) * 2020-08-17 2020-12-11 电子科技大学 Model parallel method based on Tensorflow framework
CN112084037A (en) * 2020-09-23 2020-12-15 安徽寒武纪信息科技有限公司 Memory allocation method and device of neural network

Also Published As

Publication number Publication date
CN115809699A (en) 2023-03-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant