CN115809699B - Method and device for estimating minimum memory occupation amount required by neural network model reasoning - Google Patents


Info

Publication number
CN115809699B
CN115809699B
Authority
CN
China
Prior art keywords
memory occupation
neural network
occupation amount
network model
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310052812.XA
Other languages
Chinese (zh)
Other versions
CN115809699A (en)
Inventor
李超 (Li Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202310052812.XA
Publication of CN115809699A
Application granted
Publication of CN115809699B

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for estimating the minimum memory occupation required by neural network model reasoning, belonging to the field of neural network applications. Taking graph theory as its core idea, the method describes the computation logic of the neural network with a directed acyclic graph, maps the reasoning process of the neural network model onto the topological ordering of that graph, and then prunes the search tree to obtain an estimate of the minimum memory occupation together with the corresponding operator execution sequence. The invention provides information support for running and designing neural network models on edge devices and contributes to making edge devices intelligent.

Description

Method and device for estimating minimum memory occupation amount required by neural network model reasoning
Technical Field
The invention belongs to the technical field of neural network application, and particularly relates to a method and a device for estimating minimum memory occupation amount required by neural network model reasoning.
Background
In recent years, the rapid development of the neural network field has attracted a great deal of attention, and related applications keep emerging. For example, face recognition can be applied to daily clock-in, and image recognition and semantic segmentation can be applied to personnel safety monitoring. These techniques play a vital role in our lives; however, deploying them fully in practice still faces a series of challenges.
Firstly, in the neural network applications that are mature at present, the data to be inferred are various kinds of sensing data collected by cameras and sensing devices. The data are transmitted over the network to a remote server, intelligent inference is performed on the server, the inference result is returned over the network to the edge device, and the edge device processes it further. The disadvantage of this approach is that the overall reasoning process is time-consuming and highly susceptible to network stability. To reduce inference time, the neural network model may instead be run directly on the edge computing device. However, edge computing devices differ significantly from servers. Taking the widely used STM32F7 microcontroller as an example, its maximum on-chip RAM is 512 KB, which means that the minimum total memory footprint required by a neural network model during inference on this type of microcontroller must not exceed 512 KB.
Different edge devices have different memory limitations, and different neural network models have different structures. To judge whether a neural network can run on a given edge device, we must estimate the minimum memory occupation required by its intelligent reasoning process. Moreover, this estimation may itself have to run on the edge computing device, and therefore needs to be fast.
Disclosure of Invention
The invention aims to efficiently calculate the minimum memory occupation in the neural network model reasoning process, thereby providing information support for running and designing neural network models on edge devices, and to provide a method and a device for estimating the minimum memory occupation required by neural network model reasoning.
The aim of the invention is realized by the following technical scheme: a method for estimating the minimum memory occupation required by neural network model reasoning comprises the following steps:
(1) Construct a directed acyclic graph G from the graph of the neural network model;
(2) Expand the directed acyclic graph G into its standard form G';
(3) Based on the standard form G' obtained in step (2), obtain an initial pruning criterion M through a greedy strategy;
(4) Accelerate the estimation by pruning: specifically, set the starting point v_0 = 0, search along the branches of the operator execution sequence search tree T, let the memory occupation of a search result be M', and decide, by comparing M' with the initial pruning criterion M, whether to discard M' or to update it as the minimum memory occupation.
Further, the directed acyclic graph G in step (1) comprises a point set V and an edge set E, i.e., G = (V, E), where V = {v_1, v_2, ..., v_n}. Each element of the set V is a node and represents an operator; each node has an attribute value VW representing the memory occupation required to compute the node. Each element of the edge set E represents an edge of G: (v_i, v_k) indicates that node v_k uses the computation result of node v_i; v_i is called the source node of the edge and v_k its target node; the value of the edge (v_i, v_k) is EW_i,k, representing the memory occupation of the source node's output result.
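For concreteness, the definition above maps directly onto a small adjacency structure. The following is a minimal Python sketch (our illustration, not part of the patent; all names are ours) of G = (V, E) with node attribute VW and edge attribute EW:

from dataclasses import dataclass, field

@dataclass
class Graph:
    vw: dict = field(default_factory=dict)  # node -> VW (memory to compute the node)
    ew: dict = field(default_factory=dict)  # (src, dst) -> EW (source's output size)

    def add_node(self, name, vw=0):
        self.vw[name] = vw

    def add_edge(self, src, dst, ew):
        self.ew[(src, dst)] = ew

    def successors(self, n):
        return [d for (s, d) in self.ew if s == n]

    def predecessors(self, n):
        return [s for (s, d) in self.ew if d == n]

    def out_size(self, n):
        # per the definition above, every outgoing edge of a node carries the
        # same EW (the node's output tensor size), so reading any one suffices
        return next((w for (s, _), w in self.ew.items() if s == n), 0)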
Further, the expansion in the step (2) is point expansion and edge expansion.
Further, the point expansion adds to V a starting point v_0 and a termination point v_end, the attribute values VW of these two points both being 0, yielding the point set V' of the standard form G' of the directed acyclic graph G, i.e., V' = V ∪ {v_0, v_end}.
Further, the edge expansion adds new edges connecting v_0 with the start nodes of V and the termination nodes of V with v_end, setting the attribute value EW of these edges to 0, which yields the edge set E' of the standard form G' of the directed acyclic graph G; then G' = (V', E').
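A minimal sketch of the two expansions, reusing the Graph structure above (the node names "v0" and "v_end" are our illustrative choices):

def to_standard_form(g: Graph) -> Graph:
    """Point expansion: add v_0 and v_end with VW = 0.
    Edge expansion: connect v_0 to every start node of G and every
    termination node of G to v_end, with EW = 0 on the new edges."""
    g2 = Graph(dict(g.vw), dict(g.ew))
    g2.add_node("v0", vw=0)
    g2.add_node("v_end", vw=0)
    for n in g.vw:
        if not g.predecessors(n):   # no incoming edge: a start node
            g2.add_edge("v0", n, 0)
        if not g.successors(n):     # no outgoing edge: a termination node
            g2.add_edge(n, "v_end", 0)
    return g2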
Further, the greedy strategy in step (3) starts from the empty sequence and adds, one by one, the operator with the smallest memory occupation change in the current state until all operators have been executed. This yields an operator execution sequence s that satisfies the topological ordering, together with the memory occupation of that execution sequence; the initial pruning criterion M is the memory occupation of the operator execution sequence obtained by the greedy strategy.
Further, step (4) is realized by the following substeps:
(4.1) Construct the operator execution sequence search tree T: perform a depth-first search on the graph of the standard form G', traversing all operators and backtracking after traversal, i.e., search all topological sequences; the execution process of the whole traversal forms the operator execution sequence search tree T.
(4.2) Pruning: while traversing the search tree T according to the operator execution sequence, if the memory occupation M' of the current path satisfies M' ≥ M, stop traversing downward and backtrack.
(4.3) Update the result and pruning value: after one path has traversed all operators, if the memory occupation M' < M, then M = M' and s = s', where s' is the operator sequence on that path.
Further, the directed acyclic graph G is obtained by saving the designed deep learning model file in tflite, pb, or onnx format using a deep learning framework such as TensorFlow or PyTorch, i.e., by obtaining the graph description G corresponding to the neural network model.
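As an illustration (a sketch with a toy model standing in for "the designed deep learning model"; it is not the patent's own code), such a graph description can be produced with the standard framework APIs, e.g. exporting a PyTorch model to onnx and reading back its operator nodes:

import torch
import torch.nn as nn
import onnx

# toy model: two convolutions, in place of the real designed model
model = nn.Sequential(nn.Conv2d(3, 32, 1), nn.ReLU(), nn.Conv2d(32, 32, 3))
torch.onnx.export(model, torch.randn(1, 3, 7, 7), "model.onnx")

g_proto = onnx.load("model.onnx")
for node in g_proto.graph.node:     # each node is an operator of G
    print(node.op_type, list(node.input), list(node.output))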
The device for estimating the minimum memory occupation required by neural network model reasoning comprises one or more processors configured to implement the above method for estimating the minimum memory occupation required by neural network model reasoning.
A computer readable storage medium having stored thereon a program which, when executed by a processor, is adapted to carry out a method of estimating a minimum memory footprint required for neural network model reasoning as described above.
The beneficial effects of the invention are as follows: considering the urgent need for specific edge devices to run neural network models, the invention uses the characteristics of the neural network graph model to design a method for estimating the minimum memory occupation required by neural network model reasoning. The method can efficiently estimate the minimum total memory occupation required by the inference process of a specific neural network model and also yields a corresponding inference order, which is of real significance for the intelligent development of edge devices.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 shows a directed acyclic graph G expanded into its standard form G'.
FIG. 2 shows the calculation states of the operators according to their calculation order: from a state s_i, the next state may become s_{i+1} or s'_{i+1}.
FIG. 3 shows a directed acyclic graph G with specific values.
FIG. 4 shows the directed acyclic graph G with specific values expanded into its standard form G'.
Fig. 5 is a hardware configuration diagram of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention; rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information, without departing from the scope of the invention. The word "if" as used herein may, depending on the context, be interpreted as "when", "upon", or "in response to determining".
The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
Example 1:
Consider implementing the estimation of the minimum memory occupation required for the inference of an image-processing neural network. Its graph description is a directed acyclic graph G comprising a point set V and an edge set E, i.e., G = (V, E), as shown in the figures below. The computing nodes of the graph consist of 1x1 and 3x3 convolution operators, and the edge weights of the graph describe tensor sizes; the weight memory occupation of the computing nodes is negligible compared with the input/output tensor sizes and is therefore ignored.
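To make the later steps concrete, the following sketch builds a small hypothetical DAG of this kind with the Graph structure from above. The topology is purely illustrative; it is not the exact graph of FIG. 3, whose image is not reproduced here. Edge weights are tensor sizes, and VW is left at 0 since node weight memory is ignored:

g = Graph()
for v in range(5):
    g.add_node(v)                  # VW = 0: node weight memory is ignored
g.add_edge(0, 1, 7 * 7 * 32)       # node 0's output: a 7x7x32 tensor
g.add_edge(1, 2, 7 * 7 * 64)       # node 1's output feeds two branches
g.add_edge(1, 3, 7 * 7 * 64)
g.add_edge(2, 4, 7 * 7 * 32)       # the branches reconverge at node 4
g.add_edge(3, 4, 4 * 4 * 32)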
The invention relates to a method for estimating the minimum memory occupation amount required by neural network model reasoning, which comprises the following steps:
(1) Construct the directed acyclic graph G from the graph of the neural network model: using a mainstream deep learning framework such as TensorFlow or PyTorch, save the designed deep learning model file in tflite, pb, onnx, or a similar format, i.e., obtain the graph description G corresponding to the neural network model, as shown in FIG. 1.
(2) Expand the directed acyclic graph G into its standard form G', so as to provide a standard computation starting point and end point for all kinds of neural network models and make the algorithm convenient to implement. This step comprises the following substeps:
(2.1) Point expansion of the directed acyclic graph G: add to V a starting point v_0 and a termination point v_end whose attribute values VW are both 0 (these operators perform no computation), obtaining the point set V'.
(2.2) Edge expansion of the directed acyclic graph G: connect the v_0 newly added in step (2.1) with the start nodes of V, and the termination nodes of V with v_end, adding new edges whose attribute value EW is 0, obtaining the edge set E'.
Following this method, the standard form G' of the directed acyclic graph G is obtained, as shown in FIG. 1.
Obviously, different node calculation orders lead to different total memory demands of the neural network model during inference. The memory occupation of one inference process is the maximum of the instantaneous memory occupations reached during it, and the minimum memory occupation of the model's inference is the smallest such value over all feasible calculation orders. As shown in FIG. 2, black marks the data currently stored in memory. In state s_i, the current memory must hold the data required by the computations of nodes 8, 9 and 10 together with the computation results of those nodes; this is the memory occupation at that moment. State s_{i+1} denotes the state reached from s_i by selecting node 11 for computation, and s'_{i+1} the state reached from s_i by selecting node 12. Because the computation of node 11 has already consumed the outputs of its two predecessors, those two data can be removed from memory, which determines the memory occupation in state s_{i+1} and hence the memory occupation change from s_i to s_{i+1}. Similarly, state s'_{i+1} has its own memory occupation and change amount. The memory required by the two computations is clearly different. The method therefore seeks an optimal operator execution sequence that minimizes the memory occupation during the neural network model's inference.
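This bookkeeping can be simulated directly. The sketch below is our illustration, reusing the Graph structure from earlier; it assumes, as in the FIG. 2 discussion, that a tensor is freed as soon as all of its consumers have executed. It returns the occupation of the final state and the peak occupation along the way:

def memory_profile(g: Graph, order):
    """Return (current, peak) memory occupation after executing `order`."""
    done, live = set(), {}
    cur = peak = 0
    for n in order:
        cur += g.out_size(n)       # allocate n's output tensor
        live[n] = g.out_size(n)
        done.add(n)
        peak = max(peak, cur)      # n's inputs are still resident here
        for p in list(live):       # free outputs whose consumers all ran
            succ = g.successors(p)
            if succ and all(s in done for s in succ):
                cur -= live.pop(p)
    return cur, peak

Running it on the same prefix extended by node 11 versus node 12 reproduces the asymmetry between s_{i+1} and s'_{i+1} discussed above.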
(3) Based on the standard form G' obtained in step (2), obtain the initial pruning criterion M through a greedy strategy.
Obtaining the initial pruning criterion M by the greedy strategy specifically means: starting from the empty sequence, add one by one the operator with the smallest memory occupation change in the current state until all operators have been executed; this yields an operator execution sequence s satisfying the topological ordering together with the memory occupation of that sequence, and the initial pruning criterion M is the memory occupation of the operator execution sequence obtained by the greedy strategy.
Based on the standard form G', the process of this greedy strategy can be traced:
As shown in FIG. 3 and FIG. 4: initially the operator sequence is { } and the executable operators comprise {0}; the operator with the smallest memory change is 0, with a change of 7x7x32 = 1568 KB, so operator 0 is added to the operator execution sequence. In the next greedy step the operator sequence is {0} and the executable operators comprise {1}; the memory occupation rises to 1568 KB + 7x7x64 = 4704 KB, and after operator 1 has executed (operator 0's output can then be freed) the occupation is 3136 KB. In the next greedy step the operator sequence is {0, 1} and the executable operators comprise {2, 3}: executing operator 2 would occupy 3136 + 7x7x32 = 4704 KB, while executing operator 3 would occupy 3136 + 4x4x32 = 3648 KB. Because the memory occupation of operator 3 is smaller than that of operator 2 (3648 < 4704), operator 3 is selected and added to the operator execution sequence. This continues until all operators have been added to the operator execution sequence.
The finally obtained result: the operator sequence is s_0 = {0, 1, 3, 5, 2, 4, 6, 7, end}, and the memory occupation of this execution sequence is M = 4960 KB (input tensor of node 2 + output tensor of node 5 = 4x4x16 + 7x7x64 + 7x7x32).
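A compact sketch of this greedy construction (our illustration, reusing memory_profile from above; ties are broken arbitrarily and VW is ignored, as before):

def greedy_bound(g: Graph):
    """Greedily pick the ready operator with the smallest occupation change;
    return the sequence s and the initial pruning criterion M."""
    order, done = [], set()
    while len(order) < len(g.vw):
        ready = [n for n in g.vw
                 if n not in done
                 and all(p in done for p in g.predecessors(n))]
        base = memory_profile(g, order)[0]
        # smallest memory occupation change when appended to the sequence
        pick = min(ready,
                   key=lambda n: memory_profile(g, order + [n])[0] - base)
        order.append(pick)
        done.add(pick)
    return order, memory_profile(g, order)[1]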
(4) Accelerate the estimation by pruning: specifically, set the starting point v_0 = 0 and search along the branches of the operator execution sequence search tree T; the memory occupation of a search result is M', which is compared with the pruning criterion M to decide whether to discard it or to update it as the minimum memory occupation. This step comprises the following substeps:
(4.1) Construct the operator execution sequence search tree T: perform a depth-first search (DFS) on the graph of the standard form G', traversing all operators and backtracking after traversal, i.e., search all topological sequences; the execution process of the whole traversal forms the operator execution sequence search tree T.
(4.2) Pruning: while traversing the search tree T according to the operator execution sequence, if the memory occupation M' of the current path satisfies M' ≥ M, stop traversing downward and backtrack.
(4.3) Update the result and pruning value: after one path has traversed all operators, if the memory occupation M' < M, then M = M' and s = s', where s' is the operator sequence on that path.
The final result of this embodiment is the same as the result of the greedy strategy: the operator execution sequence is s = {0, 1, 3, 5, 2, 4, 6, 7, end} and the minimum memory occupation is M = 4960 KB. Without the invented method, running the operators in index order (i.e., with the operator execution sequence {0, 1, 2, 3, 4, 5, 6, 7, end}) would occupy 5216 KB.
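Combining the pieces, a hedged end-to-end sketch of step (4): a depth-first search over the operator execution sequence search tree T that prunes any branch whose occupation already reaches the current bound M (again our illustration, reusing greedy_bound, memory_profile, and to_standard_form from above):

def min_memory(g: Graph):
    """Branch-and-bound over topological orders; returns (s, M)."""
    best_s, best_M = greedy_bound(g)           # step (3): initial criterion M

    def dfs(order, done):
        nonlocal best_s, best_M
        if len(order) == len(g.vw):            # a complete topological sequence
            m = memory_profile(g, order)[1]
            if m < best_M:                     # step (4.3): update M and s
                best_M, best_s = m, list(order)
            return
        for n in list(g.vw):
            if n in done or not all(p in done for p in g.predecessors(n)):
                continue
            if memory_profile(g, order + [n])[1] >= best_M:
                continue                       # step (4.2): prune, backtrack
            order.append(n); done.add(n)
            dfs(order, done)
            order.pop(); done.discard(n)

    dfs([], set())
    return best_s, best_M

For instance, on the hypothetical toy graph built earlier: s, M = min_memory(to_standard_form(g)).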
Corresponding to the embodiment of the method for estimating the minimum memory occupation amount required by the neural network model reasoning, the invention also provides an embodiment of the device for estimating the minimum memory occupation amount required by the neural network model reasoning.
Referring to fig. 5, an apparatus for estimating a minimum memory occupation amount required for neural network model reasoning according to an embodiment of the present invention includes one or more processors configured to implement a method for estimating a minimum memory occupation amount required for neural network model reasoning in the foregoing embodiment.
The embodiment of the apparatus for estimating the minimum memory occupation required by neural network model reasoning can be applied to any device with data processing capability, such as a computer. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, as a logical apparatus it is formed by the processor of the device in which it is located reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In terms of hardware, FIG. 5 shows a hardware structure diagram of the device with data processing capability in which the estimating apparatus of the invention is located; besides the processor, memory, network interface, and nonvolatile memory shown in FIG. 5, the device in the embodiment generally also includes other hardware according to its actual function, which is not described again here.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The embodiment of the invention also provides a computer readable storage medium, on which a program is stored, which when executed by a processor, implements a method for estimating a minimum memory occupation amount required for neural network model reasoning in the above embodiment.
The computer readable storage medium may be an internal storage unit of any of the aforementioned devices with data processing capability, such as a hard disk or a memory. It may also be an external storage device of that device, for example a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash Card provided on the device. Further, the computer readable storage medium may comprise both an internal storage unit and an external storage device. It is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the invention shall fall within its scope of protection.
The above embodiments are merely for illustrating the design concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications according to the principles and design ideas of the present invention are within the scope of the present invention.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. The specification and examples are to be regarded in an illustrative manner only.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (6)

1. The estimating method for the minimum memory occupation amount required by the neural network model reasoning is characterized by comprising the following steps:
(1) Constructing a directed acyclic graph G through a graph of the neural network model;
(2) Expanding the directed acyclic graph G into a standard form G' of the directed acyclic graph G;
the expansion in the step (2) is point expansion and edge expansion;
the point is extended by adding a starting point V to V 0 And termination point v end The attribute value VW of the two points is 0, and a point set V 'of a standard form G' of the directed acyclic graph G is obtained;
the edge is expanded by connection v 0 And V and the start node in V end Adding a new edge with a termination node in V, setting the attribute value EW of the edge to be 0, and obtaining an edge set E ' of a standard form G ' of the directed acyclic graph G, wherein G ' = (V ', E ');
(3) Obtaining an initial pruning standard M through a greedy strategy based on the standard form G' obtained in the step (2);
(4) Pruning acceleration is carried out by the estimation method: specifically, set a starting point v_0 = 0 and search according to the branches of the operator execution sequence search tree T; the memory occupation of the search result is M', and by comparing M' with the initial pruning criterion M it is judged whether to discard the memory occupation M' or to update it as the minimum memory occupation; said step (4) is realized by the following substeps:
(4.1) constructing an operator execution sequence search tree T: traversing all operators through a depth-first search algorithm of the graph of the standard form G', and backtracking after traversing, namely searching all topological sequences, wherein the whole traversing execution process forms an operator sequence search tree T;
(4.2) pruning: in the traversing process of the sequence search tree T according to the operator execution sequence, if the memory occupation amount M' of the current path is not less than M, stopping traversing downwards, and backtracking;
(4.3) updating the result and pruning value: after one path traverses all operators, if the memory occupation M' < M, then M = M' and s = s', where s' is the operator sequence on the path.
2. The method of claim 1, wherein the directed acyclic graph G in step (1) comprises a point set V and an edge set E, i.e., G = (V, E), where V = {v_1, v_2, ..., v_n}; each element in the set V is a node and represents an operator; each node has an attribute value VW representing the memory occupation required by the computation of the node; each element of the edge set E represents an edge in G, where (v_i, v_k) indicates that node v_k uses the computation result of node v_i, v_i is called the source node of the edge, v_k the target node of the edge, and the value of the edge (v_i, v_k) is EW_i,k, representing the memory occupation of the output result of the source node.
3. The method for estimating the minimum memory occupation amount required by neural network model reasoning according to claim 1, wherein the greedy strategy in the step (3) is that operators with the minimum memory occupation variation amount in the current state are added one by one from a null sequence until all operators are executed; obtaining an operator execution sequence s meeting topological sorting, obtaining the memory occupation amount of the execution sequence, and obtaining the initial pruning standard M, namely the memory occupation amount of the operator execution sequence meeting topological sorting through a greedy strategy.
4. The method for estimating a minimum memory occupation amount required for neural network model reasoning according to claim 1, wherein the directed acyclic graph G is a graph description G corresponding to the neural network model by storing a designed deep learning model file in tflite, pb or onnx format using a deep learning framework tensorflow or pytorch.
5. An apparatus for estimating a minimum memory footprint required for neural network model reasoning, comprising one or more processors configured to implement a method for estimating a minimum memory footprint required for neural network model reasoning as claimed in any of claims 1-4.
6. A computer readable storage medium having stored thereon a program which, when executed by a processor, is adapted to carry out a method of estimating a minimum memory footprint required for neural network model reasoning as claimed in any of claims 1 to 4.
CN202310052812.XA 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning Active CN115809699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310052812.XA CN115809699B (en) 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310052812.XA CN115809699B (en) 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning

Publications (2)

Publication Number Publication Date
CN115809699A CN115809699A (en) 2023-03-17
CN115809699B (en) 2023-06-23

Family

ID=85487353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310052812.XA Active CN115809699B (en) 2023-02-03 2023-02-03 Method and device for estimating minimum memory occupation amount required by neural network model reasoning

Country Status (1)

Country Link
CN (1) CN115809699B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523052B (en) * 2023-07-05 2023-08-29 成都阿加犀智能科技有限公司 Rapid reasoning method, device and equipment
CN117009093B (en) * 2023-10-07 2024-03-12 之江实验室 Recalculation method and system for reducing memory occupation amount required by neural network reasoning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070223A (en) * 2020-08-17 2020-12-11 电子科技大学 Model parallel method based on Tensorflow framework
CN112084037A (en) * 2020-09-23 2020-12-15 安徽寒武纪信息科技有限公司 Memory allocation method and device of neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095474A (en) * 2020-01-09 2021-07-09 微软技术许可有限责任公司 Resource usage prediction for deep learning models
CN112085172B (en) * 2020-09-16 2022-09-16 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network
US20240160891A1 (en) * 2021-03-26 2024-05-16 Allwinner Technology Co., Ltd. Memory allocation method for ai processor, computer apparatus, and computer-readable storage medium
CN113326869A (en) * 2021-05-08 2021-08-31 清华大学 Deep learning calculation graph optimization method based on longest path fusion algorithm
CN115018064A (en) * 2022-06-27 2022-09-06 中国科学技术大学 Space distribution method and device for computing nodes
CN115186821B (en) * 2022-09-13 2023-01-06 之江实验室 Core particle-oriented neural network inference overhead estimation method and device and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070223A (en) * 2020-08-17 2020-12-11 电子科技大学 Model parallel method based on Tensorflow framework
CN112084037A (en) * 2020-09-23 2020-12-15 安徽寒武纪信息科技有限公司 Memory allocation method and device of neural network

Also Published As

Publication number Publication date
CN115809699A (en) 2023-03-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant