CN112256441A - Memory allocation method and device for neural network inference - Google Patents

Memory allocation method and device for neural network inference

Info

Publication number
CN112256441A
Authority
CN
China
Prior art keywords
memory space
layer
allocated
neural network
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011535579.3A
Other languages
Chinese (zh)
Other versions
CN112256441B (English)
Inventor
梁军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qigan Electronic Information Technology Co ltd
Original Assignee
Shanghai Qigan Electronic Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qigan Electronic Information Technology Co ltd filed Critical Shanghai Qigan Electronic Information Technology Co ltd
Priority to CN202011535579.3A priority Critical patent/CN112256441B/en
Publication of CN112256441A publication Critical patent/CN112256441A/en
Application granted granted Critical
Publication of CN112256441B publication Critical patent/CN112256441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Memory System (AREA)

Abstract

The memory allocation method and device for neural network inference allocate memory starting from the final output layers of the neural network inference until memory allocation for all input layers is completed, and include: allocating memory space, on one side of an initial position, for all final output layers in the neural network inference; obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference; and allocating memory space for each of those IFM layers. In this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position. The invention reduces the memory footprint of neural network inference, improves the memory reuse rate and makes the memory planning more reasonable.

Description

Memory allocation method and device for neural network inference
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a memory allocation method and device for neural network inference.
Background
Owing to their efficiency and accuracy, deep neural networks perform particularly well in tasks such as detection, recognition and classification, and in recent years their applications in everyday life have kept expanding. As a result, various types of embedded neural network processors (NPUs) have emerged.
However, deep neural networks usually occupy a large amount of memory, which raises the requirements on hardware and directly increases hardware production cost. How to reduce the memory footprint of deep neural networks is therefore a problem that urgently needs to be solved; doing so can greatly lower the hardware requirements of deep neural networks and save cost.
Existing neural network inference assumes that the inputs, outputs and intermediate-layer data of the neural network must not interfere with one another, performs no targeted memory allocation or optimization, and leaves allocation entirely to the operating system. In this case, the inputs and outputs of the neural network and of the intermediate layers occupy as much memory as a fully tiled layout.
The memory consumed by this approach is too large, especially for edge computing devices; a neural network with large intermediate-layer data (e.g., VGG-19) cannot even run on the processors of such devices.
Interpretation of related terms
FM (Feature Map): feature map;
IFM (Input Feature Map): input feature map;
OFM (Output Feature Map): output feature map.
Disclosure of Invention
The technical problem solved by the invention is: how to reduce the memory footprint of neural network inference so that it can run on edge computing devices.
In order to solve the above technical problem, an embodiment of the present invention provides a memory allocation method for neural network inference, in which, in the process of allocating memory space for the FM of each layer in the neural network inference, memory is allocated starting from the final output layers of the neural network inference and proceeding toward the input layers, layer by layer, until memory allocation for all input layers is completed, the method including:
initializing the memory management model;
obtaining all final output layers in the neural network inference, wherein the final output layers are OFMs which are not IFMs of other layers;
allocating memory space, on one side of an initial position, for all final output layers in the neural network inference;
obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference;
allocating memory space for each IFM layer of the layer to which memory space is currently allocated; in this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position;
repeating the steps of obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference and allocating memory space for each of those IFM layers, until all layers in the neural network inference have been allocated memory space.
Optionally, the initializing the memory management model includes: setting min_l_boundary = -1, max_r_boundary = 0, lstack empty and rstack empty, where min_l_boundary denotes the left boundary, max_r_boundary denotes the right boundary, lstack denotes the memory space within the left boundary of the initial position, and rstack denotes the memory space within the right boundary of the initial position.
Optionally, the allocating, at one side of the initial position, a memory space for all the last output layers in the neural network inference includes:
sorting all final output layers in the neural network inference according to ID;
and, on the right side of the initial position, allocating memory space for each final output layer in the neural network inference in ascending order of its ID.
Optionally, the allocating memory space for each IFM layer of the layer to which memory space is currently allocated includes:
comparing the ID of the layer most recently allocated memory space in the memory space on the left side of the initial position with the ID of the layer most recently allocated memory space in the memory space on the right side of the initial position, and allocating memory space for all IFM layers of the layer with the larger ID.
Optionally, determining that all layers in the neural network inference have been allocated memory space includes: judging whether the memory space on the left side of the initial position and the memory space on the right side of the initial position have both been recycled; if so, all layers in the neural network inference have been allocated memory space.
Optionally, when the amount of recoverable memory space differs depending on whether a layer is allocated to the left side or the right side of the initial position, and the life cycle of the layer is 1, the layer is allocated to the side where more memory space can be recovered.
Optionally, when the memory space allocated on the left side of the initial position extends beyond the left boundary, the left boundary is updated to the extent of the memory space currently allocated on the left side of the initial position, and when the memory space allocated on the right side of the initial position extends beyond the right boundary, the right boundary is updated to the extent of the memory space currently allocated on the right side of the initial position.
Optionally, in the process of allocating memory space for each IFM layer of the layer to which memory space is currently allocated, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by different amounts, the layer is allocated to the side whose boundary expansion is smaller.
Optionally, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by the same amount, then if the FM of that layer has already been allocated to the right side of the initial position, its IFM layers are allocated to the left side of the initial position, and if the FM of that layer has already been allocated to the left side of the initial position, its IFM layers are allocated to the right side of the initial position.
Optionally, the method further includes: after all layers in the neural network inference have been allocated memory space, determining the size of the memory space occupied by the neural network inference according to the current left boundary and right boundary.
In order to solve the above technical problem, an embodiment of the present invention further provides a memory allocation apparatus for neural network inference, including:
a processor adapted to load and execute instructions of a software program;
a memory adapted to store a software program comprising instructions for performing the steps of:
initializing the memory management model;
obtaining all final output layers in the neural network inference, wherein the final output layers are OFMs which are not IFMs of other layers;
allocating memory space, on one side of an initial position, for all final output layers in the neural network inference;
obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference;
allocating memory space for each IFM layer of the layer to which memory space is currently allocated; in this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position;
repeating the steps of obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference and allocating memory space for each of those IFM layers, until all layers in the neural network inference have been allocated memory space;
in the process of allocating memory space for the FM of each layer in the neural network inference, memory is allocated from the last output layer of the neural network inference, and memory is allocated for each layer in the direction of the input layer until memory allocation for all input layers is completed.
Optionally, the allocating memory space for each IFM layer of the layer to which memory space is currently allocated includes:
comparing the ID of the layer most recently allocated memory space in the memory space on the left side of the initial position with the ID of the layer most recently allocated memory space in the memory space on the right side of the initial position, and allocating memory space for all IFM layers of the layer with the larger ID.
Optionally, determining that all layers in the neural network inference have been allocated memory space includes: judging whether the memory space on the left side of the initial position and the memory space on the right side of the initial position have both been recycled; if so, all layers in the neural network inference have been allocated memory space.
Optionally, when the amount of recoverable memory space differs depending on whether a layer is allocated to the left side or the right side of the initial position, and the life cycle of the layer is 1, the layer is allocated to the side where more memory space can be recovered.
Optionally, when the memory space allocated on the left side of the initial position extends beyond the left boundary, the left boundary is updated to the extent of the memory space currently allocated on the left side of the initial position, and when the memory space allocated on the right side of the initial position extends beyond the right boundary, the right boundary is updated to the extent of the memory space currently allocated on the right side of the initial position.
Optionally, in the process of allocating memory space for each IFM layer of the layer to which memory space is currently allocated, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by different amounts, the layer is allocated to the side whose boundary expansion is smaller.
Optionally, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by the same amount, then if the FM of that layer has already been allocated to the right side of the initial position, its IFM layers are allocated to the left side of the initial position, and if the FM of that layer has already been allocated to the left side of the initial position, its IFM layers are allocated to the right side of the initial position.
Optionally, the steps further include: after all layers in the neural network inference have been allocated memory space, determining the size of the memory space occupied by the neural network inference according to the current left boundary and right boundary.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
In the process of allocating memory space for the FM of each layer in neural network inference, memory is allocated starting from the final output layers of the neural network inference and proceeding toward the input layers until memory allocation for all input layers is completed, including: obtaining all final output layers in the neural network inference; allocating memory space, on one side of an initial position, for all final output layers in the neural network inference; obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference; and allocating memory space for each of those IFM layers. In this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position. These steps are repeated until all layers in the neural network inference have been allocated memory space, thereby reducing the memory footprint of neural network inference, improving the memory reuse rate and making the memory planning more reasonable.
Further, reverse planning is adopted: memory is allocated starting from the output layer of the neural network until the input layers are completed (prior-art memory optimization schemes for neural network inference usually proceed from the input layer toward the output layer).
Further, a divergent memory model is adopted: allocation diverges from the initial position to the left and right sides, and each layer and its IFM layers are distributed on the two sides without interfering with each other.
Further, when placing the memory space of an IFM layer of a layer on a particular side of the initial position allows the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position, so that in addition to optimizing the current state, optimization of future states is also taken into account.
Further, in the process of allocating memory space for each IFM layer of the layer to which memory space is currently allocated, a layer is allocated to the side whose left or right boundary would expand less, and when allocating a layer to the left or right side of the initial position would expand the boundary by the same amount, the IFM of the layer and the FM of the layer are allocated to different sides of the initial position, thereby further reducing memory usage.
Further, when the memory space allocated on the left side of the initial position extends beyond the left boundary, the left boundary is updated to the extent of the memory space currently allocated on the left side of the initial position, and when the memory space allocated on the right side of the initial position extends beyond the right boundary, the right boundary is updated to the extent of the memory space currently allocated on the right side of the initial position, so that the sizes of the memory stacks on the two sides are adjusted adaptively.
Drawings
FIG. 1 is a flow chart of a memory allocation method for neural network inference according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of neural network inference in an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the general steps involved in an embodiment of the present invention.
Detailed Description
As analyzed in the background section, neural network inference in the prior art occupies too much memory and is difficult to deploy on edge computing devices.
Through research, the inventor found that existing memory optimization schemes for neural network inference mainly optimize input and output data, do not take intermediate-layer data into account, and therefore achieve a low memory reuse rate.
Specifically, in the prior art, the input, output and intermediate-layer data (blocks) required by the neural network for forward inference are abstracted into N blocks. Each block records the amount of memory it needs to occupy and the nodes (layers) it contains. Each block occupies a piece of memory, and blocks must not interfere with one another; here, non-interference means that the intersection of the node sets of two blocks is 0, which guarantees that the inference result is correct.
In the prior art, when the memory of an earlier block cannot be reused by a later block, new memory is allocated. Suppose blocks [B1, B2, B3] need to occupy 10 MB, 1 MB and 5 MB of memory respectively; both B2 and B3 can reuse B1's memory, but B3 cannot reuse B2's memory. The system first allocates 10 MB for B1, and B2 reuses B1's memory. Because B2 has already reused B1's memory, B3 cannot reuse B1's memory and must be given new 5 MB of memory. The total memory is therefore 15 MB, which is only slightly better than the 16 MB required by the fully tiled approach.
The condition for memory reuse in the above process is that reuse must not affect the correctness of inference. Whether correctness would be affected is judged by checking whether the intersection of block Bx and block Bn is 0: if the intersection is 0, the life cycle of the earlier block has ended, and the data occupying that block can no longer influence subsequent inference results.
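For concreteness, the intersection test described above can be sketched as follows; the Block structure and the node sets used here are illustrative assumptions, not data structures defined in this disclosure:

```python
# Illustrative sketch only; the Block structure and node sets are assumptions.
from dataclasses import dataclass
from typing import Set

@dataclass
class Block:
    name: str
    size_mb: int
    nodes: Set[int]  # layers whose data live in this block

def can_reuse(earlier: Block, later: Block) -> bool:
    # Reuse is allowed only if the node intersection is empty, i.e. the
    # earlier block's life cycle has ended before the later block is written.
    return len(earlier.nodes & later.nodes) == 0

# Example mirroring the text above: B2 may reuse B1's memory, but B3 may not reuse B2's.
b1 = Block("B1", 10, {0, 1})
b2 = Block("B2", 1, {2, 3})
b3 = Block("B3", 5, {3, 4})
print(can_reuse(b1, b2))  # True
print(can_reuse(b2, b3))  # False
```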
In the field of neural network inference, the prior art (see CN110597616A) discloses a memory allocation method and device for a neural network that optimizes the memories of input, output and intermediate-layer data. The scheme uses a memory management method based on sorting and memory reuse, and briefly comprises the following three steps:
Step a) Sort the blocks by occupied memory from large to small, i.e. the block occupying the most memory is allocated first. For example, the sorted memory blocks are [B1, B2, B3, B4, B5], and it is assumed that their corresponding OFMs are also arranged from front to back.
Step b) Allocate the memory block occupying the most memory, namely B1.
Step c) Allocate the sorted blocks in order. For example, if B2 and B1 do not intersect, B2 may reuse the memory space of B1; if B3 and B2 intersect, new memory space must be allocated for B3. All blocks are placed in this order.
This scheme can improve the memory reuse rate, reduce the amount of newly allocated memory and optimize the placement of intermediate-layer data.
The inventor found, however, that this scheme does not reuse memory sufficiently. For example, suppose B1 requires 10 MB of memory, B2 requires 1 MB and B3 requires 5 MB, neither B2 nor B3 intersects B1, and B2 and B3 intersect each other. In this case, after sorting from large to small, only B3 can reuse part of B1's memory under the above scheme: B3 uses 5 MB of B1's memory and 5 MB remains empty, while B2 cannot reuse the remaining 5 MB of B1's unused memory, so new memory space must be allocated for B2.
In the process of allocating memory space for the FM of each layer in neural network inference, the present invention allocates memory starting from the final output layers of the neural network inference and proceeding toward the input layers until memory allocation for all input layers is completed, comprising the following steps: obtaining all final output layers in the neural network inference; allocating memory space, on one side of an initial position, for all final output layers in the neural network inference; obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference; and allocating memory space for each of those IFM layers. In this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position. These steps are repeated until all layers in the neural network inference have been allocated memory space, thereby reducing the memory footprint of neural network inference, improving the memory reuse rate and making the memory planning more reasonable.
In order that those skilled in the art will better understand and realize the present invention, the following detailed description is given by way of specific embodiments with reference to the accompanying drawings.
Example one
As described below, embodiments of the present invention provide a memory allocation method for neural network inference.
The memory allocation method provided in this embodiment may be used to allocate memory space to each layer in the neural network inference in advance before the neural network inference is run, or may be used to allocate memory space to each layer in the neural network inference running process.
The following is an example of a memory pre-allocation, but the present invention is not limited thereto.
The memory pre-allocation design provided by this embodiment adopts a reverse and divergent manner. Wherein:
Reverse: memory allocation starts from the output (the output layer) and proceeds layer by layer toward the front, allocating the memory required by each layer, until memory allocation for the input layer is completed;
Divergence (middle-out): a dividing line is drawn at position 0 and memory is allocated outward to both sides, with a leftmost boundary (min_l_boundary) and a rightmost boundary (max_r_boundary) limiting how far the memory diverges. For example:
in state Sn+1, only the right side of the memory model is occupied, with 4 units of memory in use; the rightmost boundary is 4 and the leftmost boundary is -4. To solve the state Sn of the next step, the principle of minimizing max_r_boundary - min_l_boundary should be followed; therefore, the 2 units of memory needed for state Sn are allocated to the left-side memory as [-2, -1].
Here the subscripts start from 0, consistent with computer software conventions. For example, in a 10 MB memory, 0 denotes the first 1 MB block of memory space and 1 denotes the second 1 MB block.
With reference to the flow chart of the memory allocation method for neural network inference shown in FIG. 1 and the flow chart of general steps shown in FIG. 2, the method is described in detail below, step by step:
S101, initializing the memory management model.
In some embodiments, the initializing the memory management model may include: setting min_l_boundary = -1, max_r_boundary = 0, lstack empty, and rstack empty.
Here min_l_boundary denotes the left boundary, max_r_boundary denotes the right boundary, lstack denotes the memory space within the left boundary of the initial position, and rstack denotes the memory space within the right boundary of the initial position.
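A minimal sketch of this memory-management state is given below; the MemModel class, the (layer_id, size) stack entries and the lptr/rptr fields (whose update rules are described further below) are assumptions made for illustration:

```python
# Illustrative sketch only; names other than min_l_boundary, max_r_boundary,
# lstack and rstack are assumptions, not part of this disclosure.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MemModel:
    min_l_boundary: int = -1   # leftmost boundary reached so far
    max_r_boundary: int = 0    # rightmost boundary reached so far
    # FMs currently resident on each side, as (layer_id, size) pairs.
    lstack: List[Tuple[int, int]] = field(default_factory=list)
    rstack: List[Tuple[int, int]] = field(default_factory=list)
    lptr: int = -1             # end address used by the next left-side allocation
    rptr: int = 0              # start address used by the next right-side allocation

def init_model() -> "MemModel":
    # S101: min_l_boundary = -1, max_r_boundary = 0, lstack and rstack empty.
    return MemModel()
```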
S102, all final output layers in the neural network inference are obtained.
Here the final output layer (a custom term) refers to an OFM that is not the IFM of any other layer. Final output layers usually lie at the end of the neural network inference; most neural network inferences have only one final output layer, while some have several.
S103, allocating memory space, on one side of the initial position, for all final output layers in the neural network inference.
In some embodiments, the allocating memory space, on one side of the initial position, for all final output layers (i.e., each final output layer) in the neural network inference may include:
sorting all final output layers in the neural network inference by ID;
and, on the right side of the initial position, allocating memory space for each final output layer in the neural network inference in ascending order of its ID.
Wherein, the ID refers to the identification number of each layer in the neural network.
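A hedged sketch of steps S102 and S103 follows; here graph is an assumed mapping from each layer's ID to the IDs of its IFM layers, and sizes maps layer IDs to FM sizes, neither of which is a structure specified by this disclosure:

```python
# Illustrative sketch only; graph, sizes and the MemModel fields are assumptions.
from typing import Dict, List, Set

def final_output_layers(graph: Dict[int, List[int]]) -> List[int]:
    # S102: a final output layer is an OFM that is not the IFM of any other layer.
    all_ifms: Set[int] = {ifm for ifms in graph.values() for ifm in ifms}
    return sorted(layer for layer in graph if layer not in all_ifms)

def allocate_outputs(model, graph: Dict[int, List[int]], sizes: Dict[int, int]) -> None:
    # S103: place every final output layer on the right side of the initial
    # position, in ascending order of layer ID.
    for layer in final_output_layers(graph):
        model.rstack.append((layer, sizes[layer]))
        model.rptr += sizes[layer]
        model.max_r_boundary = max(model.max_r_boundary, model.rptr)
```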
S104, obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference.
S105, allocating memory space for each IFM layer of the layer to which memory space is currently allocated.
Specifically, space is allocated to the IFM layers of that layer one by one, in descending order of their IDs.
In this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position.
In some embodiments, the allocating memory space for each IFM layer of the layer to which memory space is currently allocated may include:
comparing the ID of the layer most recently allocated memory space in the memory space on the left side of the initial position with the ID of the layer most recently allocated memory space in the memory space on the right side of the initial position, and allocating memory space for all IFM layers of the layer with the larger ID.
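This selection rule can be sketched as follows, assuming the (layer_id, size) stack entries used in the earlier sketches:

```python
# Illustrative sketch only; stack entries are assumed to be (layer_id, size) pairs.
from typing import List, Tuple

def next_layer_to_expand(lstack: List[Tuple[int, int]],
                         rstack: List[Tuple[int, int]]) -> int:
    # Compare the ID of the layer most recently allocated on the left side with
    # that on the right side; the layer with the larger ID is expanded next,
    # i.e. memory is allocated for all of its IFM layers.
    last_left = lstack[-1][0] if lstack else -1   # -1: no layer on this side yet
    last_right = rstack[-1][0] if rstack else -1
    return max(last_left, last_right)
```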
S106, judging whether all layers in the neural network inference have been allocated memory space.
If yes, the memory allocation is finished.
If not, the above steps (i.e., steps S104 to S105) of obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference and allocating memory space for each of those IFM layers are repeated until all layers in the neural network inference have been allocated memory space.
In some embodiments, the judging whether all layers in the neural network inference have been allocated memory space may specifically include: judging whether the memory space on the left side of the initial position and the memory space on the right side of the initial position have both been recycled; if so, all layers in the neural network inference have been allocated memory space.
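Putting S101 to S106 together, the overall flow might look like the sketch below; expand_layer stands in for step S105 (allocating the IFMs of one layer and recycling fully expanded layers) and, like the model object, is a hypothetical element not defined by this disclosure:

```python
# Illustrative sketch only; model fields and expand_layer are assumptions.
from typing import Callable

def plan_memory(model, expand_layer: Callable[[object, int], None]) -> int:
    # S104-S106: repeatedly pick the layer whose IFMs are allocated next
    # (the larger-ID rule sketched above) until both side stacks are empty,
    # i.e. all memory on both sides of the initial position has been recycled.
    while model.lstack or model.rstack:
        last_left = model.lstack[-1][0] if model.lstack else -1
        last_right = model.rstack[-1][0] if model.rstack else -1
        expand_layer(model, max(last_left, last_right))
    # Total footprint per this embodiment: max_r_boundary - min_l_boundary - 1.
    return model.max_r_boundary - model.min_l_boundary - 1
```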
In step S103, step S105 and step S106, when the memory space allocated on the left side of the initial position extends beyond the left boundary, the left boundary may be updated to the extent of the memory space currently allocated on the left side of the initial position; similarly, when the memory space allocated on the right side of the initial position extends beyond the right boundary, the right boundary may be updated to the extent of the memory space currently allocated on the right side of the initial position.
During memory allocation, when an FM is allocated to the right-side memory stack, the start address of the FM is set to rptr, and rptr is then advanced by the size of the FM. If the updated rptr is larger than the maximum right boundary, the maximum right boundary is updated to rptr;
similarly, when an FM is allocated to the left-side memory stack, the end address of the FM is set to lptr, and lptr is then moved left by the size of the FM. If the updated lptr is smaller than the minimum left boundary, the minimum left boundary is updated to lptr;
when an FM needs to be removed from rstack, rptr is updated accordingly, i.e., rptr = rptr - FM, so that once the FM no longer needs to be kept in memory, the memory block it occupied is recycled and can be used by other FMs;
similarly, when an FM needs to be removed from lstack, lptr is updated, i.e., lptr = lptr + FM, so that once the FM no longer needs to be kept in memory, the memory block it occupied is recycled and can be used by other FMs.
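These pointer and boundary updates can be summarized in the following sketch; the push/pop naming and the (layer_id, size) stack entries are assumptions layered on the hypothetical MemModel above:

```python
# Illustrative sketch only; built on the hypothetical MemModel fields above.
def push_right(model, layer_id: int, size: int) -> None:
    # The FM starts at rptr and occupies [rptr, rptr + size - 1].
    model.rstack.append((layer_id, size))
    model.rptr += size
    if model.rptr > model.max_r_boundary:
        model.max_r_boundary = model.rptr     # grow the maximum right boundary

def pop_right(model) -> None:
    # Recycle the most recent right-side FM: rptr = rptr - FM.
    _, size = model.rstack.pop()
    model.rptr -= size

def push_left(model, layer_id: int, size: int) -> None:
    # The FM ends at lptr and occupies [lptr - size + 1, lptr].
    model.lstack.append((layer_id, size))
    model.lptr -= size
    if model.lptr < model.min_l_boundary:
        model.min_l_boundary = model.lptr     # grow the minimum left boundary

def pop_left(model) -> None:
    # Recycle the most recent left-side FM: lptr = lptr + FM.
    _, size = model.lstack.pop()
    model.lptr += size
```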
The above description of the technical solution shows that: in this embodiment, when the memory space allocated on the left side of the initial position extends beyond the left boundary, the left boundary is updated to the extent of the memory space currently allocated on the left side, and when the memory space allocated on the right side of the initial position extends beyond the right boundary, the right boundary is updated to the extent of the memory space currently allocated on the right side, so that the sizes of the memory stacks on the two sides are adjusted adaptively.
In some embodiments, when the amount of recoverable memory space differs depending on whether a layer is allocated to the left side or the right side of the initial position, and the life cycle of the layer is 1, the layer is allocated to the side where more memory space can be recovered.
Here, a life cycle of 1 for the FM of a layer means that the OFM data of that layer is used only by its immediately following layer (and not by any other layer).
In some embodiments, further, in the process of allocating memory space for each IFM layer of the layer to which memory space is currently allocated, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by different amounts, the layer may be allocated to the side whose boundary expansion is smaller (this may be called the minimum-allocation rule).
The minimum-allocation rule is premised on not affecting the reuse of other recoverable memory on the same side; that is, if the FM of a layer is allocated to a given side of the initial position, the life cycle of that layer must be shorter than the life cycles of the FMs of all other layers currently allocated on that side, i.e., the FM of this layer can release its memory earlier than the FMs of the other layers on the same side.
Further, when the minimum-allocation rule cannot determine to which side of the initial position the FM of a layer should be allocated (for example, when allocating the layer to the left side or to the right side would expand the left boundary or the right boundary by the same amount, or because of some special network structure or other factors), the IFM of the layer and the FM of the layer are allocated to different sides of the initial position (i.e., the output-input relationship is considered): if the FM of the layer has already been allocated to the right side of the initial position, its IFM layers are allocated to the left side of the initial position, and if the FM of the layer has already been allocated to the left side of the initial position, its IFM layers are allocated to the right side of the initial position, thereby further reducing memory usage.
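A sketch of this side-selection heuristic follows, combining the minimum-allocation rule with the output-input tie-break; the expansion helpers and the consumer_side argument are assumptions built on the hypothetical MemModel above, and the recoverability considerations described earlier are omitted for brevity:

```python
# Illustrative sketch only; built on the hypothetical MemModel fields above.
def right_expansion(model, size: int) -> int:
    # How far the right boundary would grow if the FM were placed on the right.
    return max(0, model.rptr + size - model.max_r_boundary)

def left_expansion(model, size: int) -> int:
    # How far the left boundary would grow if the FM were placed on the left.
    return max(0, model.min_l_boundary - (model.lptr - size))

def choose_side(model, size: int, consumer_side: str) -> str:
    # consumer_side: the side already holding the FM that consumes this IFM.
    grow_left = left_expansion(model, size)
    grow_right = right_expansion(model, size)
    if grow_left != grow_right:
        return "left" if grow_left < grow_right else "right"
    # Tie: place the IFM on the opposite side from its consumer's FM.
    return "left" if consumer_side == "right" else "right"
```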
The above description of the technical solution shows that: in this embodiment, in the process of allocating memory space for each IFM layer of the layer to which memory space is currently allocated, a layer is allocated to the side whose left or right boundary would expand less, and when allocating a layer to the left or right side of the initial position would expand the boundary by the same amount, the IFM of the layer and the FM of the layer are allocated to different sides of the initial position.
In some embodiments, as an optional step, after all layers in the neural network inference have been allocated memory space, the amount of memory space that the neural network inference needs to occupy may also be determined according to the current left and right boundaries.
The above description of the technical solution shows that: in this embodiment, in the process of allocating memory space for the FM of each layer in the neural network inference, memory is allocated starting from the final output layers of the neural network inference and proceeding toward the input layers until memory allocation for all input layers is completed, including: obtaining all final output layers in the neural network inference; allocating memory space, on one side of the initial position, for all final output layers; obtaining the IFM layers of the layer to which memory space is currently allocated; and allocating memory space for each of those IFM layers. In this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side. These steps are repeated until all layers in the neural network inference have been allocated memory space, thereby reducing the memory footprint of neural network inference, improving the memory reuse rate and making the memory planning more reasonable.
Further, reverse planning is adopted: memory is allocated starting from the output layer of the neural network until the input layers are completed (prior-art memory optimization schemes for neural network inference usually proceed from the input layer toward the output layer).
Further, a divergent memory model is adopted: allocation diverges from the initial position to the left and right sides, and each layer and its IFM layers are distributed on the two sides without interfering with each other.
Further, when placing the memory space of an IFM layer of a layer on a particular side of the initial position allows the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position, so that in addition to optimizing the current state, optimization of future states is also taken into account.
How the scheme of the present embodiment allocates the memory is analyzed by a specific example as follows:
this example includes a linear network, an initiation structure, and a resnet structure. That is, common neural network structures are included, and it can be seen that the network is very representative.
As shown in fig. 3, bn denotes the nth FM, which may be understood as the blob of the nth layer, and bn = c denotes the memory size of the nth FM as c.
The scheme of the embodiment is adopted to allocate the memory for the neural network of the example, and the process is as follows:
1) initializing a memory management model;
2) find the final output, i.e., b8, and push it onto rstack;
3) find the IFMs of b8, i.e., b7 and b1, and select the IFM with the larger ID, i.e., b7; placing b7 on the left or on the right requires the same amount of memory, so b7 is placed on the left. Since IFM b1 of b8 has not yet been allocated memory, b8 continues to stay on rstack;
4) find, among the IFMs of b7 and b8, the one with the largest ID, i.e., b6; placing b6 on the left or on the right requires the same memory, but if b6 is placed on the right, all IFMs of b7 (which is on the left) will have been allocated memory, so b7 can release its memory for recycling and reuse by other FMs; therefore b6 is placed on the right;
5) find, among the IFMs of b7 and b6, the one with the largest ID, i.e., b5; placing b5 on the left requires less memory, so b5 is placed on the left;
6) find, among the IFMs of b5 and b6, the one with the largest ID, i.e., b4; placing b4 on the left or on the right requires the same memory, but placing b4 on the left allows the memory of b6 to be recycled, so b4 is placed on the left; since all IFMs of b6 have now been allocated memory, the memory block of b6 can be recycled;
7) find, among the IFMs of b4 and b8, the one with the largest ID, i.e., b3; placing b3 on the left or on the right requires the same memory, but placing b3 on the right means all IFMs of b4 have been allocated memory and the memory block of b4 can be recycled, so b3 is allocated on the right;
8) find, among the IFMs of b5 and b3, the one with the largest ID, i.e., b2; placing b2 on the left or on the right requires the same memory, and once b2 is allocated, all IFMs of both b5 and b3 have been allocated memory, so b2 could be placed on either side; for balance, b2 is placed on the left;
9) find, among the IFMs of b2 and b8, the one with the largest ID, i.e., b1; placing b1 on the right requires less memory, so b1 is allocated to the right; at the same time, all IFMs of b2 have now been allocated memory, so the memory of b2 can be recycled, and after b2 is recycled it is found that all IFMs of b5 have also been allocated memory, so the memory of b5 can be recycled as well;
10) find the IFM of b1 with the largest ID, i.e., b0; placing b0 on the left or on the right requires the same memory; since all IFMs of b1 have now been allocated memory, the memory of b1 can be recycled; b0 is placed on the left, and after the memory of b1 is recycled, all IFMs of b8 have been allocated memory, so the memory of b8 can also be recycled;
11) find the IFM of b0 with the largest ID; none exists, so the memory of b0 can be recycled. After the memory of b0 is recycled, both lstack and rstack are found to be empty, so memory allocation for all FMs of the whole network is complete.
The total memory required in this process is max_r_boundary - min_l_boundary - 1 = 18.
Experiments and simulations show that the scheme of the embodiment has different memory optimization effects for different networks.
In the best case, for a network with the inception_resnet_v2 structure, the scheme of this embodiment can save about 96.6% of memory. The comparison baseline is the Sequential scheme, whose memory footprint is equivalent to the case where the operating system performs memory allocation and management without any dedicated memory management.
In the worst case, for a network with the ypr structure, the scheme of this embodiment can still save about 66.77% of memory.
Across the different types of networks, the average memory saving rate is about 86%, which greatly reduces the memory occupied by the neural network.
Example two
As described below, an embodiment of the present invention provides a memory allocation apparatus for neural network inference.
The memory allocation device for neural network inference comprises:
a processor adapted to load and execute instructions of a software program;
a memory adapted to store a software program comprising instructions for performing the steps of:
initializing the memory management model;
obtaining all final output layers in the neural network inference, wherein the final output layers are OFMs which are not IFMs of other layers;
allocating memory space, on one side of an initial position, for all final output layers in the neural network inference;
obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference;
allocating memory space for each IFM layer of the layer to which memory space is currently allocated; in this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position;
repeating the steps of obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference and allocating memory space for each of those IFM layers, until all layers in the neural network inference have been allocated memory space;
in the process of allocating memory space for the FM of each layer in the neural network inference, memory is allocated from the last output layer of the neural network inference, and memory is allocated for each layer in the direction of the input layer until memory allocation for all input layers is completed.
In some embodiments, the allocating memory space for each IFM layer of the layer to which memory space is currently allocated may include:
comparing the ID of the layer most recently allocated memory space in the memory space on the left side of the initial position with the ID of the layer most recently allocated memory space in the memory space on the right side of the initial position, and allocating memory space for all IFM layers of the layer with the larger ID.
In some embodiments, determining that all layers in the neural network inference have been allocated memory space may include: judging whether the memory space on the left side of the initial position and the memory space on the right side of the initial position have both been recycled; if so, all layers in the neural network inference have been allocated memory space.
In some embodiments, when the memory space allocated on the left side of the initial position extends beyond the left boundary, the left boundary may be updated to the extent of the memory space currently allocated on the left side of the initial position, and when the memory space allocated on the right side of the initial position extends beyond the right boundary, the right boundary may be updated to the extent of the memory space currently allocated on the right side of the initial position.
In some embodiments, in the process of allocating memory space for each IFM layer of the layer to which memory space is currently allocated, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by different amounts, the layer may be allocated to the side whose boundary expansion is smaller.
In some embodiments, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by the same amount, if the FM of that layer has already been allocated to the right side of the initial position, its IFM layers are allocated to the left side of the initial position, and if the FM of that layer has already been allocated to the left side of the initial position, its IFM layers are allocated to the right side of the initial position.
In some embodiments, the steps may further include: after all layers in the neural network inference have been allocated memory space, determining the size of the memory space occupied by the neural network inference according to the current left boundary and right boundary.
The above description of the technical solution shows that: in this embodiment, in the process of allocating memory space for the FM of each layer in the neural network inference, memory is allocated starting from the final output layers of the neural network inference and proceeding toward the input layers until memory allocation for all input layers is completed, including: obtaining all final output layers in the neural network inference; allocating memory space, on one side of the initial position, for all final output layers; obtaining the IFM layers of the layer to which memory space is currently allocated; and allocating memory space for each of those IFM layers. In this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side. These steps are repeated until all layers in the neural network inference have been allocated memory space, thereby reducing the memory footprint of neural network inference, improving the memory reuse rate and making the memory planning more reasonable.
Further, reverse planning is adopted: memory is allocated starting from the output layer of the neural network until the input layers are completed (prior-art memory optimization schemes for neural network inference usually proceed from the input layer toward the output layer).
Further, a divergent memory model is adopted: allocation diverges from the initial position to the left and right sides, and each layer and its IFM layers are distributed on the two sides without interfering with each other.
Further, when placing the memory space of an IFM layer of a layer on a particular side of the initial position allows the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position, so that in addition to optimizing the current state, optimization of future states is also taken into account.
Those skilled in the art will understand that, in the methods of the embodiments, all or part of the steps can be performed by hardware associated with program instructions, and the program can be stored in a computer-readable storage medium, which can include: ROM, RAM, magnetic or optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (18)

1. A memory allocation method for neural network inference, characterized in that, in the process of allocating memory space for the FM of each layer in the neural network inference, memory is allocated starting from the final output layers of the neural network inference and proceeding toward the input layers, layer by layer, until memory allocation for all input layers is completed, the method comprising the following steps:
initializing the memory management model;
obtaining all final output layers in the neural network inference, wherein the final output layers are OFMs which are not IFMs of other layers;
allocating memory space, on one side of an initial position, for all final output layers in the neural network inference;
obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference;
allocating memory space for each IFM layer of the layer to which memory space is currently allocated; in this process, after all IFM layers of a layer whose memory space is currently allocated have been allocated space, the memory space of that layer is recycled, and when placing the memory space of one of its IFM layers on a particular side of the initial position would allow the memory space of that layer to be recycled, the IFM layer is allocated to that side of the initial position;
repeating the steps of obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference and allocating memory space for each of those IFM layers, until all layers in the neural network inference have been allocated memory space.
2. The memory allocation method for neural network inference according to claim 1, wherein initializing the memory management model comprises: setting min_l_boundary = -1, max_r_boundary = 0, lstack empty and rstack empty, where min_l_boundary denotes the left boundary, max_r_boundary denotes the right boundary, lstack denotes the memory space within the left boundary of the initial position, and rstack denotes the memory space within the right boundary of the initial position.
3. The memory allocation method for neural network inference according to claim 1, wherein said allocating memory space, on one side of an initial position, for all final output layers in the neural network inference comprises:
sorting all final output layers in the neural network inference by ID;
and, on the right side of the initial position, allocating memory space for each final output layer in the neural network inference in ascending order of its ID.
4. The memory allocation method for neural network inference according to claim 1, wherein said allocating memory space for each IFM layer of the layer to which memory space is currently allocated comprises:
comparing the ID of the layer most recently allocated memory space in the memory space on the left side of the initial position with the ID of the layer most recently allocated memory space in the memory space on the right side of the initial position, and allocating memory space for all IFM layers of the layer with the larger ID.
5. The memory allocation method for neural network inference according to claim 1, wherein determining that all layers in the neural network inference have been allocated memory space comprises: judging whether the memory space on the left side of the initial position and the memory space on the right side of the initial position have both been recycled; if so, all layers in the neural network inference have been allocated memory space.
6. The memory allocation method for neural network inference according to claim 1, wherein, when the amount of recoverable memory space differs depending on whether a layer is allocated to the left side or the right side of the initial position, and the life cycle of the layer is 1, the layer is allocated to the side where more memory space can be recovered.
7. The memory allocation method for neural network inference according to claim 1, wherein, when the memory space allocated on the left side of the initial position extends beyond the left boundary, the left boundary is updated to the extent of the memory space currently allocated on the left side of the initial position, and when the memory space allocated on the right side of the initial position extends beyond the right boundary, the right boundary is updated to the extent of the memory space currently allocated on the right side of the initial position.
8. The memory allocation method for neural network inference according to claim 7, wherein, in the process of allocating memory space for each IFM layer of the layer to which memory space is currently allocated, when allocating a layer to the left side or to the right side of the initial position would expand the left boundary or the right boundary by different amounts, the layer is allocated to the side whose boundary expansion is smaller.
9. The memory allocation method for neural network inference according to claim 8, wherein, when allocating a layer to the left side and to the right side of the initial position would expand the left boundary or the right boundary by the same amount, if the FM of the layer has been allocated on the right side of the initial position, the IFM layers of the layer are allocated to the left side of the initial position, and if the FM of the layer has been allocated on the left side of the initial position, the IFM layers of the layer are allocated to the right side of the initial position.
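The side-selection rules of claims 8 and 9 reduce to a small decision function; the expansion amounts and the `fm_side` argument are assumed to be computed by the caller and are not named in the claims:

```python
def choose_side_for_ifm(left_expansion, right_expansion, fm_side):
    """Prefer the side whose boundary would grow less (claim 8); on a tie,
    place the IFM on the side opposite the one already holding the layer's FM
    (claim 9).  `fm_side` is 'left' or 'right'."""
    if left_expansion != right_expansion:
        return 'left' if left_expansion < right_expansion else 'right'
    # Equal expansion on both sides: put the IFM opposite the FM.
    return 'left' if fm_side == 'right' else 'right'
```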
10. The memory allocation method for neural network inference according to claim 7, further comprising: after all layers in the neural network inference have been allocated memory space, determining the size of the memory space occupied by the neural network inference according to the current left boundary and the current right boundary.
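Under the same offset convention as above, claim 10's final footprint is simply the span between the two boundaries; the example values below are invented for illustration:

```python
def total_memory_footprint(min_l_boundary, max_r_boundary):
    """Once every layer is placed, the memory the inference needs is the
    distance from the final left boundary to the final right boundary."""
    return max_r_boundary - min_l_boundary

# Example: boundaries at -4096 and 6144 give a 10 KiB footprint.
print(total_memory_footprint(-4096, 6144))   # -> 10240
```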
11. A memory allocation apparatus for neural network inference, comprising:
a processor adapted to load and execute instructions of a software program;
a memory adapted to store a software program comprising instructions for performing the steps of:
initializing a memory management model;
obtaining all final output layers in the neural network inference, wherein a final output layer is a layer whose OFM is not an IFM of any other layer;
allocating memory space, on one side of an initial position, for all final output layers in the neural network inference;
obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference;
allocating memory space to each IFM layer of the layer to which memory space is currently allocated, wherein, in this process, after all IFM layers of the layer to which memory space is currently allocated have been allocated space, the memory space of that layer is reclaimed, and when allocating the memory space of an IFM layer of a layer on one side of the initial position enables the memory space of the layer to which memory space is currently allocated to be reclaimed, that IFM layer is allocated to that side of the initial position; and
repeating the steps of obtaining the IFM layers of the layer to which memory space is currently allocated in the neural network inference and allocating memory space to each of those IFM layers, until all layers in the neural network inference have been allocated memory space;
wherein, in allocating memory space for the FM of each layer in the neural network inference, memory is allocated starting from the final output layers of the neural network inference and proceeding layer by layer toward the input layers, until memory allocation for all input layers is completed.
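The step order recited in claim 11 can be tied together in a single driver sketch; the Layer dataclass, the allocate_network name, and the simplified side bookkeeping are all assumptions, and the finer rules of claims 6, 8 and 9 are intentionally omitted:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Layer:
    """Hypothetical layer record: `ifm_layers` are the layers whose OFMs feed it."""
    id: int
    size: int
    ifm_layers: List["Layer"] = field(default_factory=list)

def allocate_network(final_output_layers):
    """Place the final output layers on the right of the initial position,
    then repeatedly expand the side whose most recently placed layer has the
    larger ID, allocating that layer's IFM layers and reclaiming its own block
    once they are all placed.  Only the traversal order and side bookkeeping
    are shown; offsets and boundary growth are left out."""
    lstack, rstack = [], []
    placement = {}                                   # layer id -> 'left' | 'right'

    for layer in sorted(final_output_layers, key=lambda l: l.id):
        rstack.append(layer)                         # claim 3: right side, ID ascending
        placement[layer.id] = 'right'

    while lstack or rstack:                          # claim 5: done when both sides are reclaimed
        left_id = lstack[-1].id if lstack else -1
        right_id = rstack[-1].id if rstack else -1
        side = 'left' if left_id > right_id else 'right'
        stack = lstack if side == 'left' else rstack
        layer = stack.pop()                          # reclaimed once its IFMs are placed

        other = lstack if side == 'right' else rstack
        for ifm in layer.ifm_layers:                 # allocate each not-yet-placed IFM
            if ifm.id not in placement:
                other.append(ifm)
                placement[ifm.id] = 'left' if side == 'right' else 'right'

    return placement

# Toy network: layer 3 is the final output, fed by layers 1 and 2,
# which both read the input layer 0 (memory is allocated back-to-front).
l0 = Layer(0, 256)
l1 = Layer(1, 512, [l0])
l2 = Layer(2, 512, [l0])
l3 = Layer(3, 1024, [l1, l2])
print(allocate_network([l3]))
```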
12. The memory allocation apparatus for neural network inference according to claim 11, wherein allocating memory space to each IFM layer of the layer to which memory space is currently allocated comprises:
comparing the ID of the layer most recently allocated memory space on the left side of the initial position with the ID of the layer most recently allocated memory space on the right side of the initial position, and allocating memory space to all IFM layers of the layer having the larger ID.
13. The memory allocation apparatus for neural network inference according to claim 11, wherein determining that all layers in the neural network inference have been allocated memory space comprises: judging whether the memory space on the left side of the initial position and the memory space on the right side of the initial position have both been reclaimed; if so, all layers in the neural network inference have been allocated memory space.
14. The memory allocation apparatus for neural network inference according to claim 11, wherein, when allocating a layer to the left side and to the right side of the initial position would reclaim different amounts of memory space and the life cycle of the layer is 1, the layer is allocated to the side on which the larger amount of memory space can be reclaimed.
15. The memory allocation apparatus for neural network inference according to claim 11, wherein the left boundary is updated to match the memory space currently allocated on the left side of the initial position when the memory space allocated on the left side of the initial position exceeds the left boundary, and the right boundary is updated to match the memory space currently allocated on the right side of the initial position when the memory space allocated on the right side of the initial position exceeds the right boundary.
16. The memory allocation apparatus for neural network inference according to claim 15, wherein, in allocating memory space to each IFM layer of the layer to which memory space is currently allocated, when allocating a layer to the left side and to the right side of the initial position would expand the left boundary or the right boundary by different amounts, the layer is allocated to the side that causes the smaller expansion of the left boundary or the right boundary.
17. The memory allocation apparatus for neural network inference according to claim 16, wherein, when allocating a layer to the left side and to the right side of the initial position would expand the left boundary or the right boundary by the same amount, if the FM of the layer has been allocated on the right side of the initial position, the IFM layers of the layer are allocated to the left side of the initial position, and if the FM of the layer has been allocated on the left side of the initial position, the IFM layers of the layer are allocated to the right side of the initial position.
18. The memory allocation apparatus for neural network inference according to claim 15, wherein the software program further comprises instructions for: after all layers in the neural network inference have been allocated memory space, determining the size of the memory space occupied by the neural network inference according to the current left boundary and the current right boundary.
CN202011535579.3A 2020-12-23 2020-12-23 Memory allocation method and device for neural network inference Active CN112256441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011535579.3A CN112256441B (en) 2020-12-23 2020-12-23 Memory allocation method and device for neural network inference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011535579.3A CN112256441B (en) 2020-12-23 2020-12-23 Memory allocation method and device for neural network inference

Publications (2)

Publication Number Publication Date
CN112256441A true CN112256441A (en) 2021-01-22
CN112256441B CN112256441B (en) 2021-05-04

Family

ID=74225450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011535579.3A Active CN112256441B (en) 2020-12-23 2020-12-23 Memory allocation method and device for neural network inference

Country Status (1)

Country Link
CN (1) CN112256441B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022198636A1 (en) * 2021-03-26 2022-09-29 珠海全志科技股份有限公司 Memory allocation method for ai processor, computer device, and computer-readable storage medium
WO2023045879A1 (en) * 2021-09-22 2023-03-30 维沃移动通信有限公司 Memory allocation method, memory allocation apparatus, electronic device, and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180088996A1 (en) * 2016-09-23 2018-03-29 Apple Inc. Systems and Methods of Memory Allocation for Neural Networks
CN108304265A (en) * 2018-01-23 2018-07-20 腾讯科技(深圳)有限公司 EMS memory management process, device and storage medium
CN108829610A (en) * 2018-04-02 2018-11-16 浙江大华技术股份有限公司 EMS memory management process and equipment during a kind of neural network forward calculation
CN110597616A (en) * 2018-06-13 2019-12-20 华为技术有限公司 Memory allocation method and device for neural network
CN111814971A (en) * 2020-06-30 2020-10-23 杭州国芯科技股份有限公司 Memory allocation method of neural network
CN111708641A (en) * 2020-07-14 2020-09-25 腾讯科技(深圳)有限公司 Memory management method, device and equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LINNAN WANG et al.: "SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks", ACM SIGPLAN NOTICES *
马玮良 et al.: "A Survey of Memory Management in Deep Learning" (深度学习中的内存管理问题研究综述), BIG DATA RESEARCH *

Also Published As

Publication number Publication date
CN112256441B (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN110597616B (en) Memory allocation method and device for neural network
CN115248728B (en) Distributed training task scheduling method, system and device for intelligent computing
CN1717663B (en) Methods and apparatus to manage cache bypassing
CN112256441B (en) Memory allocation method and device for neural network inference
CN112416585B (en) Deep learning-oriented GPU resource management and intelligent scheduling method
CN111930498B (en) Efficient GPU resource allocation optimization method and system
CN115237580B (en) Intelligent calculation-oriented flow parallel training self-adaptive adjustment system and method
JP4203001B2 (en) Parallel calculation method, parallel calculation program, and computer
CN112015765B (en) Spark cache elimination method and system based on cache value
CN113342886A (en) Data exchange method and device
CN111209106B (en) Flow chart dividing method and system based on caching mechanism
CN112256440B (en) Memory management method and device for neural network inference
CN113391914A (en) Task scheduling method and device
US20060236065A1 (en) Method and system for variable dynamic memory management
CN109324894A (en) PC cluster method, apparatus and computer readable storage medium
CN116841710A (en) Task scheduling method, task scheduling system and computer storage medium
CN114298294B (en) Neural network memory optimization method and device based on hardware accelerator
CN110515729B (en) Graph computing node vector load balancing method and device based on graph processor
KR20130064379A (en) Apparatus and metohd for storing data in flash memory by using b-tree performing delayed update
CN117435352B (en) Lightweight memory optimal allocation method for mixed management of fixed-length and variable-length data
CN113391886A (en) Task scheduling method and device
KR101916809B1 (en) Apparatus for placing virtual cluster and method for providing the same
CN112000471B (en) Memory optimization method and device
CN116610456B (en) Memory optimization method based on eager memory reuse algorithm
CN112445428B (en) Hard disk master-slave garbage recycling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant